Looking for an optimal online assignment algorithm

I'm looking for a solution to an assignment problem where tasks arrive and must be assigned sequentially, but you can make a task wait for up to K periods.
Formally, let there be an ordered sequence of tasks aabbaba... and an ordered sequence of resources ABBABAA..., and the system can use a side stack. The aim is to match as many a tasks to A resources (and b tasks to B resources) as possible.
The constraints are as follows: in each period i the program receives resource i and assigns it to a task. The resource is assigned either to a task from the stack, or the program continues to read from the sequence of tasks in order. Each task that is read can either be immediately assigned to resource i, or be put on the stack IF it will wait there less than K periods and will be assigned to its match (a->A, b->B).
If K=0 then the i-th task must be assigned to the i-th resource, which is pretty bad. If K>0 then you can do better with a greedy algorithm. What is the optimal solution?
Clarification:
Denote the assignment by a permutation m, where m(i)=j means that resource j was assigned to task i. If there is no stack, m(i)=i. When there is a stack, tasks can be assigned out of order, but if a task later than i is put on the stack then i must be assigned one of the following K resources. That is, the assignment is legal if for all tasks i:
m(i) <= min{ m(i') : i' > i } + K
I am looking for the algorithm that finds the assignment with the minimal number of mismatches (aB or bA) among all assignments satisfying the constraints.

You can formulate the problem this way:
resources = [a,b,b,a,b,a,a,...]
tasks     = [a,a,b,b,a,b,a,...]
We can define a cost function for assigning task j to resource i:
C(i,j) = (resources[i]==tasks[j])*1 + (resources[i]!=tasks[j])*1000
I chose 1000 >> 1 to penalize the case where you're unable to meet the requirement.
Let's write the constraints:
x_{i,j} = 1 if you assign task j to resource i, and 0 otherwise.
Moreover, x_{i,j} = 0 if i-j > K, since you follow the resources one by one and a task can wait at most K periods (i-j <= K).
For each i, and for each j, exactly one x_{i,j} is equal to one.
Then you get the following linear program:
Minimise: Sum_{i,j} C(i,j)*x_{i,j}
Subject to:
x_{i,j} in {0,1}
Sum_j x_{i,j} = 1 for all i
Sum_i x_{i,j} = 1 for all j
x_{i,j} = 0 if i-j > K
x_{i,j} >= 0 otherwise
You may need to adjust the constraints a little... once corrected, this solution should be optimal, but I am not sure the greedy algorithm is not already optimal.
This formulation gets more interesting with more than 2 different resource types.
Hope I understood your question and that this helps.
Modifications:
I will translate this constraint:
m(i) <= min{ m(i') : i' > i } + K
Note:
If x_{i,j} = 1, then Sum_j(j*x_{i,j}) = j, since only one x_{i,j} = 1 in row i.
"Translation", with the previous notation:
m(i) <= min{ m(i') : i' > i } + K
<=> j <= min{ j' : i' > i and x_{i',j'} = 1 } + K (OK?)
New linear constraint:
We have:
x_{i,j} = 1 <=> Sum_j(j*x_{i,j}) = j for row i
and
x_{i',j'} = 1 <=> Sum_{j'}(j'*x_{i',j'}) = j' for every row i'.
Therefore:
j <= min{ j' : i' > i and x_{i',j'} = 1 } + K
<=>
Sum_j(j*x_{i,j}) <= min_{i' > i} { Sum_{j'}(j'*x_{i',j'}) } + K
and "less than the min" is equivalent to "less than each".
Then the new set of constraints is:
Sum_j(j*x_{i,j}) <= Sum_{j'}(j'*x_{i',j'}) + K for all i' > i
You can add these constraints to the previous ones, and you get an integer linear program.
You can solve it with a standard solver (the simplex algorithm on the LP relaxation, plus branch and bound for the integer variables).
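Below is a minimal sketch of this formulation in Python using PuLP (the solver choice and the helper name solve_assignment are my own assumptions, not part of the answer). It encodes the cost, the one-to-one constraints, the i-j > K exclusions, and the ordering constraints from the modification above:

# Sketch only: assumes the PuLP package; any MILP solver could be substituted.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

def solve_assignment(resources, tasks, K):
    n = len(resources)                     # assumes len(resources) == len(tasks)
    idx = range(n)
    # Cost: 1 for a matching pair, 1000 for a mismatch (1000 >> 1, as above).
    C = {(i, j): 1 if resources[i] == tasks[j] else 1000 for i in idx for j in idx}

    prob = LpProblem("assignment", LpMinimize)
    x = LpVariable.dicts("x", [(i, j) for i in idx for j in idx], cat="Binary")
    prob += lpSum(C[i, j] * x[i, j] for i in idx for j in idx)

    for i in idx:
        prob += lpSum(x[i, j] for j in idx) == 1      # each resource serves one task
    for j in idx:
        prob += lpSum(x[i, j] for i in idx) == 1      # each task gets one resource
    for i in idx:
        for j in idx:
            if i - j > K:
                prob += x[i, j] == 0                  # a task waits at most K periods

    # Ordering constraints from the "Modifications" section:
    # Sum_j j*x[i,j] <= Sum_j' j'*x[i',j'] + K for all i' > i
    for i in idx:
        for i2 in idx:
            if i2 > i:
                prob += (lpSum(j * x[i, j] for j in idx)
                         <= lpSum(j * x[i2, j] for j in idx) + K)

    prob.solve()
    return [(i, j) for i in idx for j in idx if value(x[i, j]) > 0.5]

print(solve_assignment(list("abbabaa"), list("aabbaba"), K=2))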


Efficiently sum max(Ai+Bj, Bi+Aj) over all i, j

You are given two integer arrays A and B of length N. You have to find the value of the double summation:
Z = Σ_i Σ_j max(A_i + B_j, B_i + A_j)
Here is my brute force algorithm:
for i in 0..N-1
    for j in 0..N-1
        sum += Math.max(A[i]+B[j], A[j]+B[i]);
Please tell me a more efficient algorithm for this.
Rewrite the sum as Z = Σ_i Σ_j [max(A_i − B_i, A_j − B_j) + B_i + B_j], using the fact that max(A_i + B_j, B_i + A_j) = max(A_i − B_i, A_j − B_j) + B_i + B_j. Then construct C = A − B, sort it in ascending order, and return Σ_i (2i+1)·C_i + 2N·Σ_i B_i (using zero-based indexing). The sort dominates, so this runs in O(N log N).
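A short Python sketch of that computation (the helper name fast_sum is mine), with a brute-force cross-check:

def fast_sum(A, B):
    N = len(A)
    C = sorted(a - b for a, b in zip(A, B))            # C = A - B, ascending
    # After sorting, C[i] is the maximum term in exactly 2*i + 1 of the N*N pairs.
    return sum((2 * i + 1) * c for i, c in enumerate(C)) + 2 * N * sum(B)

# Quick check against the brute force from the question:
A, B = [1, 4, 2], [3, 0, 5]
brute = sum(max(A[i] + B[j], A[j] + B[i]) for i in range(3) for j in range(3))
assert fast_sum(A, B) == brute   # both give 59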
A minor improvement I can think of is to avoid recomputing results you already have: instead of beginning the inner loop at 0, start it at j = i, since the results for j < i were already computed in previous iterations.
To achieve this, change the instruction in the inner loop to the following:
if (i != j)
    sum += 2 * Math.max(A[i]+B[j], A[j]+B[i]);
else
    sum += Math.max(A[i]+B[j], A[j]+B[i]);
The reason the factor of 2 works is that every pair with i != j is visited twice by the original loops, and max(A[i]+B[j], A[j]+B[i]) is symmetric in i and j.

Fast algorithm for distributing a value over a histogram?

I am looking for a fast algorithm, both in terms of complexity (the size of the problem may get close to 2^32) and in terms of the constant factor. It doesn't necessarily have to compute the optimal solution: a heuristic is acceptable if it produces results "close" to the optimal and has a "considerable" advantage in computation time compared to computing the optimal solution.
I have an integer histogram A: |A| = n, A[i]>0; and a value R: 0<R<=A[0]+...+A[n-1]. I must distribute -R over the histogram as evenly as possible. Formally this means something like this (there is some additional information in the formal notation too): I need to find B, such that |B| = |A| && B[i] = A[i] - C[i], where 0<=C[i]<=A[i] && C[0]+...+C[n-1] = R and C must minimize the expressions: L_2 = C[0]^2 + ... + C[n-1]^2 and L_infinity = max(C[0], ..., C[n-1]). Just from the formulation one can see that the problem doesn't necessarily have a unique solution (consider A[0] = 1, A[1] = 1 and R = 1, then both B[0]=0, B[1]=1 and B'[0]=1, B'[1]=0 are optimal solutions), an additional constraint may be added such as if A[i]<A[j] then C[i]<C[j] but it is not as important in my case. Naively one can iterate over all possibilities for C[i] (R-combination with repetitions) and find the optimal solutions, but obviously that is not very fast for larger n.
Another possible solution is finding q = R/n and r=R%n, then iterating over all elements and storing diff[i] = A[i]-q, if diff[i]<=0 then r-=diff[i] && B[i] = 0 && remove A[i], then continue with all non-removed A[i], by setting them to A[i] = diff[i], R = r, and n=n-removedElementsCount. If iterating this process, then at each step we would remove at least one element, until we reach the point where q == 0 or we have only 1 element, then we just need to only have A[i]-=1 for R such elements from A, since by then R<n in the q==0 case or just have A[i]-=R if we are in the case where we have only 1 element leftover (the case where we have 0 elements is trivial). Since we remove at least one element each step, and we need to iterate over (n - step) elements in the worst case, then we have a complexity of O((1+...+n)) = O(n^2).
I am hoping that somebody is already familiar with a better algorithm or if you have any ideas I'll be glad to hear them (I am aware that this can be regarded as an optimization problem also).
edit: made R positive so it would be easier to read.
Edit 2: I realized I messed up the optimization criteria.
Turn your histogram into an array of (value, index) pairs, and then turn it into a min heap. This operation is O(n).
Now your C is going to take some set of values to 0, reduce some by the max amount, and the rest by 1 less than the max amount. The max amount you'd like to reduce everything by is easy to calculate: it is R/n rounded up.
Now go through the heap. As long as the value at the bottom of the heap is < ceil(R / size of heap), set B at that index to zero (C takes the entire bin there) and remove it from the heap in O(log(n)) time. Once that loop finishes, you can assign the max value and 1 less than the max value arbitrarily among the rest.
This will run in O(n log(n)) worst-case time. You will hit that worst case when O(n) elements have to be zeroed out.
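A rough Python sketch of that heap loop (one assumption on my part: R is reduced by each value that gets zeroed out, a step the answer leaves implicit):

import heapq
from math import ceil

def distribute_heap(A, R):
    B = list(A)
    heap = [(v, i) for i, v in enumerate(A)]
    heapq.heapify(heap)                          # O(n)
    while heap and heap[0][0] < ceil(R / len(heap)):
        v, i = heapq.heappop(heap)               # smallest remaining bin
        B[i] = 0                                 # C[i] = A[i]: the bin is emptied
        R -= v
    # Spread the rest: each remaining bin loses floor(R/m) or floor(R/m) + 1.
    m = len(heap)
    q, r = divmod(R, m) if m else (0, 0)
    for k, (v, i) in enumerate(heap):
        B[i] = v - (q + 1 if k < r else q)
    return B

print(distribute_heap([5, 1, 9, 2, 7], R=10))    # e.g. [2, 0, 7, 0, 5]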
I came up with a very simple greedy algorithm in O(n*log(n)) time (if somebody manages to solve it in O(n), though, I'll be glad to hear it).
Algorithm:
Given: an integer array A[0],...,A[|A|-1] with A[i]>=0, and an integer R0 with 0<=R0<=A[0]+...+A[|A|-1].
Base:
1. Sort A in ascending order - takes O(n*log(n)) time.
   Set i = 0; R = R0; n = |A|; q = floor(R/n); r = R - q*n; d = q.
2. if (i == |A| or R == 0) goto 6.
3. if (i >= |A| - r) d = q + 1.
4. if (A[i] >= d)
   {
       R -= d;
       A[i] -= d;
   }
   else
   {
       R -= A[i];
       A[i] = 0;
       n = |A| - (i+1);
       q = floor(R/n);
       d = q;
       r = R - q*n;
   }
5. i = i + 1; goto 2.
6. if (R > 0) A[|A|-1] -= R; return A.
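A direct Python transcription of the numbered steps above (a sketch; I added a guard so the recomputation of q in step 4 never divides by zero when the last element is zeroed out):

def distribute_greedy(A, R0):
    A = sorted(A)                              # step 1
    i, R, n = 0, R0, len(A)
    q = R // n
    r = R - q * n
    d = q
    while not (i == len(A) or R == 0):         # step 2
        if i >= len(A) - r:                    # step 3
            d = q + 1
        if A[i] >= d:                          # step 4
            R -= d
            A[i] -= d
        else:
            R -= A[i]
            A[i] = 0
            n = len(A) - (i + 1)
            if n > 0:                          # guard added for the last element
                q = R // n
                d = q
                r = R - q * n
        i += 1                                 # step 5
    if R > 0:                                  # step 6
        A[-1] -= R
    return A

print(distribute_greedy([5, 1, 9, 2, 7], R0=10))   # -> [0, 0, 3, 5, 6]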
Informal solution optimality proof:
Let n = |A|.
Case 0: n==1 -> C[0] = R
Case 1: n>1 && A[i]>=q && A[j]>=q+1 for j>=max(0,n-r)
The optimal solution is given by C[i] = q for i<n-r && C[j] = q+1 for i>=n-r.
Assume there is another optimal solution given by C'[i] = C[i] + E[i], where the constraints for E are: E[0]+...+E[n-1]=0 (otherwise C' would violate C'[0] + ... + C'[n-1] = R), C[i]>=-E[i] (otherwise C'[i] would violate the non-negativity constraint), E[i] <= A[i] - C[i] (from C'[i]<=A[i]), and E[i]<=E[j] for i<=j (from C[i]<=C[j] for A[i]<=A[j] && A[i]<=A[j] for i<=j), then:
L_2' - L_2 = 2*q*(E[0]+...+E[n-r-1]) + 2*(q+1)*(E[n-r]+...+E[n-1]) + (E[0]^2 + ... + E[n-1]^2) = 2*q*0 + (E[0]^2 + ... + E[n-1]^2) + 2*(E[n-r] + ... + E[n-1]) >= 0
The last inequality is true since for every term 2*E[n-i], 1<=i<=r, there is a corresponding term E[n-i]^2, 1<=i<=r to cancel it out if it is negative at least for E[n-i]<-1. Let us analyze the case where 2*E[n-i] = -2, obviously E[n-i]^2 = 1 is not enough to cancel it out in this case. However, since all elements of E sum to 0, there exists j!=n-i: such that E[j] compensates for it, since we have the term E[j]^2. From the last inequality follows L_2<=L_2' for every possible solution C', this implies that C minimizes L_2. It is trivial to see that the L_inf minimization is also satisfied: L_inf = q + (r>0) <= L_inf' = max(q+E[0], ... , q+E[n-r-1], q+1+E[n-r], ... , q+1+E[n-1]), if we were to have an E[i]>1 for i<n-r, or E[j]>0 for j>=n-r, we get a higher maximum, we can also never decrease the maximum, since E sums to 0.
Case 2: n>1 && there exists k: A[k]<q
In this case the optimal solution requires that C[k] = A[k] for all k: A[k]<q. Let us assume that there exists an optimal solution C' such that C'[k]<A[k]<q -> C'[k]<q-1. There exists i>=k, such that C'[i]<q-1 && C'[i+1]>=q-1. Assume there is no such i, then C'[k] == C[n-1] < q-1, and C'[0]+...+C'[n-1]<n*q-n<R, this is a contradiction, which implies that such an i actually does exist. There also exists a j>k such that C[j]>q && C[j-1]<C[j] (if we assume this is untrue we once again get a contradiction with C summing to R). We needed these proofs in order to satisfy C[t]<=C[l] for t<=l. Let us consider the modified solution C''[t] = C'[t] for t!=i,j; and C''[i] = C'[i]+1, and C''[j] = C'[j]-1. L_2' - L_2'' = C'[i]^2 - (C'[i]+1)^2 + C'[j]^2 - (C'[j]-1)^2 = -2*C'[i] + 2*C'[j] - 2 = 2*((C'[j]-C'[i])-1) > 2*(1-1) = 0. The last inequality follows from (C'[i]<q-1 && C'[j]>q) -> C'[j] - C'[i] > 1. We proved that L_2'>L_2'' if we increment C[i]: C[i]<A[i]<q. By induction the optimal solution should have C[l]=A[l] for all l: A[l]<q. Once this is done one can inductively continue with the reduced problem n' = n-(i+1), R' = R - (C[0]+...+C[i]), q' = floor(R'/n'), r' = R' - q'*n', D[0] = A[i+1], ..., D[n'-1] = A[n-1].
Case 3: n>1 && A[i]>=q && A[j]<q+1 for j==max(0,n-r)
Since A[k]>=A[i] for k>=i, that implies that A[i]<q+1 for i<=j. But since we have also q<=A[i] this implies A[i]==q, so we cannot add any of the remainder in any C[i] : i<=j. The optimality of C[i]=A[i]=q for i<j follows from a proof done in case 1 (the proof there was more general with q+1 terms). Since the problem is optimal for 0<=i<j we can start solving a reduced problem: D[0] = A[j],...,D[n-j-1] = A[n-1].
Case 0, 1, 2, 3 are all the possible cases. Apart from case 0 and case 1 which give the solution explicitly, the solution in 2 and 3 reduces the problem to a smaller one which once again falls in one of the cases. Since the problem is reduced at every step, we get the final solution in a finite number of steps. We also never refer to an element more than once which implies O(n) time, but we need O(n*log(n)) for the sorting, so in the end we have O(n*log(n)) time complexity for the algorithm. I am unsure whether this problem can be solved in O(n) time, but I have the feeling that there is no way to get away without the sorting since case 2 and 3 rely on it heavily, so maybe O(n*log(n)) is the best possible complexity that can be achieved.

Evenly partition contiguous sequences of numbers

A frequent task when parallelizing N embarrassingly parallel work chunks contiguously among K workers is to partition with the following algorithm, in pseudocode:
acc = 0
for _ in range(K):
    end = acc + ceil(N/K)
    emit acc:end
    acc = end
This will emit K contiguous partitions, generally of size N/K, and works fine for large N. However, if K is approximately N this may cause imbalance, because the last worker will get very few items. If we define imbalance as the maximum absolute difference between partition sizes, then an iterative algorithm that starts from any random partition and reduces this imbalance until the maximum difference is 1 (or 0 if K divides N) is going to be optimal.
It seems to me that the following may be a more efficient way of getting at the same answer, by performing online "re-planning". Does this algorithm have a name and optimality proof?
acc = 0
workers = K
while workers > 0:
    rem = N - acc
    end = acc + ceil(rem/workers)
    emit acc:end
    acc = end
    workers -= 1
Edit. Given that we can define the loop above recursively, I can see that an inductive optimality proof might work. In any case, the name and confirmation of its optimality would be appreciated :)
A simple way of dividing the range is:
for i in range(K):
    emit (i*N // K):((i+1)*N // K)
This has the advantage of being itself parallelizable since the iterations do not need to be performed in order.
It is easy to prove that every partition has either floor(N/K) or ceil(N/K) elements, and it is evident that every element will be in exactly one partition. Since floor and ceiling differ by at most 1, the algorithm must be optimal.
The algorithm you suggest is also optimal (and the results are similar). I don't know its name, though.
Another way of dividing the ranges which can be done in parallel is to use the range start(N, K, i):start(N, K, i+1) where start(N, K, i) is (N//K)*i + min(i, N%K). (Note that N//K and N%K only need to be computed once.) This algorithm is also optimal, but distributes the imbalance so that the first partitions are the larger ones. That may or may not be useful.
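For illustration, here is a quick Python check of both formulas (the helper names are mine):

def partitions_floor(N, K):
    # (i*N // K) style: every chunk gets floor(N/K) or ceil(N/K) elements.
    return [(i * N // K, (i + 1) * N // K) for i in range(K)]

def partitions_start(N, K):
    # start(N, K, i) = (N // K) * i + min(i, N % K): the larger chunks come first.
    start = lambda i: (N // K) * i + min(i, N % K)
    return [(start(i), start(i + 1)) for i in range(K)]

N, K = 10, 4
print(partitions_floor(N, K))   # [(0, 2), (2, 5), (5, 7), (7, 10)] -> sizes 2, 3, 2, 3
print(partitions_start(N, K))   # [(0, 3), (3, 6), (6, 8), (8, 10)] -> sizes 3, 3, 2, 2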
Here's a simpler approach. Give every worker floor(N/K) tasks; that accounts for K*floor(N/K) of them and leaves N mod K remaining tasks. To keep the regions contiguous, you can put the remaining tasks on the first N mod K workers.
Here it is in imperative style. Just to be clear, I'm numbering the tasks {0..(N-1)}, and emitting sets of contiguous task numbers.
offset = 0
for 0 <= i < K:
    end = offset + floor(N/K)
    if i < N mod K:
        end = end + 1
    emit {c | offset <= c < end}
    offset = end
And in a more declarative style:
chunk = floor(N/K)
rem = N mod K
// i == worker number
function offset(i) =
    i * chunk + (i if i < rem else rem)
for 0 <= i < K:
    emit {c | offset(i) <= c < offset(i+1)}
The proof of optimality is pretty trivial at this point. Worker i has offset(i+1) - offset(i) tasks assigned to it. Depending on i, this is either floor(N/K) or floor(N/K) + 1 tasks.

Sequence of numbers with constraints

I have a sequence of positive numbers x_1, x_2, ..., x_n and I want to find a consecutive subsequence in which 0 < x_i - x_j < i - j holds for all pairs 1 <= j < i <= n within it. Define S(t) to be the length of the longest such consecutive subsequence ending at x_t.
e.g. if
S(t) = 3, then the above holds for x_t, x_{t-1}, x_{t-2}.
I am trying to find a recursion formula, and I am completely stuck. I tried playing around a little with numbers in order to find some pattern:
S(5) = 2 would mean that S(5) = 2 + S(4) and S(4) must be 0. But then maybe S(3) could be 1, so we must stop as soon as we find out that S(4) = 0.
Base cases, or maybe special cases: S(0) = 0, S(1) = 0?
Is it possible to write S(k) in terms of S(k-1)?
I am trying to construct an algorithm for this but first I need to figure out a recursion formula.
S(0) = 0
If (x_n <= x_{n-1} or x_n - x_{n-1} >= 1) => S(n) = 0
else if S(n-1) = 0 => S(n)=2
else S(n) = S(n-1) + 1
There is no need for dynamic programming or recursion in this problem, thanks to a simple property of the comparison operator: it's transitive. This means:
a < b and b < c => a < c
We can transform the above inequality a bit:
x_i - x_j < i - j
x_i - i < x_j - j
This makes the whole problem a lot simpler:
Define a sequence y, where y_i = x_i - i. Find the longest strictly decreasing consecutive run ending at y_t.
Due to transitivity, in a strictly decreasing run your condition always holds, and vice versa: a run that satisfies your condition must be strictly decreasing in y. So there's no need for recursion at all; a single linear scan that checks only adjacent pairs (which transitivity makes sufficient) does the job.
Of course, you could translate this search for the first non-decreasing adjacent pair into a recursive function (though it's more complicated than just walking the sequence linearly):
S(t) = { 0                               if t = 0
         1                               if t = 1
         1                               if x_t - t >= x_{t-1} - (t - 1)
         1 + S(t - 1)                    otherwise }
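For reference, a short Python sketch of this (the function name run_lengths is mine; it computes S(t) for every t in one linear pass over y_i = x_i - i):

# Linear-time computation of S(t) for every t, following the recursion above.
# Note: this checks only the "y strictly decreasing" condition; if the lower
# bound 0 < x_i - x_j from the question must also hold, add a check that x is
# strictly increasing within the run.
def run_lengths(x):
    y = [xi - i for i, xi in enumerate(x, start=1)]   # y_i = x_i - i
    S = [0] * (len(x) + 1)                            # S[0] = 0
    for t in range(1, len(x) + 1):
        if t == 1 or y[t - 1] >= y[t - 2]:            # run is broken (or just starts)
            S[t] = 1
        else:
            S[t] = 1 + S[t - 1]
    return S[1:]

print(run_lengths([1.0, 1.5, 1.9, 3.5, 3.7]))         # [1, 2, 3, 1, 2]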

Algorithm: Find peak in a circle

Given n integers, arranged in a circle, show an efficient algorithm that can find one peak. A peak is a number that is not less than the two numbers next to it.
One way is to go through all the integers and check each one to see whether it is a peak. That yields O(n) time. It seems like there should be some way to divide and conquer to be more efficient though.
EDIT
Well, Keith Randall proved me wrong. :)
Here's Keith's solution implemented in Python:
def findPeak(aBase):
    N = len(aBase)
    def a(i): return aBase[i % N]
    i = 0
    j = N // 3
    k = (2 * N) // 3
    if a(j) >= a(i) and a(j) >= a(k):
        lo, candidate, hi = i, j, k
    elif a(k) >= a(j) and a(k) >= a(i):
        lo, candidate, hi = j, k, i + N
    else:
        lo, candidate, hi = k, i + N, j + N
    # Loop invariants:
    #   a(lo) <= a(candidate)
    #   a(hi) <= a(candidate)
    while lo < candidate - 1 or candidate < hi - 1:
        checkRight = True
        if lo < candidate - 1:
            mid = (lo + candidate) // 2
            if a(mid) >= a(candidate):
                hi = candidate
                candidate = mid
                checkRight = False
            else:
                lo = mid
        if checkRight and candidate < hi - 1:
            mid = (candidate + hi) // 2
            if a(mid) >= a(candidate):
                lo = candidate
                candidate = mid
            else:
                hi = mid
    return candidate % N
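For example (an illustrative call, not part of the original post):

values = [3, 5, 2, 8, 1, 4]
p = findPeak(values)
print(p, values[p])   # 5 4: values[5] = 4 is >= both of its circular neighbours (1 and 3)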
Here's a recursive O(log n) algorithm.
Suppose we have an array of numbers, and we know that the middle number of that segment is no smaller than the endpoints:
A[i] <= A[m] >= A[j]
for i,j indexes into an array, and m=(i+j)/2. Examine the elements midway between the endpoints and the midpoint, i.e. those at indexes x=(3*i+j)/4 and y=(i+3*j)/4. If A[x]>=A[m], then recurse on the interval [i,m]. If A[y]>=A[m], then recurse on the interval [m,j]. Otherwise, recurse on the interval [x,y].
In every case, we maintain the invariant on the interval above. Eventually we get to an interval of size 2 which means we've found a peak (which will be A[m]).
To convert the circle to an array, take 3 equidistant samples and orient yourself so that the largest (or one tied for the largest) is in the middle of the interval and the other two points are the endpoints. The running time is O(log n) because each interval is half the size of the previous one.
I've glossed over the problem of how to round when computing the indexes, but I think you could work that out successfully.
When you say "arranged in a circle", you mean like in a circular linked list or something? From the way you describe the data set, it sounds like these integers are completely unordered, and there's no way to look at N integers and come to any kind of conclusion about any of the others. If that's the case, then the brute-force solution is the only possible one.
Edit:
Well, if you're not concerned with worst-case time, there are slightly more efficient ways to do it. The naive approach would be to look at N[i-1], N[i], and N[i+1] to see if N[i] is a peak, then repeat, but you can do a little better.
While not done
    If N[i] < N[i+1]
        i++
    Else
        If N[i] > N[i-1]
            Done
        Else
            i += 2
(Well, not quite that, because you have to deal with the case where N[i]=N[i+1]. But something very similar.)
That will at least keep you from comparing N[i] to N[i+1], adding 1 to i, and then redundantly comparing N[i] to N[i-1]. It's a distinctly marginal gain, though. You're still marching through the numbers, but there's no way around that; jumping blindly is unhelpful, and there's no way to look ahead without taking just as long as doing the actual work would be.
