Evenly partition contiguous sequences of numbers - algorithm

A frequent task when parallelizing N embarrassingly parallel work chunks contiguously among K workers is to partition them with the following algorithm, in pseudocode:
acc = 0
for _ in range(K):
    end = acc + ceil(N/K)
    emit acc:end
    acc = end
This will emit K contiguous partitions, generally of size N/K, and works fine for large N. However, if K is approximately N this may cause imbalance because the last worker will get very few items. If we define imbalance as the maximum absolute difference between partition sizes, then an iterative algorithm that starts from any random partition and reduces potential until the maximum difference is 1 (or 0 if K divides N) is going to be optimal.
It seems to me that the following may be a more efficient way of getting at the same answer, by performing online "re-planning". Does this algorithm have a name and optimality proof?
acc = 0
workers = K
while workers > 0:
    rem = N - acc
    end = acc + ceil(rem/workers)
    emit acc:end
    acc = end
    workers -= 1
Edit. Given that we can define the loop above recursively, I can see that an inductive optimality proof might work. In any case, the name and confirmation of its optimality would be appreciated :)
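To make the comparison concrete, here is a minimal Python sketch of both loops from the question. The function names are hypothetical, and the fixed-step version clamps each range to N, which is an assumption the pseudocode does not state (without it the ranges would run past N).

```python
def fixed_step_partition(N, K):
    """Fixed chunk size ceil(N/K); ranges are clamped to N (an assumption)."""
    step = -(-N // K)  # integer ceil(N/K)
    bounds = []
    acc = 0
    for _ in range(K):
        end = min(acc + step, N)
        bounds.append((acc, end))
        acc = end
    return bounds


def replanning_partition(N, K):
    """Recompute the chunk size from what is left before each emit."""
    bounds = []
    acc = 0
    for workers in range(K, 0, -1):
        rem = N - acc
        end = acc + -(-rem // workers)  # integer ceil(rem/workers)
        bounds.append((acc, end))
        acc = end
    return bounds


def sizes(bounds):
    return [end - start for start, end in bounds]


print(sizes(fixed_step_partition(9, 6)))   # [2, 2, 2, 2, 1, 0]
print(sizes(replanning_partition(9, 6)))   # [2, 2, 2, 1, 1, 1]
```

With N = 9 and K = 6 the fixed-step version leaves one worker idle, while the re-planning version keeps all sizes within 1 of each other.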

A simple way of dividing the range is:
for i in range(K):
    emit (i*N // K):((i+1)*N // K)
This has the advantage of being itself parallelizable since the iterations do not need to be performed in order.
It is easy to prove that every partition has either floor(N/K) or ceil(N/K) elements, and it is evident that every element will be in exactly one partition. Since floor and ceiling differ by at most 1, the algorithm must be optimal.
The algorithm you suggest is also optimal (and the results are similar). I don't know its name, though.
Another way of dividing the ranges which can be done in parallel is to use the range start(N, K, i):start(N, K, i+1) where start(N, K, i) is (N//K)*i + min(i, N%K). (Note that N//K and N%K only need to be computed once.) This algorithm is also optimal, but distributes the imbalance so that the first partitions are the larger ones. That may or may not be useful.
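Both closed-form schemes in this answer can be sketched in a few lines of Python. The function names `floor_partition` and `front_loaded_partition` are made up for illustration; `start` follows the formula given above.

```python
def floor_partition(N, K):
    """Range i is (i*N // K, (i+1)*N // K); each range is independent of the others."""
    return [(i * N // K, (i + 1) * N // K) for i in range(K)]


def start(N, K, i):
    """Start offset of range i when the first N % K ranges get one extra item."""
    return (N // K) * i + min(i, N % K)


def front_loaded_partition(N, K):
    return [(start(N, K, i), start(N, K, i + 1)) for i in range(K)]


print(floor_partition(10, 4))         # [(0, 2), (2, 5), (5, 7), (7, 10)]
print(front_loaded_partition(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

The first scheme spreads the larger ranges across the sequence, the second puts them at the front; in both cases every size is floor(N/K) or ceil(N/K).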

Here's a simpler approach. You have floor(N/K) tasks which can be perfectly partitioned among the workers, leaving N mod K remaining tasks. To keep the regions contiguous, you can put the remaining tasks on the first N mod K workers.
Here it is in imperative style. Just to be clear, I'm numbering the tasks {0..(N-1)}, and emitting sets of contiguous task numbers.
offset = 0
for 0 <= i < K:
    end = offset + floor(N/K)
    if i < N mod K:
        end = end + 1
    emit {c | offset <= c < end}
    offset = end
And in a more declarative style:
chunk = floor(N/K)
rem = N mod K
// i == worker number
function offset(i) =
    i * chunk + (i if i < rem else rem)
for 0 <= i < K:
    emit {c | offset(i) <= c < offset(i+1)}
The proof of optimality is pretty trivial at this point. Worker i has offset(i+1) - offset(i) tasks assigned to it. Depending on i, this is either floor(N/K) or floor(N/K) + 1 tasks.

Related

Fast algorithm for distributing a value over a histogram?

I am looking for a fast (both in terms of complexity (the size of the problem may get close to 2^32) and in terms of the constant) algorithm, that doesn't necessarily have to compute the optimal solution (so a heuristic is acceptable if it produces results "close" to the optimal and has a "considerable" advantage in terms of computation time compared to computing the optimal solution) for a specific problem.
I have an integer histogram A: |A| = n, A[i]>0; and a value R: 0<R<=A[0]+...+A[n-1]. I must distribute -R over the histogram as evenly as possible. Formally this means something like this (there is some additional information in the formal notation too): I need to find B, such that |B| = |A| && B[i] = A[i] - C[i], where 0<=C[i]<=A[i] && C[0]+...+C[n-1] = R and C must minimize the expressions: L_2 = C[0]^2 + ... + C[n-1]^2 and L_infinity = max(C[0], ..., C[n-1]). Just from the formulation one can see that the problem doesn't necessarily have a unique solution (consider A[0] = 1, A[1] = 1 and R = 1, then both B[0]=0, B[1]=1 and B'[0]=1, B'[1]=0 are optimal solutions), an additional constraint may be added such as if A[i]<A[j] then C[i]<C[j] but it is not as important in my case. Naively one can iterate over all possibilities for C[i] (R-combination with repetitions) and find the optimal solutions, but obviously that is not very fast for larger n.
Another possible solution is finding q = R/n and r=R%n, then iterating over all elements and storing diff[i] = A[i]-q, if diff[i]<=0 then r-=diff[i] && B[i] = 0 && remove A[i], then continue with all non-removed A[i], by setting them to A[i] = diff[i], R = r, and n=n-removedElementsCount. If iterating this process, then at each step we would remove at least one element, until we reach the point where q == 0 or we have only 1 element, then we just need to only have A[i]-=1 for R such elements from A, since by then R<n in the q==0 case or just have A[i]-=R if we are in the case where we have only 1 element leftover (the case where we have 0 elements is trivial). Since we remove at least one element each step, and we need to iterate over (n - step) elements in the worst case, then we have a complexity of O((1+...+n)) = O(n^2).
I am hoping that somebody is already familiar with a better algorithm or if you have any ideas I'll be glad to hear them (I am aware that this can be regarded as an optimization problem also).
edit: made R positive so it would be easier to read.
Edit 2: I realized I messed up the optimization criteria.
Turn your histogram into an array of (value, index) pairs, and then turn it into a min heap. This operation is O(n).
Now your C is going to take some set of values to 0, reduce some by the max amount, and the rest by 1 less than the max amount. The max amount that you'd like to reduce everything by is easy to calculate, it is R/n rounded up.
Now go through the heap. As long as the value for the bottom of the heap is < ceil(R/size of heap), that value at that index will be set to zero, and remove that from the heap in time O(log(n)). Once that loop finishes, you can assign the max value and 1 less than the max value randomly to the rest.
This will run in O(n log(n)) worst time. You will hit that worst case when O(n) elements have to be zeroed out.
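A minimal Python sketch of the heap approach described above, using the standard `heapq` module. The function name `distribute_with_heap` is hypothetical, and it returns the reductions C rather than the reduced histogram B; ties for who receives the larger reduction are broken by heap order rather than randomly.

```python
import heapq


def distribute_with_heap(A, R):
    """Return C with sum(C) == R and 0 <= C[i] <= A[i], spread as evenly as possible."""
    if not 0 <= R <= sum(A):
        raise ValueError("R must satisfy 0 <= R <= sum(A)")
    C = [0] * len(A)
    heap = [(value, index) for index, value in enumerate(A)]
    heapq.heapify(heap)  # O(n)
    remaining = R
    # Bins smaller than the current per-bin share ceil(remaining / size) are emptied.
    while heap and heap[0][0] < -(-remaining // len(heap)):
        value, index = heapq.heappop(heap)
        C[index] = value
        remaining -= value
    if heap:
        # Every remaining bin can absorb the max share; hand out q+1 to r bins, q to the rest.
        q, r = divmod(remaining, len(heap))
        for t, (value, index) in enumerate(heap):
            C[index] = q + (1 if t < r else 0)
    return C


print(distribute_with_heap([1, 5, 5], 7))  # [1, 3, 3]
```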
I came up with a very simple greedy algorithm in O(n*log(n)) time (if somebody manages to solve it in O(n) though I'll be glad to hear).
Algorithm:
Given: integer array: A[0],...,A[|A|-1]: A[i]>=0; integer: R0: 0<=R0<=A[0]+...+A[|A|-1].
Base:
0. Sort A in ascending order - takes O(n*log(n)) time.
1. Set i = 0; R = R0; n = |A|; q = floor(R/n); r = R - q*n; d = q;
2. if(i==|A| or R==0) goto 6.;
3. if(i>=|A|-r) d = q + 1;
4. if(A[i]>=d)
   {
       R-=d;
       A[i]-=d;
   }
   else
   {
       R-=A[i];
       A[i] = 0;
       n = |A|-(i+1);
       q = floor(R/n);
       d = q;
       r = R - q*n;
   }
5. i=i+1; goto 2.;
6. if(R>0) A[|A|-1] -= R; return A;
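As a rough illustration of the same sort-then-sweep idea, here is a simplified Python variant. It is not a line-by-line port of the steps above: instead of tracking q, r and d, it recomputes the share floor(R / remaining bins) at every position, so the remainder ends up on the largest bins. The function name `distribute_sorted` is hypothetical.

```python
def distribute_sorted(A, R):
    """Return C (in the original index order) with sum(C) == R and 0 <= C[i] <= A[i]."""
    if not 0 <= R <= sum(A):
        raise ValueError("R must satisfy 0 <= R <= sum(A)")
    order = sorted(range(len(A)), key=lambda i: A[i])  # indices by ascending A[i]
    C = [0] * len(A)
    remaining = R
    for pos, idx in enumerate(order):
        bins_left = len(A) - pos
        # A small bin is emptied; otherwise it takes an even share of what is left.
        take = min(A[idx], remaining // bins_left)
        C[idx] = take
        remaining -= take
    return C


print(distribute_sorted([5, 1, 5], 7))  # [3, 1, 3]
```

The sort dominates the cost, so this sketch also runs in O(n*log(n)) time.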
Informal solution optimality proof:
Let n = |A|.
Case 0: n==1 -> C[0] = R
Case 1: n>1 && A[i]>=q && A[j]>=q+1 for j>=max(0,n-r)
The optimal solution is given by C[i] = q for i<n-r && C[j] = q+1 for j>=n-r.
Assume there is another optimal solution given by C'[i] = C[i] + E[i], where the constraints for E are: E[0]+...+E[m-1]=0 (otherwise C' would violate C'[0] + ... + C'[n-1] = R), C[i]>=-E[i] (otherwise C'[i] would violate the non-negativity constraint), E[i] <= A[i] - C[i] (from C'[i]<=A[i]), and E[i]<=E[j] for i<=j (from C[i]<=C[j] for A[i]<=A[j] && A[i]<=A[j] for i<=j), then:
L_2' - L_2 = 2*q*(E[0]+...+E[n-r-1]) + 2*(q+1)*(E[n-r]+...+E[n-1]) + (E[0]^2 + ... + E[n-1]^2) = 2*q*0 + (E[0]^2 + ... + E[n-1]^2) + 2*(E[n-r] + ... + E[n-1]) >= 0
The last inequality is true since for every term 2*E[n-i], 1<=i<=r, there is a corresponding term E[n-i]^2, 1<=i<=r to cancel it out if it is negative at least for E[n-i]<-1. Let us analyze the case where 2*E[n-i] = -2, obviously E[n-i]^2 = 1 is not enough to cancel it out in this case. However, since all elements of E sum to 0, there exists j!=n-i: such that E[j] compensates for it, since we have the term E[j]^2. From the last inequality follows L_2<=L_2' for every possible solution C', this implies that C minimizes L_2. It is trivial to see that the L_inf minimization is also satisfied: L_inf = q + (r>0) <= L_inf' = max(q+E[0], ... , q+E[n-r-1], q+1+E[n-r], ... , q+1+E[n-1]), if we were to have an E[i]>1 for i<n-r, or E[j]>0 for j>=n-r, we get a higher maximum, we can also never decrease the maximum, since E sums to 0.
Case 2: n>1 && there exists k: A[k]<q
In this case the optimal solution requires that C[k] = A[k] for all k: A[k]<q. Let us assume that there exists an optimal solution C' such that C'[k]<A[k]<q -> C'[k]<q-1. There exists i>=k, such that C'[i]<q-1 && C'[i+1]>=q-1. Assume there is no such i, then C'[k] == C[n-1] < q-1, and C'[0]+...+C'[n-1]<n*q-n<R, this is a contradiction, which implies that such an i actually does exist. There also exists a j>k such that C[j]>q && C[j-1]<C[j] (if we assume this is untrue we once again get a contradiction with C summing to R). We needed these proofs in order to satisfy C[t]<=C[l] for t<=l. Let us consider the modified solution C''[t] = C'[t] for t!=i,j; and C''[i] = C'[i]+1, and C''[j] = C'[j]-1. L_2' - L_2'' = C'[i]^2 - (C'[i]+1)^2 + C'[j]^2 - (C'[j]-1)^2 = -2*C'[i] + 2*C'[j] - 2 = 2*((C'[j]-C'[i])-1) > 2*(1-1) = 0. The last inequality follows from (C'[i]<q-1 && C'[j]>q) -> C'[j] - C'[i] > 1. We proved that L_2'>L_2'' if we increment C[i]: C[i]<A[i]<q. By induction the optimal solution should have C[l]=A[l] for all l: A[l]<q. Once this is done one can inductively continue with the reduced problem n' = n-(i+1), R' = R - (C[0]+...+C[i]), q' = floor(R'/n'), r' = R' - q'*n', D[0] = A[i+1], ..., D[n'-1] = A[n-1].
Case 3: n>1 && A[i]>=q && A[j]<q+1 for j==max(0,n-r)
Since A[k]>=A[i] for k>=i, that implies that A[i]<q+1 for i<=j. But since we have also q<=A[i] this implies A[i]==q, so we cannot add any of the remainder in any C[i] : i<=j. The optimality of C[i]=A[i]=q for i<j follows from a proof done in case 1 (the proof there was more general with q+1 terms). Since the problem is optimal for 0<=i<j we can start solving a reduced problem: D[0] = A[j],...,D[n-j] = A[n-1].
Case 0, 1, 2, 3 are all the possible cases. Apart from case 0 and case 1 which give the solution explicitly, the solution in 2 and 3 reduces the problem to a smaller one which once again falls in one of the cases. Since the problem is reduced at every step, we get the final solution in a finite number of steps. We also never refer to an element more than once which implies O(n) time, but we need O(n*log(n)) for the sorting, so in the end we have O(n*log(n)) time complexity for the algorithm. I am unsure whether this problem can be solved in O(n) time, but I have the feeling that there is no way to get away without the sorting since case 2 and 3 rely on it heavily, so maybe O(n*log(n)) is the best possible complexity that can be achieved.

Count of divisors of numbers till N in O(N)?

So, we can count divisors of each number from 1 to N in O(NlogN) algorithm with sieve:
int n;
cin >> n;
for (int i = 1; i <= n; i++) {
    for (int j = i; j <= n; j += i) {
        cnt[j]++; // here cnt[x] means count of divisors of x
    }
}
Is there way to reduce it to O(N)?
Thanks in advance.
Here is a simple optimization on #גלעד ברקן's solution. Rather than use sets, use arrays. This is about 10x as fast as the set version.
n = 100
answer = [None for i in range(0, n+1)]
answer[1] = 1
small_factors = [1]
p = 1
while (p < n):
    p = p + 1
    if answer[p] is None:
        print("\n\nPrime: " + str(p))
        limit = n // p  # integer division, so j <= limit exactly when j * p <= n
        new_small_factors = []
        for i in small_factors:
            j = i
            while j <= limit:
                new_small_factors.append(j)
                answer[j * p] = answer[j] + answer[i]
                j = j * p
        small_factors = new_small_factors
print("\n\nAnswer: " + str([(k,d) for k,d in enumerate(answer)]))
It is worth noting that this is also a O(n) algorithm for enumerating primes. However with the use of a wheel generated from all of the primes below size log(n)/2 it can create a prime list in time O(n/log(log(n))).
How about this? Start with the prime 2 and keep a list of tuples, (k, d_k), where d_k is the number of divisors of k, starting with (1,1):
for each prime, p (ascending and lower than or equal to n / 2):
    for each tuple (k, d_k) in the list:
        if k * p > n:
            remove the tuple from the list
            continue
        power = 1
        while p * k <= n:
            add the tuple to the list if k * p^power <= n / p
            k = k * p
            output (k, (power + 1) * d_k)
            power = power + 1
    the next number the output has skipped is the next prime
    (since clearly all numbers up to the next prime are
    either smaller primes or composites of smaller primes)
The method above also generates the primes, relying on O(n) memory to keep finding the next prime. Having a more efficient, independent stream of primes could allow us to avoid appending any tuples (k, d_k) to the list, where k * next_prime > n, as well as free up all memory holding output greater than n / next_prime.
Consider the total of those counts, sum(phi(i) for i=1,n), where phi(i) here denotes the number of divisors of i (not Euler's totient). That sum is O(N log N), so any O(N) solution would have to avoid counting the divisors individually.
This suggests that any improvement would need to depend on prior results (dynamic programming). We already know that phi(i) is the product, over the primes in the factorization of i, of each exponent plus one. For instance, 12 = 2^2 * 3^1. The exponents are 2 and 1, respectively. (2+1)*(1+1) = 6. 12 has 6 divisors: 1, 2, 3, 4, 6, 12.
This "reduces" the question to whether you can leverage the prior knowledge to get an O(1) way to compute the number of divisors directly, without having to count them individually.
Think about the given case ... divisor counts so far include:
1 1
2 2
3 2
4 3
6 4
Is there an O(1) way to get phi(12) = 6 from these figures?
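One standard way to reuse earlier results like this is a linear (Euler) sieve, which reaches each composite exactly once through its smallest prime factor. The sketch below is that textbook technique, not the method of any answer in this thread; the names `divisor_counts`, `d` and `exp` are made up for illustration.

```python
def divisor_counts(n):
    """Return a list d where d[k] is the number of divisors of k, for 1 <= k <= n."""
    d = [0] * (n + 1)    # d[k] == 0 means "not reached yet"
    exp = [0] * (n + 1)  # exponent of the smallest prime factor of k
    primes = []
    if n >= 1:
        d[1] = 1
    for i in range(2, n + 1):
        if d[i] == 0:  # never reached as a product, so i is prime
            primes.append(i)
            d[i] = 2
            exp[i] = 1
        for p in primes:
            m = i * p
            if m > n:
                break
            if i % p == 0:
                # p is the smallest prime of i: its exponent grows by one.
                exp[m] = exp[i] + 1
                d[m] = d[i] // (exp[i] + 1) * (exp[m] + 1)
                break
            # p is a new, smaller prime: the divisor count doubles.
            exp[m] = 1
            d[m] = d[i] * 2
    return d


print(divisor_counts(12)[12])  # 6
```

Here the count for 12 comes from the count for 6 in O(1): 6 has 4 divisors with the prime 2 at exponent 1, so 12 has 4 / 2 * 3 = 6.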
Here is an algorithm that is theoretically better than O(n log(n)) but may be worse for reasonable n. I believe that its running time is O(n lg*(n)) where lg* is the https://en.wikipedia.org/wiki/Iterated_logarithm.
First of all you can find all primes up to n in time O(n) using the Sieve of Atkin. See https://en.wikipedia.org/wiki/Sieve_of_Atkin for details.
Now the idea is that we will build up our list of counts only inserting each count once. We'll go through the prime factors one by one, and insert values for everything with that as the maximum prime number. However in order to do that we need a data structure with the following properties:
We can store a value (specifically the count) at each value.
We can walk the list of inserted values forwards and backwards in O(1).
We can find the last inserted number below i "efficiently".
Insertion should be "efficient".
(Quotes are the parts that are hard to estimate.)
The first is trivial, each slot in our data structure needs a spot for the value. The second can be done with a doubly linked list. The third can be done with a clever variation on a skip-list. The fourth falls out from the first 3.
We can do this with an array of nodes (which do not start out initialized) with the following fields that look like a doubly linked list:
value The answer we are looking for.
prev The last previous value that we have an answer for.
next The next value that we have an answer for.
Now if i is in the list and j is the next value, the skip-list trick will be that we will also fill in prev for the first even after i, the first divisible by 4, divisible by 8 and so on until we reach j. So if i = 81 and j = 96 we would fill in prev for 82, 84, 88 and then 96.
Now suppose that we want to insert a value v at k between an existing i and j. How do we do it? I'll present pseudocode starting with only k known then fill it out for i = 81, j = 96 and k = 90.
k.value := v
for temp in searching down from k for increasing factors of 2:
    if temp has a value:
        our_prev := temp
        break
    else if temp has a prev:
        our_prev := temp.prev
        break
our_next := our_prev.next
our_prev.next := k
k.next := our_next
our_next.prev := k
for temp in searching up from k for increasing factors of 2:
    if j <= temp:
        break
    temp.prev := k
k.prev := our_prev
In our particular example we were willing to search downwards from 90 to 90, 88, 80, 64, 0. But we actually get told that prev is 81 when we get to 88. We would be willing to search up to 90, 92, 96, 128, 256, ... however we just have to set 92.prev 96.prev and we are done.
Now this is a complicated bit of code, but its performance is O(log(k-i) + log(j-k) + 1). Which means that it starts off as O(log(n)) but gets better as more values get filled in.
So how do we initialize this data structure? Well we initialize an array of uninitialized values then set 1.value := 0, 1.next := n+1, and 2.prev := 4.prev := 8.prev := 16.prev := ... := 1. And then we start processing our primes.
When we reach prime p we start by searching for the previous inserted value below n/p. Walking backwards from there we keep inserting values for x*p, x*p^2, ... until we hit our limit. (The reason for backwards is that we do not want to try to insert, say, 18 once for 3 and once for 9. Going backwards prevents that.)
Now what is our running time? Finding the primes is O(n). Finding the initial inserts is also easily O(n/log(n)) operations of time O(log(n)) for another O(n). Now what about the inserts of all of the values? That is trivially O(n log(n)) but can we do better?
Well first all of the inserts to density 1/log(n) filled in can be done in time O(n/log(n)) * O(log(n)) = O(n). And then all of the inserts to density 1/log(log(n)) can likewise be done in time O(n/log(log(n))) * O(log(log(n))) = O(n). And so on with increasing numbers of logs. The number of such factors that we get is O(lg*(n)) for the O(n lg*(n)) estimate that I gave.
I haven't shown that this estimate is as good as you can do, but I think that it is.
So, not O(n), but pretty darned close.

total sums with n number of digits

So I've got a bit of a stumper for you. I need to write a function that takes a number (call this K) and outputs n numbers where the sum of these numbers == K.
For instance, if I give this function (100,3) it will output [1,2,97], [1,3,96], [1,4,95]... [97,1,2]
I have the function worked out for three digits:
k = 100
r = []
0.upto(k/2) do |a|
  (a+1).upto(k/2) do |b|
    c = k-(a+b)
    r << [a,b,c]
  end
end
How would I write this function that takes an n amount of digits?
This probably isn't the best solution in the world (memory required grows with O(k^3)), but it's a solution. I welcome suggestions for improvement.
You might be interested in reading about integer partitions, which is what we're counting here.
You're looking for a function f(k,n) that will count the number of ways to partition a number k into exactly n parts. Part of the problem is it's hard to tell when you count a partition twice.
I'll solve this problem by using another function g(k,n,s) that counts the number of ways to partition a number k into n parts where the maximum value allowed is s. So for example, we don't count the partitions (90,8,2) or (64,20,16) in g(100,3,60) since they use values greater than s=60.
g(k,n,s) = f(k,n) when s>=k (i.e. we don't place a maximum on the values allowed in the partition).
A few facts about g(k,n,s):
k==n implies g=1 since the only way to partition k into k parts is by using all 1s (hence s is irrelevant since we use the lowest number possible)
n>k implies g=0 since we can't partition k into more than k parts
s==1 implies g=1 if k==n and g=0 otherwise, since placing a maximum value of 1 only allows a partition of k into k parts (all ones)
n==1 implies g=1 if s>=k and g=0 otherwise, since the only partition of k into n=1 parts requires that we use k itself in the partition
s<RoundUp(k/n) implies g=0 since we can't partition k into n parts using only values less than k/n; for example we can't partition 100 into 4 pieces using only values less than 25.
s>k-n+1 implies g(k,n,s) = g(k,n,s-1) since increasing the max value s after k-n+1 doesn't add any new partitions; for example, any partition of 100 into 3 parts will never include a number greater than 100-3+1 = 98
g(k,n,s) = g(k,n,s-1) + g(k-s,n-1,s) in every other case. This just adds any partitions using a value of s to all the previous partitions we've counted using a maximum value of s-1
Now I just choose maximum number K and throw all these facts into a nested for loop and deduce every value of g(k,n,s). To get f(k,n) I just find g(k,n,k).
Here's the algorithm, not suitable for large K.
g = (K by K by K) array of all zeros
for k = 1:K
    for n = 1:K
        for s = 1:K
            if (k==n)
                val = 1;
            else if (n>k)
                val = 0;
            else if s==1
                val = int(k==n);
            else if n==1
                val = int(s>=k);
            else if s<RoundUp(k/n)
                val = 0;
            else if s>(k-n+1)
                val = g(k,n,s-1);
            else
                val = g(k,n,s-1) + g(k-s,n-1,s);
            end
            g(k,n,s) = val;
        end
    end
end
For f(100,3) = g(100,3,100) I get 833 unique partitions, which you can see is correct if you use a brute force method. Please point out mistakes if you see any.
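The same recurrence can be checked with a short memoized Python sketch. It uses fewer base cases than the list of facts above (an empty partition for k = 0, n = 0, plus an early exit when n parts of size at most s cannot reach k), so treat it as an equivalent reformulation under those assumptions rather than a port of the loop.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def g(k, n, s):
    """Number of ways to write k as a sum of exactly n positive parts, each <= s."""
    if n == 0:
        return 1 if k == 0 else 0
    if k < n or s * n < k:
        return 0  # too many parts, or parts too small to reach k
    # Either no part equals s, or remove one part equal to s and recurse.
    return g(k, n, s - 1) + g(k - s, n - 1, s)


def f(k, n):
    """Number of partitions of k into exactly n parts (no cap on part size)."""
    return g(k, n, k)


print(f(100, 3))  # 833
```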

Where in this method are numbers checked for primality?

Here is a method I found to find the highest prime factor of a number.
Yet there is dark mystery within - including something I once read was forbidden - changing the condition of a loop within the loop.
def factorize(orig) # 600851475143
  factors = Hash.new(0)
  n = orig
  i = 2
  sqi = 4
  while sqi <= n do
    if n % i == 0
      n /= i
      factors[i] = true
      puts "Found factor #{i}"
    end
    i += 1
    sqi = i**2
    puts "sqi is #{sqi}"
  end
  if (n != 1) && (n != orig)
    factors[n] = true
  end
  p factors
end
puts factorize(600851475143).keys.max
So I see (sort of) how the factors are found.
But where in these lines is the factor checked to make sure it is prime?
What mathematical insight am I missing?
Thanks
Your method is slightly wrong (just slightly). It should look like this:
def factorize(orig)
  factors = Hash.new(0)
  n, i, sqi = orig, 2, 4
  while sqi <= n do
    if n % i == 0
      n /= i
      factors[i] = true
      puts "Found factor #{i}"
    else
      sqi += 2 * i + 1
      i += 1
    end
    puts "sqi is #{sqi}"
  end
  if (n != 1) && (n != orig)
    factors[n] = true
  end
  p factors
end
The difference here is that now, I only increase i (and sqi) when i is not a factor of n. This is because, like the example of 16 that was highlighted earlier, a number can have multiple instances of any one prime factor, so we should keep checking a number until it is no longer a factor.
Now this method does guarantee primality, because it always finds the smallest factor of the number (conversely, it only increases the factor it's checking, if it's no longer a factor, which is saying the same thing). And of course the smallest factor of a number must be prime:
Proof By Contradiction The smallest factor of a number is prime.
Suppose the smallest factor, f of a number N is not prime.
Then f itself has factors x and y, where 1 < x, y < f holds true.
As a result, x and y must also be factors of N and, x, y are both less than f!
This is a contradiction, because we said f was the smallest factor of N.
So our original assumption about f is false, and f must be prime.
I got to this result by inspecting the invariant of the loop, which I will add to this answer in due course.
EDIT: Notes on Invariants
An invariant of a loop, is a predicate condition that remains true, before, during and after the running of the loop, and we can use it to prove that a loop is providing us with the answer we want.
In the case of our loop, there is a simple invariant we must keep track of, which is sqi = i**2 which simply states that sqi must always hold the value of the square of i. This invariant exists to save us recalculating the square every time to compare it with n. (Which by the way, is why I've changed it to incrementing by 2 * i + 1 in my method, otherwise you might as well put i*i in the condition of the loop).
The other part of the invariant is that the factors hash (which mathematically I will treat as a set of numbers) is the set of factors of the number k such that n * k = orig.
The final, and most important part of the invariant is that i <= f where f is the smallest factor of n. (This means that n % i = 0 only when i = f, which means that the loop always finds the smallest factor of n, which is a prime factor of n).
Writing the invariant is only half the battle, we also need to prove that our method always follows it:
The first part of the invariant is simple, because we see whenever we update i we update sqi correctly, and they begin as 2 and 4 respectively.
The second part, similarly, is pretty simple, because we only add i as a factor when n % i == 0 is true, and at the same time, we divide n by i, so as to ensure that the factor added to k is removed from n.
Now let's look at the part of the invariant that's crucial to ensuring the list only contains prime factors. Well, to begin with i = 2, which is the smallest possible factor of any number (not including 1 due to its awkwardness when it comes to primality). Then we need to be certain that we increment i as late as possible, i.e. when we are sure that it can no longer be a factor.
Our code only increments i when it is not a factor of n. If the invariant held before, this means that i <= f and i is not a factor, therefore i < f. So the correct behaviour is to increment to get i closer to f.
This logic is enough to suggest that when i is not a factor, we should increment it, but not enough to suggest we shouldn't always increment i, for which we need this next piece of logic: If i is a factor of n, it means i = f, however, it doesn't tell us anything about whether the next smallest factor is strictly greater than f (as we've seen with 16, the next smallest factor could be equal to the previous). So this means we shouldn't increment i if it is a factor, because doing so may make us miss the next smallest factor.
I hope this bit convinces you about the correctness of the program. It is also possible to write factorize with a nested while loop, which I feel might be a little bit simpler to reason about, but they both work basically identically.
if n % i == 0 checks if n is divisible by i. If it is then it sets factors[i] = true, if a number has no factors (apart from itself and one); then it is prime.
The factor is not checked to make sure it's prime. That's why it breaks (giving non-prime factor) for orig==32:
Found factor 2
sqi is 9
sqi is 16
Found factor 4
sqi is 25
{2=>true, 4=>true}
4
It could be fixed (while retaining the same logic, i.e. without major rewrites) by replacing if n % i == 0 with while n % i == 0 do (that is, divide n by i while it's possible): then by the time we reach a composite i, all its prime factors would be already "factored out" during prior iterations.
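The fix described here (divide by i for as long as it divides n) can be sketched in Python as follows. This is an illustrative rewrite, not the original Ruby method; the name `prime_factors` is hypothetical.

```python
def prime_factors(n):
    """Return the set of prime factors of n (n >= 2) by trial division."""
    factors = set()
    i = 2
    while i * i <= n:
        # Strip every copy of i, so a composite i can never divide n later:
        # its prime factors were already divided out in earlier iterations.
        while n % i == 0:
            factors.add(i)
            n //= i
        i += 1
    if n > 1:
        factors.add(n)  # whatever is left has no factor <= sqrt(n), so it is prime
    return factors


print(prime_factors(32))                 # {2}
print(max(prime_factors(600851475143)))  # 6857
```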

Looking for an optimal online assignment algorithm

I'm looking for a solution to an assignment problem where tasks come and need to be assigned sequentially, but you can make tasks wait for up to K periods.
Formally, let there be an ordered sequence of tasks aabbaba... and an ordered sequence of resources ABBABAA..., and the system can use a side stack. The aim is to match the most a (resp b) tasks to A (resp B) resources.
The constraints are as follows: each period i the program gets resource i and assigns it to a task. The resource is assigned either to a task from the stack, or the program continues to read from the sequence of tasks in order. Each task that is read can either be immediately assigned to resource i, or can be put on the stack IF it will wait there less than K periods and will be assigned to its match (a->A, b->B).
If K=0 then the i-th task must be assigned to the i-th resource, which is pretty bad. If K>0 then you can do better with a greedy algorithm. What is the optimal solution?
Clarification:
Denote the assignment by a permutation m where m(i)=j means that resource j was assigned to task i. If there is no stack, m(i)=i. When there is a stack, tasks can be assigned out of order, but if a task later than i is put in the stack then i must be assigned one of the following K resources. That is, the assignment is legal if for all tasks i:
m(i) <= Min{ m(i') s.t. i'> i } + K
I am looking for the algorithm that will find the assignment that has the minimal number of misassignments (aB or bA) out of all the assignments satisfying the constraints.
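For very small instances, the legality condition above can be checked by brute force, which gives a reference value to compare any candidate algorithm against. The Python sketch below enumerates all permutations, so it is only usable for a handful of tasks; the function names and the lowercase/uppercase encoding of tasks and resources are assumptions made for illustration.

```python
from itertools import permutations


def is_legal(m, K):
    """Check m[i] <= min(m[i'] for i' > i) + K for every task i."""
    later_min = float("inf")  # minimum over an empty set: no constraint for the last task
    for i in range(len(m) - 1, -1, -1):
        if m[i] > later_min + K:
            return False
        later_min = min(later_min, m[i])
    return True


def min_misassignments(tasks, resources, K):
    """Fewest mismatches (aB or bA) over all legal assignments, by exhaustive search."""
    best = None
    for m in permutations(range(len(tasks))):
        if not is_legal(m, K):
            continue
        misses = sum(1 for i, j in enumerate(m) if tasks[i].upper() != resources[j])
        if best is None or misses < best:
            best = misses
    return best


print(min_misassignments("ab", "BA", 0))  # 2
print(min_misassignments("ab", "BA", 1))  # 0
```

With K=0 only the identity assignment is legal, so both tasks are mismatched; with K=1 the first task can wait one period and both are matched.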
You can formulate the problem this way:
resources=[a,b,b,a,b,a,a,...]
tasks=[a,a,b,b,a,b,a,...]
We can define a cost function for assigning task j to resource i:
C(i,j)= (resources[i]==tasks[j])*1+(resources[i]!=tasks[j])*1000
I choose 1000 >> 1 to penalize the assignments where you're unable to meet the requirements.
Let's write the constraints:
xi,j = 1 if you assign task j to resource i, and 0 otherwise.
Moreover, xi,j = 0 if i-j>K, since you follow the resources one by one and you can wait K periods at most (i-j<=K).
Exactly one xi,j is equal to one for each i, and exactly one for each j.
Then you get the following linear program:
Minimise: Sum(C(i,j)*xi,j)
Subject to:
xi,j in {0,1}
Sum(xi,j on j) = 1 for all i
Sum(xi,j on i) = 1 for all j
xi,j = 0 if i-j>K
xi,j>=0 otherwise
You may need to correct the constraints a little. Once corrected, this solution should be optimal, but I am not sure the greedy algorithm is not already optimal.
It gets more interesting to use this formulation with more than 2 different resources.
Hope I understood your question and it helps
Modifications:
I will translate this constraint:
m(i) <= Min{ m(i') s.t. i'> i } + K
Note:
if xi,j = 1 then Sum(j*xi,j on j) = j, since only one xi,j = 1
"Translation", with the previous notations:
m(i) <= Min{ m(i') s.t. i'> i } + K
<=> j <= Min{ j' s.t. i'>i and xi',j' = 1 } + K (OK ?)
New linear constraint:
we have:
xi,j = 1 <=> Sum(j*xi,j on j) = j for i
and
xi',j' = 1 <=> Sum(j'*xi',j' on j') = j' for all i'
Therefore:
j <= Min{ j' s.t. i'>i and xi',j' = 1 } + K
<=>
Sum(j*xi,j on j) <= Min{ Sum(j'*xi',j' on j'), i'>i } + K
and being less than the min is equivalent to being less than each term.
Then the new set of constraints is:
Sum(j*xi,j on j) <= Sum(j'*xi',j' on j') + K for all i' > i
You can add these constraints to the previous ones, and you get a linear program.
You can solve this with a simplex algorithm.

Resources