integer linear programming on 3-partition of a special set - algorithm

Background: S is the set of 7-length sequences s such that (1) each digit of s is a, b, or c; (2) s has exactly one digit that is c.
T is the set of 7-length sequences t such that (1) each digit of t is a, b, or c; (2) t has exactly two digits that are c.
Is there a 3-partition S = A0 ∪ A1 ∪ A2, with Ai ∩ Aj = ∅ for i ≠ j, that has the following property: for any Aj and any t ∈ T, there is an s ∈ Aj and an n ∈ {1,2,3,4,5,6,7} such that sn ≠ tn, tn = c, and sm = tm for any m ≠ n, where sn (or tn) is the n-th digit of s (or t)?
For example, t=ccaabca and s=acaabca where n=1.
I used integer linear programming to attack the problem with LINGO. I do not know how to solve the original problem directly, so as a first step I'd like to make A0 as small as possible with LINGO.
Here is the code:
MODEL:
SETS:
  Y/1..448/:C,X;
  Z/1..672/;
  cooperation(Y,Z):A;
ENDSETS
DATA:
  A=#the big incidence matrix#
  C=#1,1,1,... 448 times 1#
ENDDATA;
MIN=@SUM(Y:C*X);
@FOR(Y:@BIN(X));
@FOR(Z(j):@SUM(Y(i):X(i)*A(i,j))>1);
@FOR(Z(j):@SUM(Y(i):X(i)*A(i,j))<2);
END
But the code ran for a long time without producing any answer.
I appreciate any answers to original questions or suggestions for lingo code.

This looks like a coding theory problem, and those tend to be very hard,
especially with integer programming because of the symmetry (maybe you have
access to a good solver with symmetry breaking, but I still tend to
think that constraint programming will fare better). The smallest part
of the partition must have at most ⌊448/3⌋ = 149 strings, yet a quick
constraint solver setup (OR-Tools CP-SAT solver, below) couldn't get
there in the time that I ran it.
import itertools
import operator
from ortools.sat.python import cp_model

S = set()
T = set()
for s in itertools.product("abc", repeat=7):
    k = s.count("c")
    if k == 1:
        S.add("".join(s))
    elif k == 2:
        T.add("".join(s))

def hamming(s, t):
    return sum(map(operator.ne, s, t))

edges = [(s, t) for s in S for t in T if hamming(s, t) == 1]
model = cp_model.CpModel()
include = {s: model.NewBoolVar(s) for s in S}
for t in T:
    model.AddBoolOr([include[s] for s in S if hamming(s, t) == 1])
model.Minimize(sum(include.values()))
solver = cp_model.CpSolver()
solver.parameters.log_search_progress = True
status = solver.Solve(model)
print({s for s in include if solver.Value(include[s])})
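Before running a solver it can help to sanity-check the instance itself. The sketch below uses only the standard library; the helper names neighbours and covers are illustrative and not part of the answer above. It confirms the set sizes used in the LINGO model (448 and 672) and gives a direct test of whether a candidate part satisfies the covering property.

```python
import itertools

# S: exactly one 'c'; T: exactly two 'c's (same definitions as in the question).
S = ["".join(p) for p in itertools.product("abc", repeat=7) if p.count("c") == 1]
T = ["".join(p) for p in itertools.product("abc", repeat=7) if p.count("c") == 2]

def neighbours(t):
    """Strings of S obtained from t by replacing one of its c's with a or b."""
    out = []
    for n, ch in enumerate(t):
        if ch == "c":
            for repl in "ab":
                out.append(t[:n] + repl + t[n + 1:])
    return out

def covers(part):
    """True if every t in T has at least one neighbour inside `part`."""
    part = set(part)
    return all(any(s in part for s in neighbours(t)) for t in T)

print(len(S), len(T))  # 448 672
```

Each t has exactly four neighbours in S, so enumerating them directly avoids scanning all of S with a Hamming-distance test, and covers can be applied to each of A0, A1, A2 to verify a proposed partition.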

Related

Canonical form for set of lists

I am given two unordered sets, each containing m lists of n items. Example for m=4 and n=3:
D1 = {[4,2,1], [3,3,1], [4,2,3], [1,2,1]}
D2 = {[3,2,3], [4,2,3], [1,1,3], [4,2,1]}
The two sets are considered equivalent if there is a one-to-one correspondence between the elements in their respective lists. In the example above, D1 and D2 are equivalent because there is an assignment (1,2,3,4) in D1 ↔ (3,2,1,4) in D2.
In this example, the items are numbers, but this does not really matter, because I only care about the equivalence relation between two sets and not about the items themselves.
I am looking for a fast way to check if two given sets are equivalent. Rather than performing a backtracking search to find the assignments between items, can the sets be serialized in a unique (canonical) form, so that two sets can be shown to be equivalent if their canonical forms are identical?
Update: Even though this problem seems to be intractable in general (see answer below), it turns out that a search with backtracking works well in practice for my data. Below is the pseudocode for my implementation:
s = new stack(of level)
x1_x2 = new dictionary(of int, int)
bound_x2s = new set(of int)

function setsEquivalent(d1: set, d2: set) : boolean
    if d1.m <> d2.m or d1.n <> d2.n: return false
    s.push(new level)
    do until s.size = 0
        m1 = s.size
        m2 = s.top.m2
        if m1 > m
            return true
        elseif s.top.m2 > m
            backtrack()
        else
            s.push(new level)
            for k = 1..n
                if not try_bind(d1.m(m1)(k), d2.m(m2)(k))
                    backtrack()
                    exit for
    return false

function try_bind(x1: int, x2: int) : boolean
    if x1_x2.containskey(x1)
        return x1_x2(x1) = x2
    elseif bound_x2s.contains(x2)
        return false
    else
        x1_x2.add(x1,x2)
        bound_x2s.add(x2)
        s.top.boundx1s.add(x1)
        return true

procedure backtrack()
    for each x1 in s.top.boundx1s:
        bound_x2s.remove(x1_x2(x1))
        x1_x2.remove(x1)
    s.pop
    if s.size <> 0
        s.top.m2 += 1

record level
    m2 = 1
    boundx1s = new list(of int)
Your problem is at least as hard as the Graph Isomorphism Problem. A directed graph can be represented as a set of lists of length 2, which is a special case of your problem. Furthermore, the directed graph isomorphism problem is known to have the same complexity as the graph isomorphism problem. Thus, a special case of your problem is as hard as the full graph isomorphism problem. The exact complexity of graph isomorphism isn't known. There are no known polynomial-time algorithms for it, though it is not conjectured to be NP-complete.
Since there is no easy solution to the graph isomorphism problem, I doubt that serialization will provide an easy solution to your problem.
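For reference, the backtracking idea from the question's update can be written as a short recursive search in Python. This is only a sketch under the question's definition of equivalence (a bijection between items that maps one set of lists onto the other); the function name sets_equivalent and its internals are illustrative, and the worst case remains exponential.

```python
def sets_equivalent(d1, d2):
    """True if some bijection on items maps the lists of d1 onto the lists of d2."""
    d1 = [tuple(row) for row in d1]
    d2 = [tuple(row) for row in d2]
    if len(d1) != len(d2) or sorted(map(len, d1)) != sorted(map(len, d2)):
        return False

    fwd, bwd = {}, {}            # partial bijection d1-item -> d2-item, and its inverse
    used = [False] * len(d2)     # which lists of d2 are already matched

    def undo(added):
        for a in added:
            del bwd[fwd[a]]
            del fwd[a]

    def bind(row1, row2):
        """Extend the bijection so row1 maps to row2; return new keys, or None on conflict."""
        if len(row1) != len(row2):
            return None
        added = []
        for a, b in zip(row1, row2):
            if a in fwd:
                ok = fwd[a] == b
            else:
                ok = b not in bwd
                if ok:
                    fwd[a], bwd[b] = b, a
                    added.append(a)
            if not ok:
                undo(added)
                return None
        return added

    def search(i):
        if i == len(d1):
            return True
        for j, row2 in enumerate(d2):
            if used[j]:
                continue
            added = bind(d1[i], row2)
            if added is None:
                continue
            used[j] = True
            if search(i + 1):
                return True
            used[j] = False
            undo(added)
        return False

    return search(0)
```

With the example from the question, sets_equivalent(D1, D2) finds the assignment 1→3, 2→2, 3→1, 4→4.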

How is `(d*a)mod(b)=1` written in Ruby?

How should I write this:
(d*a)mod(b)=1
in order to make it work properly in Ruby? I tried it on Wolfram, but their solution:
(da(b, d))/(dd) = -a/d
doesn't help me. I know a and b. I need to solve (d*a)mod(b)=1 for d in the form d=....
It's not clear what you're asking, and, depending on what you mean, a solution may be impossible.
First off, (da(b, d))/(dd) = -a/d is not a solution to that equation; rather, it's a misreading of the notation used for partial derivatives. What Wolfram Alpha actually gave you was the partial derivative ∂a(b, d)/∂d = -a/d, which is entirely unrelated.
Secondly, if you're trying to solve (d*a)mod(b)=1 for d, you may be out of luck. If a and b have a common prime factor, no value of d satisfies the equation. If a and b are coprime, infinitely many values of d do (all congruent modulo b), and you can use the formula given in LutzL's answer.
Additionally, if you're looking to perform symbolic manipulation of equations, Ruby is likely not the proper tool. Consider using a CAS, like Python's SymPy or Wolfram Mathematica.
Finally, if you're just trying to compute (d*a)mod(b), the modulo operator in Ruby is %, so you'd write (d*a)%(b).
You are looking for the modular inverse of a modulo b.
For any two numbers a,b the extended euclidean algorithm
g,u,v = xgcd(a, b)
gives coefficients u,v such that
u*a+v*b = g
and g is the greatest common divisor. You need a,b co-prime, preferably by ensuring that b is a prime number, to get g=1 and then you can set d=u.
xgcd(a,b)
    if b = 0
        return (a,1,0)
    q,r = a divmod b
    // a = q*b + r
    g,u,v = xgcd(b, r)
    // g = u*b + v*r = u*b + v*(a-q*b) = v*a+(u-q*v)*b
    return g,v,u - q*v
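The pseudocode above translates almost line for line into runnable code. The sketch below is in Python rather than Ruby, and the wrapper name mod_inverse is an illustrative choice; the structure carries over to Ruby directly, where a.divmod(b) plays the role of divmod(a, b). The result is reduced modulo b so that d lands in the range 0..b-1.

```python
def xgcd(a, b):
    """Return (g, u, v) with u*a + v*b == g, where g is gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    q, r = divmod(a, b)          # a = q*b + r
    g, u, v = xgcd(b, r)         # g = u*b + v*r
    return g, v, u - q * v       # g = v*a + (u - q*v)*b

def mod_inverse(a, b):
    """Return d in 0..b-1 with (d*a) % b == 1, or raise if no such d exists."""
    g, u, _ = xgcd(a, b)
    if g != 1:
        raise ValueError("no inverse: a and b are not coprime")
    return u % b

print(mod_inverse(3, 7))  # 5, since 5*3 = 15 = 2*7 + 1
```

In Python 3.8 and later the built-in pow(a, -1, b) computes the same value.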

Genetic Algorithm - Best crossover operator for a weights assignment

In your experience, what is the best crossover operator for a weights assignment problem?
In particular, I am facing a constraint that forces the sum of all the weights to be 1. Currently, I am using the uniform crossover operator and then dividing all the parameters by their sum to get 1. The crossover works, but I am not sure that this way I preserve the good parts of my solution and converge to a better one.
Do you have any suggestions? It is no problem if I need to build a custom operator.
If your initial population is made up of feasible individuals you could try a differential evolution-like approach.
The recombination operator needs three (random) vectors and adds the weighted difference between two population vectors to a third vector:
offspring = A + f (B - C)
You could try a fixed weighting factor f in the [0.6 ; 2.0] range, or experiment with selecting f randomly for each generation or for each difference vector (a technique called dither, which should improve convergence behaviour significantly, especially for noisy objective functions).
This should work quite well since the offspring will automatically be feasible.
Special care should be taken to avoid premature convergence (e.g. some niching algorithm).
EDIT
With uniform crossover you are exploring the entire n-dimensional space, while the above recombination limits individuals to a subspace H (the hyperplane Σi wi = 1, where wi are the weights) of the original search space.
Reading the question I assumed that the sum-of-the-weights was the only constraint. Since there are other constraints, it's not true that the offspring is automatically feasible.
Anyway any feasible solution must be on H:
If A = (a1, a2, ... an), B = (b1, ... bn), C = (c1, ... cn) are feasible:
Σi ai = 1
Σi bi = 1
Σi ci = 1
so
Σi (ai + f (bi - ci)) =
Σi ai + f (Σi bi - Σi ci) =
1 + f (1 - 1) = 1
The offspring is on the H hyperplane.
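A quick numerical check of this derivation, as a Python sketch (the function name de_offspring and the sample vectors are made up for illustration): the sum stays at 1 for any f, but individual weights can leave the [0, 1] range, which is where additional constraints start to matter.

```python
def de_offspring(a, b, c, f):
    """Differential-evolution style recombination: a + f * (b - c), component-wise."""
    return [ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)]

# Three feasible parents: each sums to 1.
A = [0.2, 0.3, 0.5]
B = [0.5, 0.25, 0.25]
C = [0.1, 0.6, 0.3]

child = de_offspring(A, B, C, 0.8)
print(sum(child))                          # 1 up to floating-point rounding

# With a larger f the sum is still 1, but one weight becomes negative.
print(min(de_offspring(A, B, C, 2.0)))
```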
Now depending on the number / type of additional constraints you could modify the proposed recombination operator or try something based on a penalty function.
EDIT2
You could determine analytically the "valid" range of f, but probably something like this is enough:
f = random(0.6, 2.0);
double trial[] = {f, f/2, f/4, -f, -f/2, -f/4, 0};
i = 0;
do
{
  offspring = A + trial[i] * (B - C);
  i = i + 1;
} while (unfeasible(offspring));
return offspring;
This is just an idea; I'm not sure how well it works.
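A runnable Python version of the same retry idea might look like the sketch below. The feasibility test here (all weights non-negative) is a stand-in assumption, since the question does not state the extra constraints; the names feasible and recombine are illustrative. Because the last trial factor is 0, the loop falls back to A itself, so it always terminates when A is feasible.

```python
import random

def feasible(w):
    # Hypothetical extra constraint, for illustration only: no negative weights.
    return all(x >= 0 for x in w)

def recombine(a, b, c, rng=random):
    """Try a + step*(b - c) for progressively smaller and sign-flipped steps."""
    f = rng.uniform(0.6, 2.0)
    for step in (f, f / 2, f / 4, -f, -f / 2, -f / 4, 0.0):
        child = [ai + step * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        if feasible(child):
            return child
    return list(a)  # only reached if a itself is infeasible

parents = ([0.2, 0.3, 0.5], [0.5, 0.25, 0.25], [0.1, 0.6, 0.3])
child = recombine(*parents, rng=random.Random(0))
```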

Fast solution to Subset sum

Consider this way of solving the Subset sum problem:
def subset_summing_to_zero (activities):
    subsets = {0: []}
    for (activity, cost) in activities.items():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.items():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
I have it from here:
http://news.ycombinator.com/item?id=2267392
There is also a comment which says that it is possible to make it "more efficient".
How?
Also, are there any other ways to solve the problem which are at least as fast as the one above?
Edit
I'm interested in any kind of idea which would lead to speed-up. I found:
https://en.wikipedia.org/wiki/Subset_sum_problem#cite_note-Pisinger09-2
which mentions a linear time algorithm. But I don't have the paper, perhaps you, dear people, know how it works? An implementation perhaps? Completely different approach perhaps?
Edit 2
There is now a follow-up:
Fast solution to Subset sum algorithm by Pisinger
I respect the alacrity with which you're trying to solve this problem! Unfortunately, you're trying to solve a problem that's NP-complete, meaning that any further improvement that breaks the polynomial time barrier will prove that P = NP.
The implementation you pulled from Hacker News appears to be consistent with the pseudo-polytime dynamic programming solution, where any additional improvements must, by definition, progress the state of current research into this problem and all of its algorithmic isoforms. In other words: while a constant speedup is possible, you're very unlikely to see an algorithmic improvement to this solution to the problem in the context of this thread.
However, you can use an approximate algorithm if you require a polytime solution with a tolerable degree of error. In pseudocode blatantly stolen from Wikipedia, this would be:
initialize a list S to contain one element 0.
for each i from 1 to N do
    let T be a list consisting of xi + y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        //trim the list by eliminating numbers close to one another
        //and throw out elements greater than s
        if y + cs/N < z ≤ s, set y = z and add z to S
if S contains a number between (1 − c)s and s, output yes, otherwise no
Python implementation, preserving the original terms as closely as possible:
from bisect import bisect

def ssum(X,c,s):
    """ Simple impl. of the polytime approximate subset sum algorithm
    Returns True if the subset exists within our given error; False otherwise
    """
    S = [0]
    N = len(X)
    for xi in X:
        T = [xi + y for y in S]
        U = set().union(T,S)
        U = sorted(U) # Coercion to list
        S = []
        y = U[0]
        S.append(y)
        for z in U:
            if y + (c*s)/N < z and z <= s:
                y = z
                S.append(z)
    if not c: # For zero error, check equivalence
        return S[bisect(S,s)-1] == s
    return bisect(S,(1-c)*s) != bisect(S,s)
... where X is your bag of terms, c is your precision (between 0 and 1), and s is the target sum.
For more details, see the Wikipedia article.
(Additional reference, further reading on CSTheory.SE)
While my previous answer describes the polytime approximate algorithm to this problem, a request was specifically made for an implementation of Pisinger's polytime dynamic programming solution when all xi in x are positive:
from bisect import bisect

def balsub(X,c):
    """ Simple impl. of Pisinger's generalization of KP for subset sum problems
    satisfying xi >= 0, for all xi in X. Returns the state array "st", which may
    be used to determine if an optimal solution exists to this subproblem of SSP.
    """
    if not X:
        return False
    X = sorted(X)
    n = len(X)
    b = bisect(X,c)
    r = X[-1]
    w_sum = sum(X[:b])
    stm1 = {}
    st = {}
    for u in range(c-r+1,c+1):
        stm1[u] = 0
    for u in range(c+1,c+r+1):
        stm1[u] = 1
    stm1[w_sum] = b
    for t in range(b,n+1):
        for u in range(c-r+1,c+r+1):
            st[u] = stm1[u]
        for u in range(c-r+1,c+1):
            u_tick = u + X[t-1]
            st[u_tick] = max(st[u_tick],stm1[u])
        for u in reversed(range(c+1,c+X[t-1]+1)):
            for j in reversed(range(stm1[u],st[u])):
                u_tick = u - X[j-1]
                st[u_tick] = max(st[u_tick],j)
    return st
Wow, that was headache-inducing. This needs proofreading, because, while it implements balsub, I can't define the right comparator to determine if the optimal solution to this subproblem of SSP exists.
I don't know much Python, but there is an approach called meet in the middle.
Pseudocode:
Divide activities into two subarrays, A1 and A2
for both A1 and A2, calculate subset hashes, H1 and H2, the way you do it in your question.
for each (cost, a1) in H1
    if(H2.contains(-cost))
        return a1 + H2[-cost];
This will allow you to double the number of elements of activities you can handle in a reasonable time.
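The pseudocode above could be fleshed out in Python roughly as follows. This is a sketch, not the answerer's code: the names subset_sums and zero_subset_mitm are illustrative, and it stores a full list of names per sum for clarity rather than speed. The one subtlety is that the empty subset also sums to zero, so the sketch keeps a non-empty zero-sum subset when a half contains one, and rejects the match where both halves contribute nothing.

```python
def subset_sums(items):
    """Map each reachable sum to one list of names achieving it."""
    sums = {0: []}
    for name, cost in items:
        for total, names in list(sums.items()):   # snapshot: each item used at most once
            new_total = total + cost
            # Keep the first subset found per sum; prefer a non-empty one for sum 0.
            if new_total not in sums or (new_total == 0 and not sums[0]):
                sums[new_total] = names + [name]
    return sums

def zero_subset_mitm(activities):
    """Return a sorted non-empty list of names whose costs sum to 0, or []."""
    items = list(activities.items())
    mid = len(items) // 2
    left = subset_sums(items[:mid])
    right = subset_sums(items[mid:])
    for total, names in left.items():
        other = right.get(-total)
        if other is not None and (names or other):
            return sorted(names + other)
    return []

print(zero_subset_mitm({"a": 3, "b": -5, "c": 2, "d": 7}))  # ['a', 'b', 'c']
```

Each half enumerates at most 2^(n/2) sums, which is where the "double the number of elements" estimate comes from.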
I apologize for "discussing" the problem, but a Subset Sum problem in which the x values are bounded is not the NP-hard version of the problem. Dynamic programming solutions are known for bounded x values; they work by representing the x values as sums of unit lengths, so the number of fundamental iterations is linear in the total length of the x's. Subset Sum is NP-hard, however, when the precision of the numbers grows with N, that is, when the number of base-2 place values needed to state the x's is about N. For N = 40, the x's have to be in the billions. In the NP-hard problem the magnitude of the x's increases exponentially with N. That is why the dynamic programming solutions are not polynomial-time solutions to the NP-hard Subset Sum problem. Even so, there are practical instances of Subset Sum where the x's are bounded and the dynamic programming solution is valid.
Here are three ways to make the code more efficient:
1. The code stores a list of activities for each partial sum. It is more efficient in terms of both memory and time to just store the most recent activity needed to make the sum, and work out the rest by backtracking once a solution is found.
2. For each activity the dictionary is repopulated with the old contents (subsets[prev_sum] = subset). It is faster to simply grow a single dictionary.
3. Splitting the values in two and applying a meet in the middle approach.
Applying the first two optimisations results in the following code which is more than 5 times faster:
def subset_summing_to_zero2 (activities):
    subsets = {0:-1}
    for (activity, cost) in activities.items():
        # Iterate over a snapshot of the keys, since the dictionary grows inside the loop
        for prev_sum in list(subsets.keys()):
            new_sum = prev_sum + cost
            if 0 == new_sum:
                new_subset = [activity]
                while prev_sum:
                    activity = subsets[prev_sum]
                    new_subset.append(activity)
                    prev_sum -= activities[activity]
                return sorted(new_subset)
            if new_sum in subsets: continue
            subsets[new_sum] = activity
    return []
Also applying the third optimisation results in something like:
def subset_summing_to_zero3 (activities):
    A=list(activities.items())
    mid=len(A)//2

    def make_subsets(A):
        subsets = {0:-1}
        for (activity, cost) in A:
            # Iterate over a snapshot of the keys, since the dictionary grows inside the loop
            for prev_sum in list(subsets.keys()):
                new_sum = prev_sum + cost
                if new_sum and new_sum in subsets: continue
                subsets[new_sum] = activity
        return subsets

    subsets = make_subsets(A[:mid])
    subsets2 = make_subsets(A[mid:])

    def follow_trail(new_subset,subsets,s):
        while s:
            activity = subsets[s]
            new_subset.append(activity)
            s -= activities[activity]

    new_subset=[]
    for s in subsets:
        if -s in subsets2:
            follow_trail(new_subset,subsets,s)
            follow_trail(new_subset,subsets2,-s)
            if len(new_subset):
                break
    return sorted(new_subset)
Define bound to be the largest absolute value of the elements.
The algorithmic benefit of the meet in the middle approach depends a lot on bound.
For a low bound (e.g. bound=1000 and n=300) the meet in the middle only gets a factor of about 2 improvement over the first improved method. This is because the dictionary called subsets is densely populated.
However, for a high bound (e.g. bound=100,000 and n=30) the meet in the middle takes 0.03 seconds compared to 2.5 seconds for the first improved method (and 18 seconds for the original code).
For high bounds, the meet in the middle will take about the square root of the number of operations of the normal method.
It may seem surprising that meet in the middle is only twice as fast for low bounds. The reason is that the number of operations in each iteration depends on the number of keys in the dictionary. After adding k activities we might expect there to be 2**k keys, but if bound is small then many of these keys will collide so we will only have O(bound.k) keys instead.
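The collision effect is easy to observe by counting distinct partial sums. The helper below is an illustrative Python sketch (the name count_reachable_sums is not from the answer): with small values the number of keys grows roughly linearly, while with widely spread values it reaches the full 2**k.

```python
def count_reachable_sums(values):
    """Number of distinct sums over all subsets of `values` (including the empty one)."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return len(sums)

# Small bound: 20 values in 1..20 give only 211 distinct sums (0..210), not 2**20.
print(count_reachable_sums(range(1, 21)))

# Large, spread-out values: powers of two never collide, so all 2**10 sums are distinct.
print(count_reachable_sums([2 ** i for i in range(10)]))
```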
Thought I'd share my Scala solution for the pseudo-polytime algorithm described on Wikipedia. It's a slightly modified version: it figures out how many unique subsets there are. This is very much related to a HackerRank problem described at https://www.hackerrank.com/challenges/functional-programming-the-sums-of-powers. The coding style might not be excellent, as I'm still learning Scala :) Maybe this is still helpful for someone.
import java.io.ByteArrayInputStream
import scala.io.StdIn.readInt

object Solution extends App {
  var input = "1000\n2"
  System.setIn(new ByteArrayInputStream(input.getBytes()))
  println(calculateNumberOfWays(readInt(), readInt()))

  def calculateNumberOfWays(X: Int, N: Int) = {
    val maxValue = Math.pow(X, 1.0/N).toInt
    val listOfValues = (1 until maxValue + 1).toList
    val listOfPowers = listOfValues.map(value => Math.pow(value, N).toInt)
    val lists = (0 until maxValue).toList.foldLeft(List(List(0)): List[List[Int]]) ((newList, i) =>
      newList :+ (newList.last union (newList.last.map(y => y + listOfPowers.apply(i)).filter(z => z <= X)))
    )
    lists.last.count(_ == X)
  }
}

Combinatorial Optimization - Variation on Knapsack

Here is a real-world combinatorial optimization problem.
We are given a large set of value propositions for a certain product. The value propositions are of different types but each type is independent and adds equal benefit to the overall product. In building the product, we can include any non-negative integer number of "units" of each type. However, after adding the first unit of a certain type, the marginal benefit of additional units of that type continually decreases. In fact, the marginal benefit of a new unit is the inverse of the number of units of that type, after adding the new unit. Our product must have at least one unit of some type, and there is a small correction that we must make to the overall value because of this requirement.
Let T[] be an array representing the number of each type in a certain production run of the product. Then the overall value V is given by (pseudo code):
V = 1
For Each t in T
    V = V * (t + 1)
Next t
V = V - 1 // correction
On the cost side, units of the same type have the same cost. But units of different types each have unique, irrational costs. The number of types is large, but we are given an array of type costs C[] that is sorted from smallest to largest. Let's further assume that the type quantity array T[] is also sorted by cost from smallest to largest. Then the overall cost U is simply the sum of each unit cost:
U = 0
For i = 0, i < NumOfValueTypes
    U = U + T[i] * C[i]
Next i
So far so good. So here is the problem: Given product P with value V and cost U, find the product Q with the cost U' and value V', having the minimal U' such that U' > U, V'/U' > V/U.
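As a rough Python sketch of the two computations and of the comparison in the problem statement (the function names value, cost and improves are illustrative, not from the question). The ratio test is cross-multiplied to avoid division, which assumes all costs are positive.

```python
from math import prod

def value(T):
    """Overall value V: product of (t + 1) over all types, minus the correction of 1."""
    return prod(t + 1 for t in T) - 1

def cost(T, C):
    """Overall cost U: units of each type times that type's unit cost."""
    return sum(t * c for t, c in zip(T, C))

def improves(P, Q, C):
    """True if Q costs strictly more than P and has a strictly better value/cost ratio."""
    U, V = cost(P, C), value(P)
    U2, V2 = cost(Q, C), value(Q)
    return U2 > U and V2 * U > V * U2   # V2/U2 > V/U, cross-multiplied

C = [1, 2, 3]                  # made-up integer costs, to keep the arithmetic exact
print(value([2, 1, 0]), cost([2, 1, 0], C))   # 5 4
```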
The problem you've described is nonlinear integer programming problem because it contains a product of integer variables t. Its feasibility set is not closed because of strict inequalities which can be worked around by using non-strict inequalities and adding a small positive number (epsilon) to the right hand sides. Then the problem can be formulated in AMPL as follows:
set Types;
param Costs{Types}; # C
param GivenProductValue; # V
param GivenProductCost; # U
param Epsilon;
var units{Types} integer >= 0; # T
var productCost = sum {t in Types} units[t] * Costs[t];
minimize cost: productCost;
s.t. greaterCost: productCost >= GivenProductCost + Epsilon;
s.t. greaterValuePerCost:
prod {t in Types} (units[t] + 1) - 1 >=
productCost * GivenProductValue / GivenProductCost + Epsilon;
This problem can be solved using a nonlinear integer programming solver such as Couenne.
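For very small instances the formulation can be cross-checked by brute force. The Python sketch below enumerates every quantity vector with at most max_units units per type; that cap, like the function name best_successor, is an assumption made for illustration, so a cheaper Q that needs more units of some type would be missed. It uses the strict inequalities directly instead of an epsilon.

```python
from itertools import product
from math import prod

def best_successor(C, P, max_units=4):
    """Cheapest Q (within the cap) with cost > cost(P) and value/cost > that of P."""
    U = sum(t * c for t, c in zip(P, C))
    V = prod(t + 1 for t in P) - 1
    best = None
    for Q in product(range(max_units + 1), repeat=len(C)):
        U2 = sum(t * c for t, c in zip(Q, C))
        V2 = prod(t + 1 for t in Q) - 1
        # Strict conditions from the question, ratio test cross-multiplied.
        if U2 > U and V2 * U > V * U2:
            if best is None or U2 < best[0]:
                best = (U2, Q)
    return best

print(best_successor([1, 2, 3], [1, 0, 0]))   # (4, (2, 1, 0))
```

The search space grows as (max_units + 1) to the power of the number of types, so this is only useful for validating a solver model on toy data.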
Honestly, I don't think there is an easy way to solve this. The best thing would be to write down the system and solve it with a solver (Excel Solver will do the trick, but you can also use AMPL to solve this nonlinear program).
The Program:
Define: U;
V;
C=[c1,...cn];
Variables: T=[t1,t2,...tn];
Objective Function: SUM(ti.ci)
Constraints:
For all i: ti integer
SUM(ti.ci) > U
(PROD(ti+1)-1).U > V.SUM(ti.ci)
It works well with Excel (you just replace > U by >= U + d, where d is the smallest increment of the costs, e.g. d = 0.1 if C = [1.1, 1.8, 3.0, 9.3], since Excel doesn't allow strict inequalities in the solver).
I guess with a real solver like AMPL it will work perfectly.
Hope it helps.
