MILP - How to set variable to be absolute value of other variable [duplicate]

I'm modeling a reoptimisation model and I would like to include a constraint in order to reduce the distance between the initial solution and the reoptimized solution. I'm doing staff scheduling, and to do so I want to penalize each assignment in the reoptimized solution that is different from the initial solution.
Before I start: I'm new to optimisation models, so the way I built the constraint may be wrong.
# 1. Extract the data from the initial solution of my main variable
ModelX_DictExtVal = model.x.extract_values()

# 2. Create a new binary variable which activates when the main variable
# ModelX_DictExtVal[n,s,d] of the initial solution is 1 (employee n works
# shift s on day d) and the value of model.x[n,s,d] in the reoptimized
# solution is different.
model.alpha_distance = Var(model.N_S_D, within=Binary)

# 3. Model a constraint to activate my variable.
def constraint_distance(model, n, s, d):
    v = ModelX_DictExtVal[n,s,d]
    if v == 1 and ModelX_DictExtVal[n,s,d] != model.x[n,s,d]:
        return model.alpha_distance[n,s,d] == 1
    elif v == 0:
        return model.alpha_distance[n,s,d] == 0

model.constraint_distance = Constraint(model.N_S_D, rule=constraint_distance)

# 4. Penalize in my objective function every time the variable is equal to one
ObjFunction = Objective(expr=sum(model.alpha_distance[n,s,d] * WeightDistance
                                 for n in model.N for s in model.S for d in model.D))
Issue: I'm not sure about what I'm doing in part 3, and I get an error when v == 1:
ERROR: Rule failed when generating expression for constraint
constraint_distance with index (0, 'E', 6): ValueError: Constraint
'constraint_distance[0,E,6]': rule returned None
I am also wondering, since I am reusing the same model for re-optimization, whether the model keeps the values of the initial solution of model.x[n,s,d], so that the comparison ModelX_DictExtVal[n,s,d] != model.x[n,s,d] is done against the old assignments during the re-optimization phase instead of the new ones...

You are right to suspect part 3. :)
So you have some "initial values" that could be either the original schedule (before optimizing) or some other preliminary optimization. And your decision variable is binary, indexed by [n,s,d] if I understand your question.
In your constraint you cannot employ an if-else structure based on a comparison test of your decision variable. The value of that variable is unknown at the time the constraint is built, right?
You are on the right track, though. So, what you really want to do is to have your alpha_distance (or penalty) variable capture any changes, indicating 1 where there is a change. That is an absolute value operation, but can be captured with 2 constraints. Consider (in pseudocode):
penalty = |x.new - x.old| # is what you want
So introduce 2 constraints, (indexed fully by [n,s,d]):
penalty >= x.new - x.old
penalty >= x.old - x.new
Then, as you are doing now, include the penalty in your objective, optionally multiplied by a weight.
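For instance, reusing the names from your snippet (model.N_S_D, model.x, model.alpha_distance, and the ModelX_DictExtVal dictionary, whose values are plain numbers at construction time), the two constraints might look like this in Pyomo; treat it as a sketch, not tested against your model:

def constraint_distance_pos(model, n, s, d):
    return model.alpha_distance[n, s, d] >= model.x[n, s, d] - ModelX_DictExtVal[n, s, d]

def constraint_distance_neg(model, n, s, d):
    return model.alpha_distance[n, s, d] >= ModelX_DictExtVal[n, s, d] - model.x[n, s, d]

model.constraint_distance_pos = Constraint(model.N_S_D, rule=constraint_distance_pos)
model.constraint_distance_neg = Constraint(model.N_S_D, rule=constraint_distance_neg)

Because the objective minimizes the penalties, the solver will push each alpha_distance down to exactly |x.new - x.old|.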
Comment back if that doesn't make sense...

Related

Pairing the weight of a protein sequence with the correct sequence

This piece of code is part of a larger function. I already created a list of molecular weights and I also defined a list of all the fragments in my data.
I'm trying to figure out how I can go through the list of fragments, calculate their molecular weight and check if it matches the number in the other list. If it matches, the sequence is appended into an empty list.
combs = [397.47, 2267.58, 475.63, 647.68]
fragments = ['SKEPFKTRIDKKPCDHNTEPYMSGGNY', 'KMITKARPGCMHQMGEY', 'AINV', 'QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV', 'MQCL', 'HMTPCYELHGLRWV', 'DHTAQPCRSWPMDYPLT', 'IEEATHM', 'MVGKMDMLEQYA', 'GWPDII', 'QIQDY', 'TPCYELHGLRWVQIQDYA', 'HGLRWVQIQDYAINV', 'KKKNARKW', 'TPCYELHGLRWV']
frags = []
for c in combs:
    for f in fragments:
        if c == SeqUtils.molecular_weight(f, 'protein', circular=True):
            frags.append(f)
print(frags)
I'm guessing I don't fully know how the SeqUtils.molecular_weight command works in Python, but if there is another way that would also be great.
You are comparing floating point values for equality. That is bound to fail. You always have to account for some degree of error when dealing with floating point values. In this particular case you also have to take into account the error margin of the input values.
So do not compare floats like this
x == y
but instead like this
abs(x - y) < epsilon
where epsilon is some carefully selected arbitrary number.
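As an aside, Python's standard library has math.isclose, which performs this kind of tolerance check for you:

import math

x, y = 397.47, 397.46909999999997
epsilon = 0.5
print(abs(x - y) < epsilon)                 # True
print(math.isclose(x, y, abs_tol=epsilon))  # True: same absolute-tolerance test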
I made two slight modifications to your code: I swapped the order of the f and c loops to be able to store the calculated value of w, and I append the value of w to the list frags as well, in order to better understand what is happening.
Your modified code now looks like this:
from Bio import SeqUtils
combs = [397.47, 2267.58, 475.63, 647.68]
fragments = ['SKEPFKTRIDKKPCDHNTEPYMSGGNY', 'KMITKARPGCMHQMGEY', 'AINV', 'QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV',
'MQCL', 'HMTPCYELHGLRWV', 'DHTAQPCRSWPMDYPLT', 'IEEATHM', 'MVGKMDMLEQYA', 'GWPDII', 'QIQDY',
'TPCYELHGLRWVQIQDYA', 'HGLRWVQIQDYAINV', 'KKKNARKW', 'TPCYELHGLRWV']
frags = []
threshold = 0.5
for f in fragments:
    w = SeqUtils.molecular_weight(f, 'protein', circular=True)
    for c in combs:
        if abs(c - w) < threshold:
            frags.append((f, w))
print(frags)
This prints the result
[('AINV', 397.46909999999997), ('IEEATHMTPCYELHGLRWV', 2267.5843), ('MQCL', 475.6257), ('QIQDY', 647.6766)]
As you can see, the first value for the weight differs from the reference value by about 0.0009. That's why you did not catch it with your approach.

How do I find the right optimisation algorithm for my problem?

Disclaimer: I'm not a professional programmer or mathematician and this is my first time encountering the field of optimisation problems. Now that's out of the way so let's get to the problem at hand:
I've got several lists, each containing various items and a number called 'mandatoryAmount':
listA (mandatoryAmountA, itemA1, itemA2, itemA2, ...)
Each item has certain values (each value is a number >= 0):
itemA1 (M, E, P, C, Al, Ac, D, Ab,S)
I have to choose a certain number of items from each list determined by 'mandatoryAmount'.
Within each list I can choose every item multiple times.
Once I have all of the items from each list, I'll add up the values of each.
For example:
totalM = listA (itemA1 (M) + itemA1 (M) + itemA3 (M)) + listB (itemB1 (M) + itemB2 (M))
The goals are:
-To have certain values (totalAl, totalAc, totalAb, totalS) reach a certain number cap while going over that cap as little as possible. Anything over that cap is wasted.
-To maximize the remaining values with different weightings each
The output should be the best possible selection of items to meet the goals stated above. I imagine the evaluation function to just add up all non-waste values times their respective weightings while subtracting all wasted stats times their respective weightings.
edit:
The total amount of items across all lists should be somewhere between 500 and 1000, the number of lists is around 10 and the mandatoryAmount for each list is between 0 and 14.
Here's some sample code that uses Python 3 and OR-Tools. Let's start by
defining the input representation and a random instance.
import collections
import random
Item = collections.namedtuple("Item", ["M", "E", "P", "C", "Al", "Ac", "D", "Ab", "S"])
List = collections.namedtuple("List", ["mandatoryAmount", "items"])
def RandomItem():
    return Item(
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
        random.random(),
    )

lists = [
    List(
        random.randrange(5, 10), [RandomItem() for j in range(random.randrange(5, 10))]
    )
    for i in range(random.randrange(5, 10))
]
Time to formulate the optimization as a mixed-integer program. Let's import
the solver library and initialize the solver object.
from ortools.linear_solver import pywraplp
solver = pywraplp.Solver.CreateSolver("SCIP")  # recent OR-Tools versions take just the solver id
Make constraints for the totals that must reach a certain cap.
AlCap = random.random()
totalAl = solver.Constraint(AlCap, solver.infinity())
AcCap = random.random()
totalAc = solver.Constraint(AcCap, solver.infinity())
AbCap = random.random()
totalAb = solver.Constraint(AbCap, solver.infinity())
SCap = random.random()
totalS = solver.Constraint(SCap, solver.infinity())
We want to maximize the other values subject to some weighting.
MWeight = random.random()
EWeight = random.random()
PWeight = random.random()
CWeight = random.random()
DWeight = random.random()
solver.Objective().SetMaximization()
Create variables and fill in the constraints. For each list there is an
equality constraint on the number of items.
associations = []
for list_ in lists:
    amount = solver.Constraint(list_.mandatoryAmount, list_.mandatoryAmount)
    for item in list_.items:
        x = solver.IntVar(0, solver.infinity(), "")
        amount.SetCoefficient(x, 1)
        totalAl.SetCoefficient(x, item.Al)
        totalAc.SetCoefficient(x, item.Ac)
        totalAb.SetCoefficient(x, item.Ab)
        totalS.SetCoefficient(x, item.S)
        solver.Objective().SetCoefficient(
            x,
            MWeight * item.M
            + EWeight * item.E
            + PWeight * item.P
            + CWeight * item.C
            + DWeight * item.D,
        )
        associations.append((item, x))
if solver.Solve() != solver.OPTIMAL:
    raise RuntimeError
solution = []
for item, x in associations:
    solution += [item] * round(x.solution_value())
print(solution)
I think David Eisenstat has the right idea with integer programming, but let's see if we can get some good solutions otherwise, and perhaps provide some initial optimization. However, I think that the fact that we can just choose all of one item in each list may make this easier to solve than it normally would be. Basically that turns it into more of a Subset Sum problem, especially with the cap.
There are two possibilities here:
There is no solution: no selection satisfies the requirements.
There is a solution, which we then need to optimize.
We really want to try to find a solution first; if we can find one (regardless of the amount of waste), that's nice.
So let's reframe the problem: we aim to simply minimize waste, but we also need to meet the minimum requirements. So let's try to direct as much of the surplus as possible toward values we actually need.
I'm going to propose an algorithm you could use that should work "fairly well" and is polynomial time, though could probably have some optimizations. I'll be using K to mean mandatoryAmount as it's a bit of a customary variable in this situation. Also I'll be using N to mean the number of lists. Lastly, Z to represent the total number of items (across all lists).
1. Get the list of all items and sort them by the amount of each value they have (first the goal values, then the bonus values). If an item has 100A, 300C, 200B, 400D, 150E and the required values are [B, D], then the sort order would look like [400,200,300,150,100]. Repeat, but for one goal value at a time: using the same example, we would have [400,300,150,100] for goal D and [200,300,150,100] for goal B. Create a boolean variable for optimization mode (we start by seeking a solution; once we find one, we can try to optimize it). Create a counter/hash to track the unassigned items. An item cannot be unassigned more than K times (to avoid infinite loops). This isn't strictly needed, but can work as an optimization for step 5, as it prioritizes goals you actually need.
2. For each list, keep a counter of the number of assignable slots, set each to K, as well as the number of total assignable slots, set to K * N. These will be adjusted as needed along the way. You want to be able to do quick O(1) lookups for: a) which list a (sorted) item belongs to, b) how many available slots that item has, c) how many times the item has been unassigned, and d) where the item is in the sorted list.
3. General assignment. While there are slots available (total slots), go through the list from highest to lowest order. If the list for that item is available, assign as many slots as possible to that item. Update the assignable and total slots. If the result is a valid solution, record it and trip the optimization-mode flag. If slots remain unassigned, revert the previous unassignment (but do not change the unassignment count).
4. Waste optimization. Find the most wasteful item that can be unassigned (unassigned count < K). Unassign one slot of it. If in optimization mode, do not allow any of the goal values to go below their cap (skip the item if it would). Update the unassigned count for the item. Go to step 3, but start just after the wasteful item. If no assignment is made, reassign this item until the list has no remaining assignments, but do not update the unassigned count (otherwise we might end up in an invalid state).
5. Goal-value optimization. Skip if the current state is a valid solution. Find the value furthest from its goal (i.e. A/B/C/D/E above) that can be unassigned. Unassign one slot for that item. Update the unassigned count. Go to step 3, beginning the search at the start of the list (unlike step 4), and stop searching the list once you go below the value of this item (not this item itself, as others may have the same value). If no assignment is made, reassign this item until the list has no remaining assignments, but do not update the unassigned count (otherwise we might end up in an invalid state).
6. No assignments remain. Return the current state as the "best solution found".
The algorithm should end with the "best" solution this approach can come up with. Increasing the maximum unassignment counts may improve the solution; decreasing them will speed up the algorithm. The algorithm will run until it has maxed out its unassignment counts.
This is a bit of a greedy algorithm, so I'm not sure it's optimal (in the sense that it will always yield the best result), but it may give you some ideas as to how to approach the problem. It also feels like it should yield fairly good results, as it is basically trying to bound the results. Algorithm performance is something like O(Z^2 * K), where K is the mandatoryAmount and Z is the total number of items: each item is unassigned at most K times, and each unassignment potentially requires O(Z) checks before the item is reassigned.
As an optimization, use a sorted data structure with O(log Z) or better delete/next operations to store the sorted lists. That would make it practical to delete items from the assignment lists once their unassignment count reaches K (rendering them no longer assignable), giving O(Z * log(Z) * K) performance instead.
Edit:
Hmmm, the above only works within a single list (IE: Item removed can only be added to it's own list, as only that list has room). To avoid this, do step 4 (remove too heavy) then step 5 (remove too light) and then goto step 3 (using step 5's rules for searching, but also disallow adding back the too heavy ones).
So basically we remove the heaviest one then the lightest one then we try to assign something that is as heavy as possible to make up for the lightest one we removed.

Number of partitions with a given constraint

Consider a set of 13 Danish, 11 Japanese and 8 Polish people. It is well known that the number of different ways of dividing this set of people into groups is the (13+11+8) = 32nd Bell number (the number of set partitions). However, we are asked to find the number of possible set partitions under a given constraint. The question is as follows:
A set partition is said to be good if it has no group consisting of at least two people that includes only a single nationality. How many good partitions are there for this set? (A group may consist of a single person.)
The brute-force approach requires going through about 10^26 partitions and checking which ones are good. This seems pretty infeasible, especially if the groups are larger or one introduces other nationalities. Is there a smarter way?
EDIT: As a side note, there is probably no hope for a really nice solution. A highly esteemed expert in combinatorics answered a related question which, I think, basically says that the related problem, and thus this problem too, is very difficult to solve exactly.
Here's a solution using dynamic programming.
It starts from an empty set, then adds one element at a time and calculates all the valid partitions.
The state space is huge, but notice that to be able to calculate the next step we only need to know the following things about a partition:
For each nationality, how many sets it contains that consist of only a single member of that nationality. (e.g. {a})
How many sets it contains with mixed elements. (e.g.: {a, b, c})
For each of these configurations I only store the total count. Example:
[0, 1, 2, 2] -> 3
{a}{b}{c}{mixed}
e.g.: 3 partitions that look like: {b}, {c}, {c}, {a,c}, {b,c}
Here's the code in Python:
import collections
from functools import reduce
from operator import mul
from fractions import Fraction

def nCk(n, k):
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))

def good_partitions(l):
    n = len(l)
    i = 0
    prev = collections.defaultdict(int)
    while l:
        # any more from this kind?
        if l[0] == 0:
            l.pop(0)
            i += 1
            continue
        l[0] -= 1
        curr = collections.defaultdict(int)
        for solution, total in prev.items():
            for idx, item in enumerate(solution):
                my_solution = list(solution)
                if idx == i:
                    # add element as a new set
                    my_solution[i] += 1
                    curr[tuple(my_solution)] += total
                elif my_solution[idx]:
                    if idx != n:
                        # add to a set consisting of one element
                        # or merge into multiple sets that consist of one element
                        cnt = my_solution[idx]
                        c = cnt
                        while c > 0:
                            my_solution = list(solution)
                            my_solution[n] += 1
                            my_solution[idx] -= c
                            curr[tuple(my_solution)] += total * nCk(cnt, c)
                            c -= 1
                    else:
                        # add to a mixed set
                        cnt = my_solution[idx]
                        curr[tuple(my_solution)] += total * cnt
        if not prev:
            # one set with one element
            lone = [0] * (n + 1)
            lone[i] = 1
            curr[tuple(lone)] = 1
        prev = curr
    return sum(prev.values())

print(good_partitions([1, 1, 1, 1]))     # 15
print(good_partitions([1, 1, 1, 1, 1]))  # 52
print(good_partitions([2, 1]))           # 4
print(good_partitions([13, 11, 8]))      # 29811734589499214658370837
It produces correct values for the test cases. I also tested it against a brute-force solution (for small values), and it produces the same results.
An exact analytic solution is hard, but a polynomial time+space dynamic programming solution is straightforward.
First of all, we need an absolute order on the size of groups. We do that by comparing how many Danes, Japanese, and Poles we have.
Next, the function to write is this one.
m is the maximum group size we can emit
p is the number of people of each nationality that we have left to split
max_good_partitions_of_maximum_size(m, p) is the number of "good partitions"
we can form from p people, with no group being larger than m
Clearly you can write this as a somewhat complicated recursive function that always selects the next group to use, then calls itself with that as the new maximum size, and subtracts the group from p. If you had this function, then your answer is simply max_good_partitions_of_maximum_size(p, p) with p = [13, 11, 8]. But that is going to be a brute-force search that won't run in reasonable time.
Finally apply https://en.wikipedia.org/wiki/Memoization by caching every call to this function, and it will run in polynomial time. However you will also have to cache a polynomial number of calls to it.
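To illustrate the pattern (not the full nationality-aware recursion), here is a minimal sketch that memoizes the analogous recursion for counting integer partitions with a bounded maximum part; in the real function, p would be a tuple of remaining people per nationality so that the arguments stay hashable:

from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(n, max_part):
    # Stand-in for max_good_partitions_of_maximum_size: pick the next part,
    # recurse with it as the new maximum, subtract it from what remains.
    if n == 0:
        return 1
    return sum(partitions(n - k, k) for k in range(1, min(n, max_part) + 1))

print(partitions(32, 32))  # 8349, the number of integer partitions of 32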

Elements mixing algorithm

Not sure about title.
Here is what I need.
Let's for example have this set of elements: 20*A, 10*B, 5*C, 5*D, 2*E, 1*F.
I need to mix them so that no two identical elements are next to each other, and I can also say, for example, that I don't want B and C to be next to each other. Elements have to be evenly spread (if there are 2 E's, one should be near the beginning / in the first half and the second near the end / in the second half). The number of elements can of course change.
I haven't done anything like this yet. Is there some knowledge base for this kind of algorithm where I could find some hints and methods on how to solve this kind of problem, or do I have to do all the math myself?
I think the solution is pretty easy.
Start with an array x initialised to empty values such that there is one space for each item you need to place.
Then, for each (item, frequency) pair in descending order of frequency, assign item values to x in alternating slots starting from the first empty slot.
Here's how it works for your example:
20*A A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A
10*B ABABABABABABABABABABA_A_A_A_A_A_A_A_A_A
5*C ABABABABABABABABABABACACACACACA_A_A_A_A
2*E ABABABABABABABABABABACACACACACAEAEA_A_A
1*F ABABABABABABABABABABACACACACACAEAEAFA_A
At this point we fail, since x still has an empty slot. Note that we could have identified this right from the start since we need at least 19 slots between the As, but we only have 18 other items.
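Here is a minimal Python sketch of this greedy fill (my own naming, assuming the input arrives as an item -> count mapping; it returns None when no valid fill is found):

def fill_alternating(counts):
    # Greedy fill: place items in descending order of frequency into
    # alternating empty slots, starting from the first empty slot.
    n = sum(counts.values())
    x = [None] * n
    for item, freq in sorted(counts.items(), key=lambda kv: -kv[1]):
        i = x.index(None)        # first empty slot
        for _ in range(freq):
            while i < n and x[i] is not None:
                i += 1           # skip occupied slots
            if i >= n:
                return None      # ran off the end: no valid fill found
            x[i] = item
            i += 2               # alternate: leave a gap for other items
    if None in x:
        return None              # leftover empty slot, as in the trace above
    return ''.join(x)

print(fill_alternating({'A': 2, 'B': 1}))                            # ABA
print(fill_alternating({'A': 20, 'B': 10, 'C': 5, 'E': 2, 'F': 1}))  # None (fails)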
UPDATE
Leonidas has now explained that the items should be distributed "evenly" (that is, if we have k items of a particular kind, and n slots to fill, each "bucket" of n/k slots must contain one item of that kind).
We can adapt to this constraint by spreading out our allocations rather than simply going for alternating slots. In this case (and let's assume 2 Fs so we can solve this), we would have
20*A A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A_A
10*B ABA_ABA_ABA_ABA_ABA_ABA_ABA_ABA_ABA_ABA
5*C ABACABA_ABACABA_ABACABA_ABACABA_ABACABA
2*E ABACABAEABACABA_ABACABAEABACABA_ABACABA
2*F ABACABAEABACABAFABACABAEABACABAFABACABA
You can solve this problem recursively:
def generate(lastChar, remDict):
    # Base case: nothing left to place -> one valid (empty) suffix.
    if all(v == 0 for v in remDict.values()):
        return ['']
    res = []
    for i in remDict:
        if i != lastChar and remDict[i] > 0:
            newRemDict = dict(remDict)  # copy, so sibling branches are unaffected
            newRemDict[i] -= 1
            subres = generate(i, newRemDict)
            res += [i + j for j in subres]
    return res
Note that I am leaving out corner conditions and many checks that need to be done; only the core recursion is shown. You can also stop pursuing a branch if more than half + 1 of the remaining letters are the same letter.
I ran into a similar problem, and after evaluating various metrics, I came up with the idea of grabbing the first item for which the proportion through the source array is less than the proportion through the result array. There is a case where all of these values may come out as 1, for instance when halfway through merging a group of even arrays - everything's exactly half done - so I grab something from the first array in that case.
This solution does use the source array order, which is something I wanted. If the calling routine wants to merge arrays A, B, and C, where A has 3 elements but B and C have 2, we should get A,B,C,A,B,C,A, not A,C,B,A,C,B,A or other possibilities. I find that by choosing the first of my source arrays that's "overdue" (its proportion is lower than our overall progress), I get a nice spacing across all arrays.
Source in Python:
@classmethod
def intersperse_arrays(cls, arrays: list):
    # (In the original this is a method of a larger class; `log` is assumed
    # to be a logger defined elsewhere.)
    # The general idea is to produce a result with as even a balance as
    # possible between all the arrays as we go down.
    # Make sure we don't have any component arrays of length 0 to worry about.
    arrays = [array for array in arrays if len(array) > 0]
    # Handle basic cases:
    if len(arrays) == 0:
        return []
    if len(arrays) == 1:
        return arrays[0]
    ret = []
    num_used = []
    total_count = 0
    for j in range(0, len(arrays)):
        num_used.append(0)
        total_count += len(arrays[j])
    while len(ret) < total_count:
        first_overdue_array = None
        first_remaining_array = None
        overall_prop = len(ret) / total_count
        for j in range(0, len(arrays)):
            # Continue if this array is already done.
            if len(arrays[j]) <= num_used[j]:
                continue
            current_prop = num_used[j] / len(arrays[j])
            if current_prop < overall_prop:
                first_overdue_array = j
                break
            elif first_remaining_array is None:
                first_remaining_array = j
        if first_overdue_array is not None:
            next_array = first_overdue_array
        else:
            # Think this only happens in an exact tie. (Halfway through all arrays, for example.)
            next_array = first_remaining_array
        if next_array is None:
            log.error('Internal error in intersperse_arrays')
            break  # Shouldn't happen - hasn't been seen.
        ret.append(arrays[next_array][num_used[next_array]])
        num_used[next_array] += 1
    return ret
When used on the example given, I got:
ABCADABAEABACABDAFABACABADABACDABAEABACABAD
(Seems reasonable.)

Finding sets that are a subset of a specific set

Let's say I have 4 different values A, B, C, D, each with a set of identifiers attached.
A={1,2,3,4,5}
B={8,9,4}
C={3,4,5}
D={12,8}
And given the set S of identifiers {1,30,3,4,5,12,8}, I want it to return C and D, i.e. retrieve all sets from a group of sets for which S is a superset.
Are there any algorithms to perform this task efficiently (preferably with low memory complexity; using an external device for storing data is not an option)?
A trivial solution would be, for each member in the superset S, to retrieve the list of sets that include that member (basically an inverted index), and for each returned set check that all of its members are in the superset. Unfortunately, because on average the superset will include at least one member of each set, there is a significant and unacceptable performance hit with this approach.
I am trying to do this in Java. Sets consist of integers, and the value they identify is an object.
The collection of sets is not static and is bound to change during the course of execution. There will be some limit on the number of sets, though.
Set size is not limited, but on average it's between 1 and 20.
1. Go through each element x in S.
2. For each set t for which x ∈ t, increment a counter—call it tcount—associated with t.
3. After all that, for each set t for which tcount = |t|, you know that t ⊆ S.
Application.
After step 2:
Acount = 4,
Bcount = 2,
Ccount = 3,
Dcount = 2.
Step 3 processing:
Acount ≠ |A| (4 ≠ 5) — Reject,
Bcount ≠ |B| (2 ≠ 3) — Reject,
Ccount = |C| (3 = 3) — Accept,
Dcount = |D| (2 = 2) — Accept.
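Here is a minimal Python sketch of this counting scheme (you mention Java, but the idea carries over directly; the names are mine):

def sets_contained_in(S, sets_by_name):
    # Inverted index: element -> names of the sets containing it.
    inverted = {}
    for name, t in sets_by_name.items():
        for x in t:
            inverted.setdefault(x, []).append(name)
    # Steps 1-2: one counter per set, incremented for every hit from S.
    counts = {name: 0 for name in sets_by_name}
    for x in S:
        for name in inverted.get(x, ()):
            counts[name] += 1
    # Step 3: t is a subset of S exactly when tcount == |t|.
    return [name for name, c in counts.items()
            if c == len(sets_by_name[name])]

sets_by_name = {'A': {1, 2, 3, 4, 5}, 'B': {8, 9, 4}, 'C': {3, 4, 5}, 'D': {12, 8}}
print(sets_contained_in({1, 30, 3, 4, 5, 12, 8}, sets_by_name))  # ['C', 'D']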
Note after cgkanchi's note: The following algorithm is written under the assumption that you don't really use sets but arrays. If that is not the case, you should look for a method which implements intersection of sets, and then the problem is trivial. This is about how to implement the notion of intersection using arrays.
1. Sort all sets using heapsort for in-place sorting with O(1) extra space. It runs in O(n log n) and soon enough it will pay you back.
2. For each set L of all sets:
2.1. j = 0
2.2. For the i-th element in L:
2.2.1. Starting from the j-th element, find L[i] in S such that L[i] = S[j], else reject. If L and S are large enough, use binary search or interpolation search (for the second one, have a look at your data distribution).
2.3. Accept
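Step 2.2.1 is essentially a merge-style scan of two sorted arrays; a minimal sketch (my own naming), assuming both arrays are already sorted:

def is_subset_sorted(L, S):
    # Walk S once, advancing j past smaller elements; every element of L
    # must be found in order, otherwise reject.
    j = 0
    for x in L:
        while j < len(S) and S[j] < x:
            j += 1
        if j == len(S) or S[j] != x:
            return False  # reject
        j += 1
    return True  # accept

print(is_subset_sorted([3, 4, 5], [1, 3, 4, 5, 8, 12, 30]))  # True
print(is_subset_sorted([4, 8, 9], [1, 3, 4, 5, 8, 12, 30]))  # False (9 missing)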
As for Java, I'd use a Hashtable as the lookup table for the elements in S. Then, for each element of X (the set you want to test for being a subset of S), check whether it is in the lookup table. If all elements of X are also in S, then S is a superset of X.
