Integer Linear Programming special case - algorithm

I am trying to solve a special case of the Integer Linear Programming problem, but I seem to be stuck on the algorithm.
Specifically, suppose you have binary variables x_{1}, ..., x_{n} and some inequalities of the form, e.g.:
x_{2} + x_{3} + x_{10} <= 2
Note that the coefficients of the inequalities are all unity, and the right-hand side is always the number of variables on the left-hand side minus 1.
Also, remember that the variables x_{1}, ..., x_{n} can take the values of 0 or 1.
This is homework (to write a program), but I cannot find an algorithm to start with.
I tried DP and network flow, but nothing came of it.
The objective function (which got lost in the edit) is to maximize the sum:
x_{1} + ... + x_{n}

The problem is equivalent to Set Cover: http://en.wikipedia.org/wiki/Set_cover_problem#Integer_linear_program_formulation. One way to see this easily is to replace x_{i} with 1-y_{i}, which gives an equivalent 0-1 linear programming problem, namely
maximize (1-y_{1}) + (1-y_{2}) + ... + (1-y_{n}) = n - (y_{1} + ... + y_{n}),
which is equivalent to minimizing y_{1} + ... + y_{n},
subject to the following family of inequalities indexed by j:
(1-y_{i_{1j}}) + (1-y_{i_{2j}}) + ... + (1-y_{i_{kj}}) <= k-1,
which are equivalent to:
y_{i_{1j}} + y_{i_{2j}} + ... + y_{i_{kj}} >= 1
The equivalent formulation of the problem is the 0-1 integer linear programming formulation of Set Cover.
A greedy algorithm will provide a reasonable approximation in this situation. Find the variable x_{i0} that appears most often in the constraints and set it equal to 0. All of the constraints in which x_{i0} appears are now satisfied, so they can be removed from consideration, and x_{i0} can be removed from the objective. Repeat with the variable x_{i1} that appears most often in the remaining constraints, and so on.
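A minimal sketch of this greedy heuristic in Python, assuming constraints are given as lists of 1-based variable indices (a hypothetical input format):

from collections import Counter

def greedy(n, constraints):
    x = [1] * (n + 1)                         # 1-indexed; start with every x_i = 1
    active = [set(c) for c in constraints]
    while active:
        counts = Counter(i for c in active for i in c)
        i0 = counts.most_common(1)[0][0]      # variable appearing in the most constraints
        x[i0] = 0                             # setting it to 0 satisfies all of those constraints
        active = [c for c in active if i0 not in c]
    return x[1:]

print(greedy(4, [[1, 2], [2, 3, 4]]))         # -> [1, 0, 1, 1]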
Alternatively, real linear programming will also provide an approximation.
Since Set Cover is NP-hard, the best exact solution you will be able to find will take exponential time. A simple algorithm would just try all possibilities: run through all binary numbers from x_{n}x_{n-1}...x_{1} = 00...0 to x_{n}x_{n-1}...x_{1} = 11...1 = 2^n - 1, keeping the feasible assignment with the largest sum. There are surely faster (but still exponential-time) algorithms if you search.
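A brute-force sketch under the same assumed input format; it simply checks every assignment:

from itertools import product

def brute_force(n, constraints):
    best = None
    for xs in product((0, 1), repeat=n):      # all 2**n assignments
        feasible = all(sum(xs[i - 1] for i in c) <= len(c) - 1 for c in constraints)
        if feasible and (best is None or sum(xs) > sum(best)):
            best = xs
    return best

print(brute_force(4, [[1, 2], [2, 3, 4]]))    # -> (1, 0, 1, 1)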

Related

Diff algorithm with fuzzy difference metric

I'm looking for an algorithm similar to longest common subsequence algorithms that has an alphabet-letter similarity metric. What I mean is that known algorithms treat all letters of the alphabet as completely different, whereas in my use case some letters are easier to edit into other letters, so the diffing algorithm should treat them as similar.
As a usage example, think of a diffing algorithm working on lines of text, where some lines are more similar to certain other lines.
The paper An O(ND) Difference Algorithm and Its Variations states on page 4: "Consider adding a weight or cost to every edge. Give diagonal edges weight 0 and non-diagonal edges weight 1." I'd like to have the option to assign any weight from the interval [0, 1].
The Longest Common Subsequence (LCS) problem is usually computed by dynamic programming, and you can tweak existing methods to apply them to your use case.
In this example explaining how LCS works (from Wikipedia), https://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Example, you should tweak the algorithm as follows:
instead of scoring
score_j = score_i + 1, for j = i + 1 (that is to say, adding 1 when a new common item is added to the LCS),
you should score
score_j = F(score_i, p(letter_i, letter_j)) no matter what.
p(letter_i, letter_j) is the probability of changing from letter_i to letter_j (that is, the weight in [0, 1] you were talking about).
F is an aggregation function that takes you from score_i to score_j given that probability p.
For instance, F can be defined as the operator +. It would then yield:
score_j = score_i + p(letter_i, letter_j)
or, more precisely:
score_j = score_i + p(letter_i, letter_j) x 1 (read the "x 1" as "of one character")
and that would give you the maximum similarity (in characters) of the two subsequences, which you can recover by backtracking at the end of the algorithm.
You can find your own function F to yield better results.
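As a concrete illustration, here is a minimal sketch of the weighted LCS described above, assuming a caller-supplied similarity p(a, b) in [0, 1] and taking F to be addition (both are just the example choices from this answer):

def weighted_lcs_score(xs, ys, p):
    # score[i][j] = best total similarity between xs[:i] and ys[:j]
    score = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            score[i][j] = max(
                score[i - 1][j],                                # skip xs[i-1]
                score[i][j - 1],                                # skip ys[j-1]
                score[i - 1][j - 1] + p(xs[i - 1], ys[j - 1]),  # pair them up
            )
    return score[len(xs)][len(ys)]

# Example similarity: identical letters score 1, vowels are "close" to each other.
def p(a, b):
    if a == b:
        return 1.0
    return 0.5 if a in 'aeiou' and b in 'aeiou' else 0.0

print(weighted_lcs_score('abcde', 'axcye', p))                  # -> 3.0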

Dynamic Programming - complexity

I have a homework problem that I have been trying to figure out for some time now, and I can't figure it out for the life of me.
I have a sheet of size X*Y, and a set of patterns of lesser sizes, with price values associated with them. I can cut the sheet either horizontally or vertically, and I have to find the optimized cutting pattern to get the greatest profit from the sheet.
As far as I can tell there should be (X*Y)(X+Y+#ofPatterns) recursive operations. The complexity is supposed to be exponential. Can someone please explain why?
The pseudo-code I have come up with is as follows:
def optimize(w, h):
    # `patterns` is the list of available patterns
    best_price = 0
    for p in patterns:                      # best single pattern that fits the uncut piece
        if p.width <= w and p.height <= h and p.price > best_price:
            best_price = p.price
    for i in range(1, w):                   # all vertical cuts
        left, right = optimize(i, h), optimize(w - i, h)
        if left + right > best_price:
            best_price = left + right
    for i in range(1, h):                   # all horizontal cuts
        top, bottom = optimize(w, i), optimize(w, h - i)
        if top + bottom > best_price:
            best_price = top + bottom
    return best_price
The recursive case is exponential because at the start you can choose to cut your sheet anywhere from 0 to the maximum width, or from 0 to the maximum height, and then optionally cut each of the remaining pieces (recurse).
This problem sounds like a more interesting case of the rod cutting problem, since it involves two dimensions.
http://www.radford.edu/~nokie/classes/360/dp-rod-cutting.html
is a good guide. Reading that should put you on the right track without blatantly answering your homework.
The relevant portion to why it is exponential when recursing:
This recursive algorithm uses the formula above and is slow
Code
-- price array p, length n
Cut-Rod(p, n)
if n = 0 then
return 0
end if
q := MinInt
for i in 1 .. n loop
q := max(q, p(i) + Cut-Rod(p, n-i))
end loop
return q
Recursion tree (shows subproblems): 4/[3,2,1,0]//[2,1,0],[1,0],0//[1,0],0,0//0
Performance: Let T(n) = number of calls to Cut-Rod(x, n), for any x
T(0) = 1
T(n) = 1 + ∑_{i=1}^{n} T(n−i) = 1 + ∑_{j=0}^{n−1} T(j)
Solution: T(n) = 2^n
When calculating the complexity of a dynamic programming algorithm, we can decompose it into two subproblems: one is calculating the number of substates; and the other is calculating the time complexity of solving a particular subproblem.
But it's true that without a memoization approach, an algorithm that is polynomial in nature blows up to exponential time, because you are not reusing information that you've previously calculated. (I'm pretty sure you understand this part from your dynamic programming course.)
No matter whether you solve a dynamic programming problem using the memoization method or the bottom-up approach, the time complexity stays the same. I think the trouble you are having is that you are trying to draw the function call graph in your head. Instead, let's try to estimate the number of function calls this way.
You are saying that there are (X*Y)(X+Y+#ofPatterns) recursive calls.
Well, yes and no.
It's true that when you use a memoization method, there are only that many recursive calls: once you have called and calculated a certain Optimize(w0,h0), the value is stored, and the next time another call Optimize(w1,h1) needs Optimize(w0,h0), it won't do that redundant work again. And that's what makes the time complexity polynomial.
But in your current implementation, one subproblem Optimize(w0,h0) gets many redundant function calls, which means the number of recursive calls in your algorithm is not polynomial at all (for a simple example, try to draw the call graph of the recursive Fibonacci number algorithm).
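To make this concrete, here is a sketch of the memoized version; representing patterns as (width, height, price) tuples is just a hypothetical choice for the example:

from functools import lru_cache

def best_profit(W, H, patterns):
    @lru_cache(maxsize=None)                  # memoize on (w, h): each subproblem solved once
    def optimize(w, h):
        best = max((price for pw, ph, price in patterns
                    if pw <= w and ph <= h), default=0)
        for i in range(1, w):                 # vertical cuts
            best = max(best, optimize(i, h) + optimize(w - i, h))
        for i in range(1, h):                 # horizontal cuts
            best = max(best, optimize(w, i) + optimize(w, h - i))
        return best
    return optimize(W, H)

print(best_profit(4, 3, [(1, 1, 1), (2, 3, 7)]))   # -> 14 (two 2x3 pieces)

With memoization there are only W*H distinct (w, h) states, which brings the running time back down to the polynomial estimate above.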

optimization of sum of multi variable functions

Imagine that I'm a bakery trying to maximize the number of pies I can produce with my limited quantities of ingredients.
Each of the following pie recipes A, B, C, and D produces exactly 1 pie:
A = i + j + k
B = t + z
C = 2z
D = 2j + 2k
*The recipes always have linear form, like above.
I have the following ingredients:
4 of i
5 of z
4 of j
2 of k
1 of t
I want an algorithm to maximize my pie production given my limited amount of ingredients.
The optimal solution for these example inputs would yield the following quantities of pies:
2 x A
1 x B
2 x C
0 x D
= a total of 5 pies
I can solve this easily enough by taking the maximal producer over all combinations, but the number of combos becomes prohibitive as the quantities of ingredients increase. I feel like there must be generalizations of this type of optimization problem; I just don't know where to start.
While I can only bake whole pies, I would still be interested in seeing a method which may produce non-integer results.
You can formulate this as a linear programming problem. I'll show the formulation on your example, but it can of course be generalized to any data.
Denote your pies as your variables (x1 = A, x2 = B, ...) and the LP problem will be as follows:
maximize x1 + x2 + x3 + x4
s.t. x1 <= 4 (needed i's)
x1 + 2x4 <= 4 (needed j's)
x1 + 2x4 <= 2 (needed k's)
x2 <= 1 (needed t's)
x2 + 2x3 <= 5 (needed z's)
and x1,x2,x3,x4 >= 0
The fractional version of this problem is solvable in polynomial time, but integer linear programming is NP-complete.
The problem is indeed NP-complete, because given an integer linear programming problem of this form, you can reduce it to "maximize the number of pies" using the same approach, where each constraint corresponds to an ingredient and the variables are the numbers of pies.
For the integer problem, the literature contains a lot of approximation techniques if "close up to a certain bound" is good enough (for example, the local ratio technique or primal-dual methods are often used); if you need an exact solution, an exponential-time algorithm is probably your best shot (unless, of course, P=NP).
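If you want to experiment with the fractional relaxation, here is a sketch using scipy.optimize.linprog (assuming SciPy is available) that encodes exactly the constraints above:

from scipy.optimize import linprog

c = [-1, -1, -1, -1]          # maximize x1+x2+x3+x4 == minimize its negation
A_ub = [[1, 0, 0, 0],         # i:  x1        <= 4
        [1, 0, 0, 2],         # j:  x1 + 2*x4 <= 4
        [1, 0, 0, 2],         # k:  x1 + 2*x4 <= 2
        [0, 1, 0, 0],         # t:  x2        <= 1
        [0, 1, 2, 0]]         # z:  x2 + 2*x3 <= 5
b_ub = [4, 4, 2, 1, 5]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(res.x, -res.fun)        # here the fractional optimum happens to be 5 pies, matching the integer one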
Since all your functions are linear, it sounds like you're looking for either linear programming (if continuous values are acceptable) or integer programming (if you require your variables to be integers).
Linear programming is a standard technique, and is efficiently solvable. A traditional algorithm for doing this is the simplex method.
Integer programming is intractable in general, because adding integral constraints allows it to describe intractable combinatorial problems. There seems to be a large number of approximation techniques (for example, you might try just using regular linear programming to see what that gets you), but of course they depend on the specific nature of your problem.

What is the most efficient way to determine the Farey sequence of degree n?

I am going to implement a Farey fraction approximation for converting limited-precision user input into possibly-repeating rationals.
http://mathworld.wolfram.com/FareySequence.html
I can easily locate the closest Farey fraction in a sequence, and I can find Fn by recursively searching for mediant fractions, building the Stern-Brocot tree.
http://mathworld.wolfram.com/Stern-BrocotTree.html
However, the method I've come up with for finding the fractions in the sequence Fn seems very inefficient:
(Python)
from fractions import Fraction

added_an_element = True
while added_an_element:
    added_an_element = False
    i = 0
    while i < len(fractions) - 1:
        a, b = fractions[i], fractions[i + 1]
        # the mediant belongs in Fn when its denominator is <= n;
        # Fraction normalizes automatically (mediants of neighbors are already reduced)
        if a.denominator + b.denominator <= n:
            fractions.insert(i + 1, Fraction(a.numerator + b.numerator,
                                             a.denominator + b.denominator))
            added_an_element = True
        i += 1
I will almost always be defining the sequence Fn where n = 10^m for m > 1, so perhaps it might be best to build the sequence once and cache it... but it still seems like there should be a better way to derive it.
EDIT:
This paper has a promising algorithm:
http://www.math.harvard.edu/~corina/publications/farey.pdf
I will try to implement.
The trouble is that their "most efficient" algorithm requires knowing the prior two elements. I know the first element of any sequence is 1/n, but finding the second element seems a challenge...
EDIT2:
I'm not sure how I overlooked this:
Given F0 = 1/n,
if n > 2, then F1 = 1/(n-1).
Therefore, for all n > 2, the first two fractions will always be 1/n and 1/(n-1), and I can implement the solution from Patrascu.
So now, the answer to this question should prove whether or not this solution is optimal, using benchmarks...
Why do you need the Farey series at all? Using continued fractions would give you the same approximation online without precalculating the series.
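A sketch of that idea, assuming floating-point input and a denominator bound n (so it stops where Fn would): emit the continued-fraction convergents one at a time:

from math import floor

def convergents(x, n):
    h0, h1 = 1, floor(x)              # numerators of the last two convergents
    k0, k1 = 0, 1                     # denominators of the last two convergents
    out = [(h1, k1)]
    while x != floor(x):
        x = 1 / (x - floor(x))
        a = floor(x)                  # next continued-fraction term
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        if k1 > n:                    # next denominator exceeds the bound; stop
            break
        out.append((h1, k1))
    return out

print(convergents(0.3333, 100))       # -> [(0, 1), (1, 3)]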
Neighboring fractions in Farey sequences are described in Sec. 3 of Neighboring Fractions in Farey Subsequences, http://arxiv.org/abs/0801.1981.
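For generating Fn in order, a standard next-term recurrence (related to the neighboring-fraction identities above, though not necessarily the Patrascu algorithm) needs exactly the prior two terms: given consecutive terms a/b and c/d of Fn, the next term is (k*c - a)/(k*d - b) with k = (n + b) // d. A minimal sketch:

from fractions import Fraction

def farey(n):
    a, b, c, d = 0, 1, 1, n           # the first two terms of Fn are 0/1 and 1/n
    terms = [Fraction(a, b), Fraction(c, d)]
    while c != d:                     # stop once 1/1 has been appended
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        terms.append(Fraction(c, d))
    return terms

print(farey(5))  # 0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1

Each term is produced in O(1), so the whole of Fn comes out in time linear in its length.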

Calculate discrete logarithm

Given positive integers b, c, m where (b < m) is True, the task is to find a positive integer e such that
(b**e % m == c) is True,
where ** is exponentiation (e.g. in Ruby or Python; ^ in some other languages) and % is the modulo operation. What is the most efficient algorithm (with the lowest big-O complexity) to solve it?
Example:
Given b=5; c=8; m=13 this algorithm must find e=7 because 5**7%13 = 8
From the % operator I'm assuming that you are working with integers.
You are trying to solve the Discrete Logarithm problem. A reasonable algorithm is Baby step, giant step, although there are many others, none of which are particularly fast.
The difficulty of finding a fast solution to the discrete logarithm problem is a fundamental part of some popular cryptographic algorithms, so if you find a better solution than any of those on Wikipedia please let me know!
This isn't a simple problem at all. It is called calculating the discrete logarithm, and it is the inverse operation to modular exponentiation.
There is no efficient algorithm known. That is, if N denotes the number of bits in m, all known algorithms run in time on the order of 2^(N^C) for some constant C > 0, i.e., exponential in a power of N.
Python 3 Solution:
Thankfully, SymPy has implemented this for you!
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
This is the documentation on the discrete_log function. Use this to import it:
from sympy.ntheory import discrete_log
Their example computes \log_7(15) (mod 41):
>>> discrete_log(41, 15, 7)
3
Because of the (state-of-the-art, mind you) algorithms it employs to solve it, you'll get O(\sqrt{n}) on most inputs you try. It's considerably faster when your prime modulus has the property that p - 1 factors into a lot of small primes.
Consider a prime on the order of 100 bits (~ 2^{100}). With \sqrt{n} complexity, that's still 2^{50} iterations. That being said, don't reinvent the wheel; this does a pretty good job. I might also add that it was almost 4x more memory efficient than Mathematica's MultiplicativeOrder function when I ran it with large-ish inputs (44 MiB vs. 173 MiB).
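Applied to the numbers from the original question, the same function should give the smallest exponent:
>>> discrete_log(13, 8, 5)
3
(5**3 % 13 == 8, so 3 is a valid answer alongside the 7 given in the question.)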
Since a duplicate of this question was asked under the Python tag, here is a Python implementation of baby step, giant step, which, as @MarkBeyers points out, is a reasonable approach (as long as the modulus isn't too large):
import math

def baby_steps_giant_steps(a, b, p, N=None):
    if not N:
        N = 1 + int(math.sqrt(p))
    # initialize the baby_steps table
    baby_steps = {}
    baby_step = 1
    for r in range(N + 1):
        baby_steps[baby_step] = r
        baby_step = baby_step * a % p
    # now take the giant steps
    giant_stride = pow(a, (p - 2) * N, p)   # a^(-N) mod p by Fermat; assumes p is prime
    giant_step = b
    for q in range(N + 1):
        if giant_step in baby_steps:
            return q * N + baby_steps[giant_step]
        giant_step = giant_step * giant_stride % p
    return "No Match"
In the above implementation, an explicit N can be passed to fish for a small exponent even if p is cryptographically large. It will find the exponent as long as the exponent is smaller than N**2. When N is omitted, the exponent will always be found, but not necessarily in your lifetime or with your machine's memory if p is too large.
For example, if
p = 70606432933607
a = 100001
b = 54696545758787
then pow(a, b, p) evaluates to 67385023448517
and
>>> baby_steps_giant_steps(a,67385023448517,p)
54696545758787
This took about 5 seconds on my machine. For the exponent and the modulus of those sizes, I estimate (based on timing experiments) that brute force would have taken several months.
Discrete logarithm is a hard problem
Computing discrete logarithms is believed to be difficult. No efficient general method for computing discrete logarithms on conventional computers is known.
I will add here a simple brute-force algorithm which tries every possible value from 1 to m and outputs a solution if one is found. Note that there may be more than one solution to the problem, or none at all. This algorithm returns the smallest possible value, or -1 if none exists.
def bruteLog(b, c, m):
    s = 1
    for i in range(m):
        s = (s * b) % m
        if s == c:
            return i + 1
    return -1

print(bruteLog(5, 8, 13))
and here you can see that 3 is in fact a solution:
print(5**3 % 13)
There is a better algorithm, but because it is often asked to be implemented in programming competitions, I will just give you a link to explanation.
As said, the general problem is hard. However, a practical way to find e, provided you know e is going to be small (like in your example), is just to try each e starting from 1.
By the way, e == 3 is the first solution to your example, and you can obviously find it in 3 steps. Compare that to solving the non-discrete version and naively looking for integer solutions, i.e.
e = log(c + n*m) / log(b), where n is a non-negative integer,
which finds e == 3 in 9 steps.
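A quick sketch of that non-discrete search (the iteration bound and tolerance are arbitrary choices for the example):

from math import log

def log_search(b, c, m, max_n=10**6):
    for n in range(max_n):
        e = log(c + n * m) / log(b)   # non-discrete candidate exponent
        if abs(e - round(e)) < 1e-9:  # accept only (near-)integer exponents
            return round(e)
    return -1

print(log_search(5, 8, 13))           # -> 3, found at n == 9 (8 + 9*13 == 125 == 5**3)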
