I'm looking for an algorithm that addresses the LCS problem for two strings with the following conditions:
Each string consists of English characters and each character has a weight. For example:
sequence 1 (S1): "ABBCD" with weights [1, 2, 4, 1, 3]
sequence 2 (S2): "TBDC" with weights [7, 5, 1, 2]
Suppose that MW(s, S) is defined as the maximum weight of the sub-sequence s in string S with respect to the associated weights. The heaviest common sub-sequence (HCS) is defined as:
HCS = argmax over s of min(MW(s, S1), MW(s, S2))
The algorithm output should be the indexes of HCS in both strings and the weight. In this case, the indexes will be:
I_S1 = [2, 4] --> MW("BD", "ABBCD") = 7
I_S2 = [1, 2] --> MW("BD", "TBDC") = 6
Therefore HCS = "BD", and weight = min(MW(s, S1), MW(s, S2)) = 6.
The table that you need to build has this structure:
for each position in sequence 1
    for each position in sequence 2
        for each extreme pair of (weight1, weight2)
            (last_position1, last_position2)
An extreme pair is one that is not dominated: it is not possible to find a common subsequence up to that point whose weight in sequence 1 and weight in sequence 2 are both >= those of the pair, with at least one strictly >.
There may be multiple extreme pairs, for example one that is heavier in sequence 1 and another that is heavier in sequence 2.
The rule is that at the (i, -1) or (-1, j) positions, the only extreme pair is the empty set with weight 0. At any other we merge the extreme pairs for (i-1, j) and (i, j-1). And then if seq1[i] = seq2[j], then add the options where you went to (i-1, j-1) and then included the i and j in the respective subsequences. (So add weight1[i] and weight2[j] to the weights then do a merge.)
For that merge you can sort all of the extreme pairs from the previous points by weight1 descending (breaking ties by weight2 descending), then throw away every pair whose weight2 is less than or equal to the best weight2 already seen earlier in the sorted order. What survives is exactly the set of pairs that no other pair dominates.
When you reach the end you can find the extreme pair with the highest min, and that is your answer. You can then walk the data structure back to find the subsequences in question.
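Here is a minimal Python sketch of the table described above, under a few assumptions of mine: weights are non-negative, indexes are 0-based as in the question, and each extreme pair carries its index tuples directly instead of back-pointers (simpler to read, but it uses more memory). The names pareto_front and heaviest_common_subsequence are made up for this illustration.

```python
def pareto_front(candidates):
    """Keep only entries whose (weight1, weight2) is not dominated by another."""
    front = []
    best_w2 = float("-inf")
    # Heaviest weight1 first; ties broken by heaviest weight2.
    for entry in sorted(candidates, key=lambda e: (-e[0], -e[1])):
        if entry[1] > best_w2:  # strictly better weight2 than anything seen so far
            front.append(entry)
            best_w2 = entry[1]
    return front


def heaviest_common_subsequence(s1, w1, s2, w2):
    """Return (weight1, weight2, indexes1, indexes2) maximising min(weight1, weight2)."""
    empty = [(0, 0, (), ())]
    # table[i][j] holds the extreme pairs for the prefixes s1[:i] and s2[:j].
    table = [[empty] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            candidates = table[i - 1][j] + table[i][j - 1]
            if s1[i - 1] == s2[j - 1]:
                # Extend every extreme pair of (i-1, j-1) with the matching characters.
                candidates = candidates + [
                    (a + w1[i - 1], b + w2[j - 1], p + (i - 1,), q + (j - 1,))
                    for a, b, p, q in table[i - 1][j - 1]
                ]
            table[i][j] = pareto_front(candidates)
    return max(table[-1][-1], key=lambda e: min(e[0], e[1]))


result = heaviest_common_subsequence("ABBCD", [1, 2, 4, 1, 3], "TBDC", [7, 5, 1, 2])
print(result)
```

On the example from the question this picks indexes (2, 4) in S1 and (1, 2) in S2, with weights 7 and 6.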
You are given an array of positive integers A. You need to create a subset of the array A with the maximum number of elements with the property that however we take any two numbers of the subset (we can call it x and y), we have that gcd(x,y) is higher than 1. Print the elements of the subset.
For example, if we have n = 4 and the array is {15, 7, 10, 6}, the output needs to be {15, 10, 6}.
Is there any faster solution than backtracking?
Yes, I think there is a better approach. Transform this into a graph problem: each integer is a node; two nodes i and j have an edge connecting them iff gcd(i, j) > 1.
Now you need to find the largest fully-connected subgraph (a.k.a. a maximum clique). A little research will show you how to implement that. It's not efficient, but it's more tractable and reliable than plain backtracking.
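To make the graph idea concrete, here is a small Python sketch. It uses a basic Bron-Kerbosch enumeration without pivoting, which is my choice for illustration and not something the answer prescribes; it is still exponential in the worst case. The function name largest_gcd_subset is hypothetical.

```python
from math import gcd


def largest_gcd_subset(values):
    """Largest subset in which every pair of numbers has gcd > 1."""
    n = len(values)
    # adj[i] = indexes of the numbers sharing a common factor with values[i].
    adj = [
        {j for j in range(n) if j != i and gcd(values[i], values[j]) > 1}
        for i in range(n)
    ]
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            # clique is maximal; keep it if it is the largest so far.
            if len(clique) > len(best):
                best = list(clique)
            return
        for v in list(candidates):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(range(n)), set())
    return [values[i] for i in sorted(best)]


print(largest_gcd_subset([15, 7, 10, 6]))
```

For the array from the question this selects 15, 10 and 6.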
This is equivalent to the Clique problem. So no, there is no efficient solution for this (unless P = NP).
So, say you have a collection of value pairs on the form {x, y}, say {1, 2}, {1, 3} & {2, 5}.
Then you have to find a subset of k pairs (in this case, say k = 2), such that the ratio of the sum of all x in the subset to the sum of all y in the subset is as high as possible.
Could you point me in the direction for relevant theory or algorithms?
It's kind of like maximum subset sum, but since the pairs are "bound" to each other it introduces a restriction that changes it from problems known to me.
Initially I thought that a simple greedy approach could work here, but commenters pointed out some counterexamples.
Instead I think a bisection approach should work.
Suppose we want to know whether it is possible to achieve a ratio of g.
We need to add a selection of k vectors to end up above a line of gradient g.
If we project each vector perpendicular to this line to get values p1,p2,p3, then the final vector will be above the line if and only if the sum of the p values is positive.
Now, with the projected values it does seem right that the optimal solution is to choose the largest k.
We can then use bisection to find the highest ratio that is achievable.
Mathematical justification
Suppose we want to have the ratio above g, i.e.
(x1+x2+x3)/(y1+y2+y3) >= g
=> (x1+x2+x3) >= g(y1+y2+y3)
=> (x1-g.y1) + (x2-g.y2) + (x3-g.y3) >= 0
=> p1 + p2 + p3 >= 0
where pi is defined to be xi-g.yi.
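A minimal Python sketch of the bisection follows. It assumes every x is non-negative and every y is positive (so the ratio lies between 0 and the largest single x/y), and it uses a fixed number of halving steps; max_ratio_subset is a name I made up.

```python
def max_ratio_subset(pairs, k, iterations=60):
    """Bisection on the ratio g; assumes x >= 0 and y > 0 for every pair."""

    def top_k(g):
        # The k pairs with the largest projected value p = x - g*y.
        return sorted(pairs, key=lambda p: p[0] - g * p[1], reverse=True)[:k]

    lo = 0.0                             # always achievable when x >= 0
    hi = max(x / y for x, y in pairs)    # no subset can beat the best single ratio
    for _ in range(iterations):
        g = (lo + hi) / 2
        if sum(x - g * y for x, y in top_k(g)) >= 0:
            lo = g   # ratio g is achievable
        else:
            hi = g   # ratio g is not achievable
    chosen = top_k(lo)
    ratio = sum(x for x, _ in chosen) / sum(y for _, y in chosen)
    return ratio, chosen


ratio, chosen = max_ratio_subset([(1, 2), (1, 3), (2, 5)], 2)
print(ratio, chosen)
```

For the three pairs in the question with k = 2, the best choice is {1, 2} and {2, 5}, giving a ratio of 3/7.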
I have a number n, and I want to find three numbers whose product is n but are as close to each other as possible. That is, if n = 12 then I'd like to get 3, 2, 2 as a result, as opposed to 6, 1, 2.
Another way to think of it is that if n is the volume of a cuboid then I want to find the lengths of the sides so as to make the cuboid as much like a cube as possible (that is, the lengths as similar as possible). These numbers must be integers.
I know there is unlikely to be a perfect solution to this, and I'm happy to use something which gives a good answer most of the time, but I just can't think where to go with coming up with this algorithm. Any ideas?
Here's my first algorithm sketch, granted that n is relatively small:
Compute the prime factors of n.
Pick out the three largest and assign them to f1, f2, f3. If there are fewer than three factors, assign 1 to the remaining ones.
Loop over remaining factors in decreasing order, multiply them into the currently smallest partition.
Edit
Let's take n=60.
Its prime factors are 5 3 2 2.
Set f1=5, f2=3 and f3=2.
The remaining 2 is multiplied to f3, because it is the smallest.
We end up with 5 * 4 * 3 = 60.
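The first sketch can be written in a few lines of Python. This is only an illustration of the heuristic described above, with trial division standing in for whatever factoring routine you prefer; prime_factors and greedy_triple are names invented here.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def greedy_triple(n):
    """Seed with the three largest primes, then feed the rest to the smallest part."""
    primes = sorted(prime_factors(n), reverse=True)
    parts = primes[:3] + [1] * (3 - len(primes[:3]))
    for p in primes[3:]:
        smallest = parts.index(min(parts))
        parts[smallest] *= p
    return sorted(parts)


print(greedy_triple(60))
```

For n = 60 this returns 3, 4, 5, matching the walk-through above.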
Edit
This algorithm will not always find the optimum; see btilly's comment:
Consider 17550 = 2 * 3 * 3 * 3 * 5 * 5 * 13. Your algorithm would give 15, 30, 39 when the best is 25, 26, 27.
Edit
Ok, here's my second algorithm sketch with a slightly better heuristic:
Set the list L to the prime factors of n.
Set r to the cube root of n.
Create the set of three factors F, initially set to 1.
Iterate over the prime factors in descending order:
Try to multiply the current factor L[i] with each of the factors in F in turn (F[0], F[1], F[2]).
If the result is less than r, perform the multiplication and move on to the next prime factor.
If not, try the next F. If out of Fs, multiply with the smallest one.
This will work for the case of 17550:
n=17550
L=13,5,5,3,3,3,2
r=25.98
F = { 1, 1, 1 }
Iteration 1:
F[0] * 13 is less than r, set F to {13,1,1}.
Iteration 2:
F[0] * 5 = 65 is greater than r.
F[1] * 5 = 5 is less than r, set F to {13,5,1}.
Iteration 3:
F[0] * 5 = 65 is greater than r.
F[1] * 5 = 25 is less than r, set F to {13,25,1}.
Iteration 4:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 3 is less than r, set F to {13,25,3}.
Iteration 5:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 9 is less than r, set F to {13,25,9}.
Iteration 6:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 27 is greater than r, but it is the smallest F we can get. Set F to {13,25,27}.
Iteration 7:
F[0] * 2 = 26 is greater than r, but it is the smallest F we can get. Set F to {26,25,27}.
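Here is a short Python sketch of this second heuristic. It follows the trace above, trying F[0], F[1], F[2] in index order and falling back to the smallest entry; the helper names are hypothetical and the factoring is plain trial division.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def cube_root_triple(n):
    """Multiply each prime into the first F that stays below the cube root of n."""
    r = n ** (1.0 / 3.0)
    f = [1, 1, 1]
    for p in sorted(prime_factors(n), reverse=True):
        for i in range(3):
            if f[i] * p < r:
                f[i] *= p
                break
        else:
            # Out of Fs: multiply into the smallest one.
            f[f.index(min(f))] *= p
    return sorted(f)


print(cube_root_triple(17550))
```

For 17550 it ends with 25, 26, 27, as in the trace. It is still a heuristic, so I would not expect it to be optimal for every n.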
Here's a purely math based approach, that returns the optimal solution and does not involve any kind of sorting. Hell, it doesn't even need the prime factors.
Background:
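1) Recall that for a polynomial
p(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0
the sum and product of the roots are given by
x_1 + x_2 + ... + x_n = -a_{n-1},   x_1 x_2 ... x_n = (-1)^n a_0
where x_i are the roots.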
2) Recall another elementary result from optimization theory:
\min_{x, y > 0} (x + y) subject to x y = k   =>   \tilde{x} = \tilde{y} = \sqrt{k}
i.e., given two variables such that their product is a constant, the sum is minimum when the two variables are equal to each other. The tilde variables denote the optimal values.
A corollary of this would be that if the sum of two variables whose product is constant, is a minimum, then the two variables are equal to each other.
Reformulate the original problem:
Your question above can now be reformulated as a polynomial root-finding exercise. We'll construct a polynomial that satisfies your conditions, and the roots of that polynomial will be your answer. If you need k numbers that are optimal, you'll have a polynomial of degree k. In this case, we can talk in terms of a cubic equation
p(x) = x^3 + a x^2 + b x + c = 0
We know that:
1. c is the negative of the input number (assume positive)
2. a is an integer and negative (since factors are all positive)
3. b is an integer (which is the sum of the roots taken two at a time) and is positive.
4. Roots of p must be real (and positive, but that has already been addressed).
To solve the problem, we simply need to maximize a subject to the above set of conditions. The only part not explicitly known right now, is condition 4, which we can easily enforce using the discriminant of the polynomial.
For a cubic polynomial p(x) = x^3 + a x^2 + b x + c, the discriminant is
∆ = 18abc - 4a^3 c + a^2 b^2 - 4b^3 - 27c^2
and p has real and distinct roots if ∆>0 and real and coincident (either two or all three) if ∆=0. So, constraint 4 now reads ∆>=0. This is now simple and easy to program.
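As a quick sanity check of the discriminant condition, here is a small Python sketch. It builds the coefficients a, b, c of x^3 + a x^2 + b x + c from three chosen roots and evaluates the standard discriminant of a monic cubic. The function names are mine; this only illustrates the constraint and is not the optimisation itself.

```python
def cubic_from_roots(x1, x2, x3):
    """Coefficients (a, b, c) of x^3 + a x^2 + b x + c with the given roots."""
    a = -(x1 + x2 + x3)              # minus the sum of the roots
    b = x1 * x2 + x1 * x3 + x2 * x3  # sum of the roots taken two at a time
    c = -(x1 * x2 * x3)              # minus the product of the roots
    return a, b, c


def cubic_discriminant(a, b, c):
    """Discriminant of the monic cubic x^3 + a x^2 + b x + c."""
    return 18 * a * b * c - 4 * a**3 * c + a**2 * b**2 - 4 * b**3 - 27 * c**2


a, b, c = cubic_from_roots(25, 26, 27)
print(a, b, c, cubic_discriminant(a, b, c))
```

With roots 25, 26, 27 (product 17550) the discriminant is positive, and with a repeated root such as 2, 2, 3 it is zero, in line with the two cases described above.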
Solution in Mathematica
Here's a solution in Mathematica that implements this.
And here's a test on some of the numbers used in other answers/comments.
The column on the left is the list and the corresponding row in the column on the right gives the optimal solution.
NOTE:
I just noticed that OP never mentioned that the 3 numbers needed to be integers although everyone (including myself until now) assumed that they were (probably because of his first example). Re-reading the question, and going by the cube example, it doesn't seem like OP was fixated on integers.
This is an important point which will decide which class of algorithms to pursue and needs to be defined. If they need not be integers, there are several polynomial based solutions that can be provided, one of which is mine (after relaxing the integer constraint). If they should be integers, then perhaps an approach using branch-n-bound/branch-n-cut/cutting plane might be more appropriate.
The following was written assuming the OP meant the three numbers to be integers.
The way I've implemented it right now, it can give a non-integer solution in certain cases.
This gives non-integer solutions for x because I only maximized a, when b also needs to be minimized, and because I haven't placed a constraint on the x_i being integers. (It is possible to use the integer root theorem, which would involve finding the prime factors, but that makes things more complicated.)
Mathematica code in text
Clear[poly, disc, f]
poly = x^3 + a x^2 + b x + c;
disc = Discriminant[poly, x];
f[n_Integer] :=
 Module[{p, \[CapitalDelta] = disc /. c -> -n},
  p = poly /.
     Maximize[{a, \[CapitalDelta] >= 0,
        b > 0 && a < 0 && {a, b} \[Element] Integers}, {a, b}][[
      2]] /. c -> -n;
  Solve[p == 0]
 ]
There may be a clever way to find the tightest triplet, as Anders Lindahl is pursuing, but I will focus on a more basic approach.
If I generate all triplets, then I can filter them afterward however I want, so I will start there. The best way I know to generate these uses recursion:
f[n_, 1] := {{n}}
f[n_, k_] := Join @@
    Table[
      {q, ##} & @@@ Select[f[n/q, k - 1], #[[1]] >= q &],
      {q, #[[2 ;; ⌈ Length@#/k ⌉ ]] & @ Divisors @ n}
    ]
This function f takes two integer arguments, the number to factor n, and the number of factors to produce k.
The section #[[2 ;; ⌈ Length@#/k ⌉ ]] & @ Divisors @ n uses Divisors to produce a list of all divisors of n (including 1), and then takes from these from the second (to drop the 1) to the Ceiling of the number of divisors divided by k.
For example, for {n = 240, k = 3} the output is {2, 3, 4, 5, 6, 8}
The Table command iterates over this list while accumulating results, assigning each element to q.
The body of the Table is Select[f[n/q, k - 1], #[[1]] >= q &]. This calls f recursively, and then selects from the result all lists that begin with a number >= q.
{q, ##} & @@@ (also in the body) then "prepends" q to each of these selected lists.
Finally, Join @@ merges the lists of these selected lists that are produced by each loop of Table.
The result is all of the factorizations of n into k integer parts (each part greater than 1, in non-decreasing order), listed in lexicographical order. Example:
In[]:= f[240, 3]
Out[]= {{2, 2, 60}, {2, 3, 40}, {2, 4, 30}, {2, 5, 24}, {2, 6, 20},
{2, 8, 15}, {2, 10, 12}, {3, 4, 20}, {3, 5, 16}, {3, 8, 10},
{4, 4, 15}, {4, 5, 12}, {4, 6, 10}, {5, 6, 8}}
With the output of the function/algorithm given above, one can then test triplets for quality however desired.
Notice that because of the ordering the last triplet in the output is the one with the greatest minimum factor. This will usually be the most "cubic" of the results, but occasionally it is not.
If the true optimum must be found, it makes sense to test starting from the right side of the list, abandoning the search if a better result is not found quickly, as the quality of the results decreases as you move left.
Obviously this method relies upon a fast Divisors function, but I presume that this is either a standard library function, or you can find a good implementation here on StackOverflow. With that in place, this should be quite fast. The code above finds all triplets for n from 1 to 10,000 in 1.26 seconds on my machine.
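For readers without Mathematica, here is a rough Python analogue of the recursive enumeration. It is not a line-by-line translation: in place of Divisors and the ceiling cut-off it simply tries every q with q^k <= n, which is an assumption of mine that happens to give the same list for the example. The name factorizations is hypothetical.

```python
def factorizations(n, k, smallest=2):
    """All ways to write n as a product of k integers >= smallest, in non-decreasing order."""
    if k == 1:
        return [[n]] if n >= smallest else []
    result = []
    q = smallest
    # The smallest of k remaining parts can be at most the k-th root of n.
    while q ** k <= n:
        if n % q == 0:
            result.extend([q] + rest for rest in factorizations(n // q, k - 1, q))
        q += 1
    return result


triples = factorizations(240, 3)
print(len(triples), triples[-1])
```

For n = 240 and k = 3 this yields the same 14 triplets as the Mathematica output above, ending with 5, 6, 8.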
Instead of reinventing the wheel, one should recognize this as a variation of a well known NP-complete problem.
Compute the prime factors of n.
Compute the logarithms of these factors
The problem translates as partitioning these logs into three sums that are as close as possible.
This is a variation of the Bin Packing problem known as Multiprocessor scheduling.
Given that the Multiprocessor scheduling problem is NP-complete, it's no wonder that it's hard to find an algorithm that finds the optimum solution without searching the whole problem space.
But I guess there are already several algorithms that deal with either Bin Packing or Multiprocessor scheduling and find near-optimum solutions in an efficient manner.
Another related problem (a generalization) is Job shop scheduling. See the Wikipedia description, which has many links to known algorithms.
What Wikipedia describes as the often-used LPT algorithm (Longest Processing Time) is exactly what Anders Lindahl came up with first.
EDIT
Here's a shorter explanation using more efficient code, KSetPartitions simplifies things considerably. So did some suggestions from Mr.W. The overall logic remains the same.
Assuming there are at least 3 prime factors of n,
Find the list of triplet KSetPartitions for the prime factors of n.
Multiply each of the elements (prime factors) within each subset to produce all possible combinations for three divisors of n (when multiplied they yield n). You can think of the divisors as the length, width and height of an orthogonal parallelepiped.
The parallelepiped closest to a cube will have the shortest space diagonal. Sum the squares of the three divisors for each case and pick the smallest.
Here's the code in Mathematica:
Needs["Combinatorica`"]
g[n_] := Module[{factors = Join @@ ConstantArray @@@ FactorInteger[n]},
  Sort[Union[Sort /@ Apply[Times, Union[Sort /@
        KSetPartitions[factors, 3]], {2}]]
    /. {a_Integer, b_Integer, c_Integer} :>
     {Total[Power[{a, b, c}, 2]], {a, b, c}}][[1, 2]]]
It can handle fairly large numbers, but slows down considerably as the number of factors of n grows. The examples below show timings for 240, 2400, ...24000000.
This could be sped up in principle by taking into account cases where a prime factor appears more than once in a divisor. I don't have the know-how to do it yet.
In[28]:= g[240]
Out[28]= {5, 6, 8}
In[27]:= t = Table[Timing[g[24*10^n]][[1]], {n, 6}]
Out[27]= {0.001868, 0.012734, 0.102968, 1.02469, 10.4816, 105.444}
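The same idea can be sketched in Python without Combinatorica. Here itertools.product plays the role of KSetPartitions by assigning every prime factor to one of three groups; unlike KSetPartitions it also allows empty groups (a divisor of 1), and it visits each grouping several times, so treat it as an illustration and not a tuned implementation. The function names are invented for this sketch.

```python
from itertools import product


def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def most_cubic_triple(n):
    """Try every assignment of prime factors to three divisors; keep the shortest diagonal."""
    primes = prime_factors(n)
    best = None
    for assignment in product(range(3), repeat=len(primes)):
        parts = [1, 1, 1]
        for p, slot in zip(primes, assignment):
            parts[slot] *= p
        score = sum(x * x for x in parts)  # squared length of the space diagonal
        if best is None or score < best[0]:
            best = (score, sorted(parts))
    return best[1]


print(most_cubic_triple(240))
```

This tries 3^m assignments for m prime factors, so it slows down quickly as the number of factors grows, much like the timings above.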
Does anyone know a good, efficient algorithm for the equal-sum k-subsets problem? Preferably in C or C++, able to handle a 100-element vector, and ideally with a complexity and time estimate.
Example: a 9-element vector
x = {2,4,5,6,8,9,11,13,14}
I need to generate all k=3 disjoint subsets with sum = 24.
The algorithm should check whether there are k disjoint subsets, each with element sum 24, and list them in ascending order (within each subset and between subsets), or report that no solution exists.
Solutions
solution 1: {2 8 14} {4 9 11} {5 6 13}
solution 2: {2 9 13} {4 6 14} {5 8 11}
Thanks
Unfortunately the constrained k-subset problem is a hard problem ... and if you want to generate all such k-subsets, you have no choice but to evaluate many possible candidates.
There are a couple of optimizations you can perform to reduce the search space.
Given a domain x containing integer values,
Given a positive integer target M,
Given a positive integer k size for the subset,
When x contains only positive integers and k > 1, remove all items from x that are larger than or equal to M. These can't possibly be part of the subset.
Similarly, for k > 1, a given M, and x containing positive integers, remove all items from x that are larger than M - (min0 + min1 + ... + min(k-2)), where min0, min1, ... are the k-1 smallest values in x. Essentially, remove all of the large values that can't possibly be part of the subset, since even when combined with the smallest values they result in a sum in excess of M.
You can also use the even/odd exclusion principle to pare down your search space. For instance, if k is 3 and M is even, you know that the sum will contain either three even numbers or two odd and one even. You can use this information to reduce the search space by eliminating candidate values from x that could not be part of the sum.
Sort the vector x - this allows you to rapidly exclude values that can't possibly be included in the sum.
Many of these optimizations (other than the even/odd exclusion) are no longer useful/valid when the vector x contains negative values. In this case, you pretty much have to do an exhaustive search.
As Jilles De Wit points out, if X contains negative numbers you could add the absolute value of the smallest value in X to each member of X. This would shift all values back into positive range, making some of the optimizations I describe above possible again. This requires, however, that you are able to accurately represent positive values in the enlarged range. One way to achieve this would be to internally use a wider type (say long instead of int) to perform the subset selection search. If you do this, however, remember to shift the target M up by k times the offset before searching, and to shift the resulting subsets back down by this same offset when you return your results.
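For the example in the question, a plain backtracking search is enough. The Python sketch below reads the question as "split all elements into k groups with equal sums" and assumes distinct positive integers; it relies on the sorted order to cut branches early, in the spirit of the optimizations above. The name equal_sum_partitions is hypothetical, and nothing here is tuned for 100 elements.

```python
def equal_sum_partitions(values, k):
    """All ways to split values into k groups with equal sums (distinct positive ints)."""
    values = sorted(values)
    total = sum(values)
    if k <= 0 or total % k:
        return []
    target = total // k
    n = len(values)
    used = [False] * n
    current, solutions = [], []

    def fill(group, start, remaining):
        if remaining == 0:
            current.append(list(group))
            build()
            current.pop()
            return
        for i in range(start, n):
            if used[i]:
                continue
            if values[i] > remaining:
                break  # sorted input: every later value is too large as well
            used[i] = True
            group.append(values[i])
            fill(group, i + 1, remaining - values[i])
            group.pop()
            used[i] = False

    def build():
        if len(current) == k:
            solutions.append([list(g) for g in current])
            return
        # The smallest unused value must start the next group (avoids duplicates).
        first = used.index(False)
        used[first] = True
        fill([values[first]], first + 1, target - values[first])
        used[first] = False

    build()
    return solutions


for groups in equal_sum_partitions([2, 4, 5, 6, 8, 9, 11, 13, 14], 3):
    print(groups)
```

Because group sizes are not restricted, the output contains the two solutions listed in the question and also partitions with uneven group sizes, such as {2 8 14} {4 5 6 9} {11 13}. If every subset must have exactly three elements, filter the results by group length.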