Maximum weighted bipartite matching, constraint: ordering of each graph is preserved - algorithm

Let's say I have two sets: (n_1, n_2, ...) and (m_1, m_2, ...) and a matching function match(n, m) that returns a value from 0 to 1. I want to find the mapping between the two sets such that the following constraints are met:
Each element must have at most 1 matched element in the opposite set.
Unmatched elements will be paired with a dummy element at cost 1
The sum of the match function when applied to all elements is maximal
I am having trouble expressing this formally, but if you lined up each set parallel to the other with their original ordering and drew a line between matched elements, none of the lines would cross. E.g. [n_1<->m_2, n_2<->m_3] is a valid mapping but [n_1<->m_2, n_2<->m_1] is not.
(I believe the first three are standard weighted bipartite matching constraints, but I specified them in case I misunderstood weighted bipartite matching)
This is relatively straightforward to do with an exhaustive search in exponential time (with respect to the size of the sets), but I'm hoping a polynomial time (ideally O((|n|*|m|)^3) or better) solution exists.
I have searched a fair amount on the "assignment problem"/"weighted bipartite matching" and have seen variations with different constraints, but didn't find one that matched or that I was able to reduce to one with this added ordering constraint. Do you have any ideas on how I might solve this? Or perhaps a rough proof that it is not solvable in polynomial time (for my purposes, a reduction to NP-complete would also work)?

This problem has been studied under the name "maximum-weight non-crossing matching". There's a simple quadratic-time dynamic program. Let M(a, b) be the value of an optimal matching on n1, …, na and m1, …, mb. We have the recurrence
M(0, b) = -b
M(a, 0) = -a
M(a, b) = max {M(a - 1, b - 1) + match(a, b), M(a - 1, b) - 1, M(a, b - 1) - 1}.
By tracing back the argmaxes, you can recover an optimal solution from its value.
If match has significantly fewer than a quadratic number of entries greater than -1, there is an algorithm that runs in time linearithmic in the number of useful entries (Khoo and Cong, A Fast Multilayer General Area Router for MCM Designs).
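For concreteness, here's a minimal Python sketch of the dynamic program above (the names are mine, and match is assumed to take the elements themselves). It returns the optimal value together with the matched index pairs recovered by tracing back the argmaxes:

def noncrossing_matching(ns, ms, match):
    # M[a][b] = value of an optimal non-crossing matching of ns[:a] and ms[:b];
    # each unmatched element costs 1, hence the -a / -b base cases.
    A, B = len(ns), len(ms)
    M = [[0] * (B + 1) for _ in range(A + 1)]
    for b in range(B + 1):
        M[0][b] = -b
    for a in range(A + 1):
        M[a][0] = -a
    for a in range(1, A + 1):
        for b in range(1, B + 1):
            M[a][b] = max(M[a - 1][b - 1] + match(ns[a - 1], ms[b - 1]),
                          M[a - 1][b] - 1,
                          M[a][b - 1] - 1)
    # Trace back the argmaxes to recover the matched pairs.
    pairs, a, b = [], A, B
    while a > 0 and b > 0:
        if M[a][b] == M[a - 1][b - 1] + match(ns[a - 1], ms[b - 1]):
            pairs.append((a - 1, b - 1))
            a, b = a - 1, b - 1
        elif M[a][b] == M[a - 1][b] - 1:
            a -= 1
        else:
            b -= 1
    return M[A][B], pairs[::-1]

Because both indices only ever decrease together along the diagonal branch, the recovered pairs are automatically non-crossing.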


Pathfinding task - how can I find the next vertex on the shortest path from A to B faster than O(n)?

I have a quite tricky task to solve:
You are given an N * M board (1 <= N, M <= 256). You can move from each field to its neighbouring fields (moving diagonally is not allowed). At the beginning, there are two types of fields: active and blocked. You can pass through an active field, but you can't step on a blocked one. You have Q queries (1 <= Q <= 200). There are two types of queries:
1) find the next field (neighbouring to A) that lies on the shortest path from field A to B
2) change field A from active to blocked or conversely.
The first type of query can easily be solved with a simple BFS in O(N * M) time. We can represent active and blocked fields as 0 or 1, so the second type can be handled in constant time.
The total time of that algorithm would be O(Q (number of queries) * N * M).
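For reference, here's a sketch in Python of the type-1 query as I described it: a plain BFS run from B, so that any neighbour of A at distance dist[A] - 1 lies on a shortest A -> B path (the grid representation and names are just illustrative):

from collections import deque

def next_step(grid, A, B):
    # grid[r][c] == 0 for an active field, 1 for a blocked one;
    # A and B are (row, col) pairs, both assumed active.
    if A == B:
        return A
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]
    dist[B[0]][B[1]] = 0
    q = deque([B])
    while q:                             # plain BFS from the destination
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and dist[nr][nc] < 0:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    if dist[A[0]][A[1]] < 0:
        return None                      # B is unreachable from A
    # Any neighbour one step closer to B is a valid next move.
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = A[0] + dr, A[1] + dc
        if 0 <= nr < rows and 0 <= nc < cols \
                and dist[nr][nc] == dist[A[0]][A[1]] - 1:
            return (nr, nc)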
So what's the problem? I have 1/60 of a second to solve all the queries. If we consider 1 second as 10^8 calculations, we are left with about 1.5 * 10^6 calculations. One BFS may take up to N * M * 4 time, which is about 2.5 * 10^5. So if Q is 200, the needed calculations may be up to 5 * 10^7, which is way too slow.
As far as I know, there is no better pathfinding algorithm than BFS in this case (well, I could go for A*, but I'm not sure it's much quicker than BFS; it's still worst-case O(|E|), according to Wikipedia). So there's not much to optimize in this area. However, I could change my graph in some way to reduce the number of edges the algorithm has to process (I don't need to know the full shortest path, only the next move I should make, so the rest of the shortest path can be very simplified). I was thinking about some preprocessing - grouping vertices into groups and making a graph of graphs - but I'm not sure how to handle the blocked fields that way.
How can I optimize it better? Or is it even possible?
EDIT: The actual problem: I have some units on the board. I want to start moving them to the selected destination. Units can't share the same field, so one can block others' paths or open new, better paths for them. There can be a lot of units, which is why I need better optimization.
If I understand the problem correctly, you want to find the shortest path on a grid from A to B, with the added ability that your path-finder can remove walls for an additional movement cost?
You can treat this as a directed graph problem, where you can move into any wall node for a cost of 2, and into any normal node for a cost of 1. Then just use any directed-graph pathfinding algorithm such as Dijkstra's or A* (the usual heuristic, Manhattan distance, will still work).
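A minimal Python sketch of that idea (names are mine; Dijkstra over the grid, paying 2 to enter a wall node and 1 otherwise):

import heapq

def dijkstra_grid(grid, start, goal):
    # grid[r][c] == 1 for a wall (cost 2 to enter), 0 for a normal node (cost 1).
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    dist[start[0]][start[1]] = 0
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist[r][c]:
            continue                     # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (2 if grid[nr][nc] else 1)
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return INF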

Select the most elements that do not overlap so that the sum of their size is maximized

I'm trying to find an algorithm for the following problem.
Say I have a number of objects A, B, C,...
I have a list of valid combinations of these objects. Each combination is of length 2 or 4.
E.g. AF, CE, CEGH, ADFG, and so on.
For combinations of two objects, e.g. AF, the length of the combination is 2. For combinations of four objects, e.g. CEGH, the length of the combination is 4.
I can only pick non-overlapping combinations, i.e. I cannot pick AF and ADFG because both require objects 'A' and 'F'. I can pick combinations AF and CEGH because they do not require common objects.
If my solution consists of only the two combinations AF and CEGH, then my objective is the sum of the length of the combinations, which is 2 + 4 = 6.
Given a list of objects and their valid combinations, how do I pick the most valid combinations that don't overlap with each other so that I maximize the sum of the lengths of the combinations? I do not want to formulate it as an IP as I am working with a problem instance with 180 objects and 10 million valid combinations and solving an IP using CPLEX is prohibitively slow. Looking for some other elegant way to solve it. Can I perhaps convert this to a network? And solve it using a max-flow algorithm? Or a Dynamic program? Stuck as to how to go about solving this problem.
My first attempt at showing this problem to be NP-hard was wrong, as it did not take into account the fact that only combinations of size 2 or 4 were allowed. However, using Jim D.'s suggestion to reduce from 3-dimensional matching (3DM), we can show that the problem is nevertheless NP-hard.
I'll show that the natural decision problem form of your problem ("Given a set O of objects, and a set C of combinations of either 2 or 4 objects from O, and an integer m, does there exist a subset D of C such that all sets in D are pairwise disjoint, and the union of all sets in D has size at least m?") is NP-hard. Clearly the optimisation problem (i.e., your original problem, where we seek an actual subset of combinations that maximises m above) is at least as hard as this problem. (To see that the optimisation problem is not "much" harder than the decision problem, notice that you could first find the maximum m value for which a solution exists using a binary search on m in which you solve a decision problem at each step, and then, once this maximal m value has been found, solving a series of decision problems in which each combination in turn is removed: if the solution after removing some particular combination is still "YES", then it may also be left out of all future problem instances, while if the solution becomes "NO", then it is necessary to keep this combination in the solution.)
Given an instance (X, Y, Z, T, k) of 3DM, where X, Y and Z are sets that are pairwise disjoint from each other, T is a subset of X*Y*Z (i.e., a set of ordered triples with first, second and third components from X, Y and Z, respectively) and k is an integer, our task is to determine whether there is any subset U of T such that |U| >= k and all triples in U are pairwise disjoint (i.e., to answer the question, "Are there at least k non-overlapping triples in T?"). To turn any such instance of 3DM into an instance of your problem, all we need to do is create a fresh 4-combination from each triple in T, by adding a distinct dummy value to each. The set of objects in the constructed instance of your problem will consist of the union of X, Y, Z, and the |T| dummy values we created. Finally, set m to k.
Suppose that the answer to the original 3DM instance is "YES", i.e., there are at least k non-overlapping triples in T. Then each of the k triples in such a solution corresponds to a 4-combination in the input C to your problem, and no two of these 4-combinations overlap, since by construction their 4th elements are all distinct, and by assumption the triples themselves are pairwise disjoint. Thus there are at least m = k non-overlapping 4-combinations in the instance of your problem, so the solution for that problem must also be "YES".
In the other direction, suppose that the solution to the constructed instance of your problem is "YES", i.e., there are at least m non-overlapping 4-combinations in C. We can simply take the first 3 elements of each of the 4-combinations (throwing away the fourth) to produce a set of k = m non-overlapping triples in T, so the answer to the original 3DM instance must also be "YES".
We have shown that a YES-answer to one problem implies a YES-answer to the other, thus a NO-answer to one problem implies a NO-answer to the other. Thus the problems are equivalent. The instance of your problem can clearly be constructed in polynomial time and space. It follows that your problem is NP-hard.
You can reduce this problem to the maximum weighted clique problem, which is, unfortunately, NP-hard.
Build a graph such that every combination is a vertex with weight equal to the length of the combination, and connect vertices if the corresponding combinations do not share any object (i.e. if you can pick both of them at the same time). Then, a solution is valid if and only if it is a clique in that graph.
A simple search on google brings up a lot of approximation algorithms for this problem, such as this one.
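For small instances, here's a sketch of the construction in Python (this assumes networkx >= 2.5 for max_weight_clique; with 10 million combinations the quadratic edge-building loop below is of course impractical, so treat it purely as an illustration):

import networkx as nx

def best_selection(combinations):
    # One vertex per combination, weighted by its length; edges join
    # combinations sharing no object, so cliques = valid selections.
    combos = [frozenset(c) for c in combinations]
    G = nx.Graph()
    for i, c in enumerate(combos):
        G.add_node(i, weight=len(c))
    for i in range(len(combos)):
        for j in range(i + 1, len(combos)):
            if combos[i].isdisjoint(combos[j]):
                G.add_edge(i, j)
    clique, weight = nx.max_weight_clique(G, weight="weight")
    return [combinations[i] for i in clique], weight

print(best_selection(["AF", "CE", "CEGH", "ADFG"]))
# -> a weight-6 selection, e.g. (['AF', 'CEGH'], 6)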

algorithm for concatenating a list of movies to a new list of concatenated movies of length no greater than a given number

I have a (play)list (A) of movie clips (a1,...,an) with different lengths. I want to create a new list (B) where clips (b1,...,bm) are concatenated from the clips in (A).
There is also a limit MAX_LEN that no bx in (B) may exceed. Only adjacent clips in (A) may be concatenated (a1+a2+a3 is a legal concatenation, a1+a3 is not). All clips in (A) must appear exactly once in (B), in the order they appeared in (A).
An optimal solution primarily:
1) minimizes the number of clips in (B),
and secondarily:
2) maximizes the duration of the shortest clip in (B).
The primary criterion 1) is more important than 2), so for two different solutions S1 and S2 with NumOfClips(S1) < NumOfClips(S2), S1 is "more optimal" than S2 even if durationOfShortestClip(S1) < durationOfShortestClip(S2).
Here is an example that shows an input list (A) and three possible outputs (B1), (B2) and (B3). Neither (B1) nor (B2) fulfils 1) (although (B2) is a better solution than (B1), since 25 > 23). The optimal solution is (B3).
I would like to know how to find an optimal solution in an efficient way.
Other helpful information/clues, such as the existence or non-existence of optimal substructure, are also appreciated.
To satisfy the primary criterion you can use a greedy algorithm: put the first clip of (A) into the first element of (B); then, if there is still empty space in that element and the next clip of (A) fits, append it there, otherwise start a new element of (B).
Repeat until every clip of (A) appears exactly once in (B).
This minimizes the number of clips in (B), in O(n) time.
For an optimal solution you must also satisfy the secondary criterion, i.e. maximize the duration of the shortest clip.
Assume the greedy algorithm produced B, where B(i) is the i-th element of the list. Clearly no clip of B(i) can be moved back into B(i-1) (the greedy already filled B(i-1) as far as possible), so only the last clips of B(i) can be moved forward into B(i+1).
So check whether such a move increases the duration of the shortest clip in (B).
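A minimal sketch of that greedy pass in Python, assuming every clip individually fits within MAX_LEN:

def greedy_groups(a, max_len):
    # Pack each clip into the current element of (B) if it still fits,
    # otherwise open a new element; yields the minimum number of elements.
    b, cur = [], max_len + 1             # force a new element for the first clip
    for clip in a:
        if cur + clip <= max_len:
            b[-1].append(clip)
            cur += clip
        else:
            b.append([clip])
            cur = clip
    return b

print(greedy_groups([10, 8, 5, 7, 12], 20))   # -> [[10, 8], [5, 7], [12]]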
One very vague idea, but it at least shows your problem is polynomial. My solution runs in O(N^3 * log N * log L), where L is the sum of the lengths of all clips.
First of all, find the minimum possible number of clip groups G - this is fairly simple. Just greedily put as many clips as possible in the first group and continue on with the next one. This is sure to produce the minimal value of G (that is criterion 1). However, optimality according to criterion 2) is still to be found.
Here is how it goes:
Create a matrix mat with mat[i][j] = 1 iff the clips with indices between i and j have sum at most MAX_LEN. All other values in mat are 0.
Binary search for the minimal group sum over all the groups. This step gives the factor of log L in my algorithm. Assume that at a given step the chosen value is M.
Make a copy of mat as copy_mat, with copy_mat[i][j] = 1 <=> mat[i][j] = 1 and SUM(clips i..j) >= M.
Raise this matrix to the power G, the number of clip groups found above. Matrix multiplication gives the factor of N^3 if implemented the easiest way; raising to a power (by repeated squaring) adds the additional log N factor.
If (copy_mat^G)[1][N] = 1 then there is a solution with M, so try to increase it. Otherwise, decrease it.
Then narrow the binary search interval and repeat.
When the binary search finishes it will have found the optimal value of M. If you need to find the exact grouping you will need to carry one auxiliary matrix along while doing the matrix multiplications, but I think you should be able to figure out this last bit by yourself.
I will keep thinking of a faster solution, but mine at least shows your problem is not exponential in complexity and will handle around 1000 clips relatively fast.
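Here's a Python sketch of the whole scheme, assuming integer clip lengths so the binary search can run over integers; the matrices are indexed by the N+1 cut positions, which is equivalent to the clip-index formulation above:

def optimal_grouping_value(lengths, max_len):
    # Returns (G, M): the minimum number of groups G and the largest M such
    # that the clips split into G contiguous groups, each sum in [M, max_len].
    n = len(lengths)
    prefix = [0]
    for x in lengths:
        prefix.append(prefix[-1] + x)

    G, cur = 1, 0                        # greedy pass for criterion 1
    for x in lengths:
        if cur + x > max_len:
            G, cur = G + 1, x
        else:
            cur += x

    def mul(X, Y):                       # boolean matrix product, O(n^3)
        Z = [[0] * (n + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            for k in range(n + 1):
                if X[i][k]:
                    for j in range(n + 1):
                        if Y[k][j]:
                            Z[i][j] = 1
        return Z

    def feasible(m):
        # A[i][j] = 1 iff clips i+1..j form one valid group, so
        # (A^G)[0][n] = 1 iff some G-group split keeps every sum >= m.
        A = [[1 if i < j and m <= prefix[j] - prefix[i] <= max_len else 0
              for j in range(n + 1)] for i in range(n + 1)]
        R = [[int(i == j) for j in range(n + 1)] for i in range(n + 1)]
        g = G
        while g:                         # exponentiation by squaring
            if g & 1:
                R = mul(R, A)
            A = mul(A, A)
            g >>= 1
        return R[0][n] == 1

    lo, hi = 0, sum(lengths)             # binary search for criterion 2
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return G, lo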

Point covering problem

I recently had this problem on a test: given a set m of points (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that all points are covered by some line. Prove that your solution always finds the minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
    chosenLines = []
    while m is not empty:
        minX = min(m)
        bestLine = n[0]
        for i = 1 to length of n:
            if n[i][0] <= minX and n[i][1] > bestLine[1] then
                bestLine = n[i]
        add bestLine to chosenLines
        for i = 0 to length of m:
            if m[i] <= bestLine[1] then delete m[i] from m
    return chosenLines
I'm just not sure if this always finds the minimum solution. It's a simple greedy algorithm, so my gut tells me it won't, but one of my friends, who is much better at this than me, says that for this problem a greedy algorithm like this always finds the minimal solution. To prove mine always finds the minimal solution I did a very hand-wavy proof by contradiction where I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks
Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering can only be improved by replacing it with the cover produced by your algorithm.
Let C be a valid covering for a given input (not necessarily an optimal one), and let S be the covering produced by your algorithm. Now let's inspect the points p1, p2, ..., pk, which are the min points you deal with at each iteration step. The covering C must cover them all as well. Observe that no segment in C can cover two of these points; otherwise, your algorithm would have chosen that segment! Therefore, |C| >= k. And what is the cost (segment count) of your algorithm? |S| = k.
That completes the proof.
Two notes:
1) Implementation: Initializing bestLine with n[0] is incorrect, since the loop may be unable to improve it, and n[0] does not necessarily cover minX.
2) Actually this problem is a simplified version of the Set Cover problem. While the original is NP-complete, this variant turns out to be polynomial.
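Putting note 1) into practice, here's a corrected sketch in Python (it returns None if some point cannot be covered at all):

def cover_points(points, lines):
    # points: x coordinates; lines: (left, right) pairs.
    points = sorted(points)
    chosen, i = [], 0
    while i < len(points):
        min_x = points[i]
        best = None                      # furthest-reaching line covering min_x
        for l, r in lines:
            if l <= min_x <= r and (best is None or r > best[1]):
                best = (l, r)
        if best is None:
            return None                  # min_x cannot be covered at all
        chosen.append(best)
        while i < len(points) and points[i] <= best[1]:
            i += 1
    return chosen

print(cover_points([1, 3, 6], [(0, 2), (2, 4), (5, 7), (0, 4)]))
# -> [(0, 4), (5, 7)]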
Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.

Is this "Valid mathematical expression" problem P, or NP?

This question is purely out of curiosity. I am off school for the summer, and was going to implement an algorithm to solve this just for fun. That led to the above question: how hard is this problem?
The problem: you are given a list of positive integers, a set of mathematical operators and the equals sign (=). Can you create a valid mathematical expression using the integers (in the same order) and the operators (any number of times)?
An example should clarify any questions:
given: {2, 3, 5, 25} , {+, -, *, /} , {=}
output: YES
The expression (the only one, I think) is (2 + 3) * 5 = 25. You only need to output YES/NO.
I believe the problem is in NP. I say this because it is a decision problem (YES/NO answer) and I can find a non-deterministic poly time algorithm that decides it.
a. non-deterministically select a sequence of operators to place between the integers.
b. verify your answer is a valid mathematical expression (this can be done in polynomial time).
In this case, the big question is this: Is the problem in P? (i.e., is there a deterministic poly time algorithm that decides it?) OR is the problem NP-complete? (i.e., can a known NP-complete problem be reduced to this? Or, equivalently, is every NP language poly-time reducible to this problem?) OR neither? (i.e., the problem is in NP but not NP-complete)
Note: This problem statement assumes P not equal to NP. Also, although I am new to Stack Overflow, I am familiar with the homework tag. This is indeed just curiosity, not homework :)
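For reference, here's a brute-force Python sketch (assuming parentheses are allowed, as in the example above, and that '=' may be placed anywhere; it uses exact rational arithmetic and is exponential in the worst case, so it says nothing about membership in P):

from fractions import Fraction
from functools import lru_cache

def valid_expression(nums):
    # achievable(i, j): the set of exact values obtainable from nums[i..j]
    # with +, -, *, / and any parenthesisation (interval DP over parse trees).
    n = len(nums)

    @lru_cache(maxsize=None)
    def achievable(i, j):
        if i == j:
            return frozenset({Fraction(nums[i])})
        out = set()
        for k in range(i, j):
            for a in achievable(i, k):
                for b in achievable(k + 1, j):
                    out.update((a + b, a - b, a * b))
                    if b != 0:
                        out.add(a / b)
        return frozenset(out)

    # The '=' sign splits the list into two non-empty parts; the expression
    # is valid iff both parts can evaluate to a common value.
    return any(achievable(0, k) & achievable(k + 1, n - 1)
               for k in range(n - 1))

print(valid_expression((2, 3, 5, 25)))   # True: (2 + 3) * 5 = 25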
A straightforward reduction from the Partition problem (which is NP-complete): given a set S of N integers, the input to the "Valid Math" problem would be the elements of S, N-2 '+' operators and an '=' sign.
There seems to be some sort of confusion about how to check for NP-completeness. An NP-complete problem is at least as hard, in a particular sense, as any other problem in NP. Suppose we were comparing to 3SAT, as some posters are trying to do.
Now, reducing the given problem to 3SAT proves nothing. It is then true that, if 3SAT can be solved efficiently (meaning P=NP), the given problem can be solved efficiently. However, if the given problem can be solved efficiently, then perhaps it corresponds only to easy special cases of 3SAT.
We would have to reduce 3SAT to the given problem. This means that we would have to make up a rule to transform arbitrary 3SAT problems to examples of the given problem, such that the solution of the given problem would tell us how to solve the 3SAT problem. This means that 3SAT couldn't be harder than the given problem. Since 3SAT is the hardest possible, then the given problem must also be the hardest possible.
The reduction from the Partition problem works. That problem works like this: given a multiset S of integers, can we divide this into two disjoint subsets that between them include each member of S, such that the sums of the disjoint subsets are equal?
To do this, we construct a sequence beginning with 0, containing each element of S, and ending with 0. We use {+, -} as the operator set. Each element of S will then be either added or subtracted so that the total is 0, meaning that the sum of the added elements is the same as the sum of the subtracted elements.
Therefore, this problem is at least as hard as the Partition problem, since we can solve any Partition instance if we can solve the given problem, and it is therefore NP-complete.
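For concreteness, a sketch of that encoding (the function name and return format are mine):

def partition_to_vme(S):
    # Encode a Partition instance S as a VME instance: the sequence
    # 0, s1, ..., sn, 0 with operator set {+, -} and the '=' sign.
    return [0] + list(S) + [0], ("+", "-"), ("=",)

print(partition_to_vme([3, 1, 1, 2, 2, 1]))
# -> ([0, 3, 1, 1, 2, 2, 1, 0], ('+', '-'), ('=',))
# e.g. 0 + 3 + 1 + 1 = 2 + 2 + 1 + 0 witnesses the partition {3,1,1} | {2,2,1}.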
OK, first, you specify "set" of integers but a set is by definition unordered, so you mean a "list" of integers.
Also, I am going to make an assumption here which may be wrong, which is that the = sign always appears exactly once, between the second to last and the last integer on your list. If you allow the equals sign in the middle, it becomes more complicated.
Here is an actual proof that "Valid Mathematical Expression" (VME) is NP complete. We can do a reduction from Subset sum. NOTE that Wikipedia's definition of subset sum requires that the subset is non-empty. In fact, it is true that the more general problem of subset sum allowing empty subsets is NP complete, if the desired sum is also part of the input. I won't give that proof unless requested. Given the instance of subset sum {i_1, i_2, ..., i_n} along with desired sum s, create the following instance of VME:
{0, i_1, 0, i_2, ..., 0, i_n, s}, {+, *}, {=}
IF the instance of subset sum is solvable, then there is some subset of the integers that sums to s. If the integer i_1 is part of the sum, add it to its corresponding zero (immediately to the left); if i_1 is not part of the sum, multiply it by that zero. Insert an addition sign between each such pair and the next, and place the '=' sign before the final s.
Taking the Wikipedia example
{−7, −3, −2, 5, 8}
where { −3, −2, 5} sums to 0, we would encode it as
{0, -7, 0, -3, 0, -2, 0, 5, 0, 8, 0}
and the resulting expression would be
{0*-7 + 0 + -3 + 0 + -2 + 0 + 5 + 0*8 = 0}
Now we also need to show that any solution to this instance of VME results in a solution to the instance of subset sum. This is easier than you think. When we look at a resulting expression, we can group the numbers into those which are multiplied with a 0 (including as part of a chain multiplication) and those that are not. Any number that is multiplied with a zero is not included in the final sum. Any number that is not multiplied with a zero must be added into the final sum.
So we have shown that this instance of VME is solvable IF and ONLY IF the corresponding instance of subset sum is solvable, so the reduction is complete.
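For concreteness, a sketch of this encoding too (again, the name and return format are mine):

def subset_sum_to_vme(S, s):
    # A 0 before each element (multiply to drop it, add to keep it), s last.
    nums = []
    for x in S:
        nums.extend([0, x])
    nums.append(s)
    return nums, ("+", "*"), ("=",)

print(subset_sum_to_vme([-7, -3, -2, 5, 8], 0))
# -> ([0, -7, 0, -3, 0, -2, 0, 5, 0, 8, 0], ('+', '*'), ('=',))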
EDIT: The Partition reduction (with the comment) works as well, and is better because it allows you to put the equals sign anywhere. Neat!
Don't have the time for the full answer right now, but you can describe a reduction from this problem to the Knapsack Problem.
Using dynamic programming you can achieve a pseudo-polynomial time solution. Note that this does not conflict with the fact that the problem is indeed NP-complete.
There are two properties that need to be satisfied for it to be NP Complete.
A decision problem C is NP-complete if:
C is in NP, and
Every problem in NP is reducible to C in polynomial time.
We have established 1. For 2, note that every problem in NP is reducible to 3SAT, and 3SAT is reducible to the current problem.
Therefore it is NP-complete.
(edit) Answer to the comment below:
I will prove that SAT is reducible to the current problem, and since 3SAT is reducible to SAT, the result follows.
The input formula is the conjunction of the following expressions:
(x1 V x2 V x3 V ... V xn V y1)
(x1 V x2 V x3 V ... V xn V y2)
(x1 V x2 V x3 V ... V xn V y3)
...
(x1 V x2 V x3 V ... V xn V y64)
where each yi is a boolean determined by the order in which the operators are applied between all the xi's.
i.e., yi can take a total of 4x4x4x4x1 values (assuming that only +, -, x, / are the operators and = is always the last operator; this can be changed if the operator set is modified to include other operators)
If none of the expressions is true, then the complete expression will evaluate to FALSE, and there is no way to check this unless we substitute all possible values, i.e., x1 through xn as the n numbers and y1 through y64 as the various ways in which the operators can be applied (this takes care of the order).
This conversion is in POLY-time, and the given boolean formula is satisfiable iff the mathematical expression is valid, etc.
Anyone notice a flaw?
This isn't really an answer to your complexity question, but your problem sounds a bit like the Countdown problem. A quick search turned up this paper: http://www.cs.nott.ac.uk/~gmh/countdown.pdf
I don't have time to work out a proof at the moment, but a hunch tells me that it may not be in P. You can define a grammar for arithmetic, and then this question amounts to finding whether there's a valid parse tree that uses all these terminals. I believe that problem is in NP but outside of P.
