It is a many-to-one assignment problem with N tasks and M people.
Each person can get multiple tasks, while each task can be assigned to only one person. We can earn a profit Pij if the task i is assigned to person j.
Let T1, T2, ..., Tm be a partition of the tasks, and let n1, n2, ..., nm be m positive integers. I want the optimum assignment such that, for each i, the number of people assigned to any task in Ti is less than or equal to ni.
If I understand your question correctly, this is a special case of the minimum-cost flow problem on a graph with three layers (in addition to a source and a sink layer).
From the source, you have a layer with M vertices, one for each person, connected to the source with edges of capacity 1 and no cost.
The next layer has the N tasks, and the j'th person is connected to the i'th task with an edge of capacity 1 and cost -P_ij.
The third layer contains m vertices, one for each part in your partition, and a task is connected to its part with an edge of capacity 1 and no cost.
Finally, the i'th part is connected to the sink with an edge of capacity n_i and no cost.
We haven't specified a demand, but we could simply try all possible demands between 0 and M and still be in P, so showing that it's not NP-hard is equivalent to showing that P ≠ NP.
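The layered network above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a small hand-written successive-shortest-path solver (Bellman-Ford on the residual graph) rather than any particular library, the names `solve_assignment`, `part_of` and `limits` are hypothetical, and `P[i][j]` follows the question's indexing (task i, person j). As in the construction above, each person receives at most one task. Augmentation stops as soon as the cheapest remaining path has non-negative cost, which plays the role of trying every demand.

```python
def solve_assignment(P, part_of, limits):
    """P[i][j]: profit if task i goes to person j (the question's indexing).
    part_of[i]: index k of the part T_k containing task i; limits[k]: n_k."""
    n_tasks, n_people, n_parts = len(P), len(P[0]), len(limits)
    source = 0
    sink = 1 + n_people + n_tasks + n_parts
    person = lambda j: 1 + j
    task = lambda i: 1 + n_people + i
    part = lambda k: 1 + n_people + n_tasks + k

    # Residual graph as parallel arrays; edges e and e ^ 1 form a pair.
    to, cap, cost = [], [], []
    out = [[] for _ in range(sink + 1)]

    def add_edge(u, v, capacity, weight):
        out[u].append(len(to)); to.append(v); cap.append(capacity); cost.append(weight)
        out[v].append(len(to)); to.append(u); cap.append(0); cost.append(-weight)
        return len(to) - 2  # index of the forward edge

    for j in range(n_people):
        add_edge(source, person(j), 1, 0)
    choice = {}
    for i in range(n_tasks):
        for j in range(n_people):
            choice[(i, j)] = add_edge(person(j), task(i), 1, -P[i][j])
        add_edge(task(i), part(part_of[i]), 1, 0)
    for k in range(n_parts):
        add_edge(part(k), sink, limits[k], 0)

    profit = 0
    while True:
        # Bellman-Ford, because residual costs can be negative.
        dist = [float("inf")] * (sink + 1)
        dist[source] = 0
        via = [-1] * (sink + 1)
        for _ in range(sink):
            changed = False
            for u in range(sink + 1):
                if dist[u] == float("inf"):
                    continue
                for e in out[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        via[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] >= 0:  # sink unreachable, or no further profit to gain
            break
        v = sink
        while v != source:  # push one unit along the cheapest path
            e = via[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
        profit -= dist[sink]

    # A saturated person-to-task edge means that pair was chosen.
    assignment = sorted(ij for ij, e in choice.items() if cap[e] == 0)
    return profit, assignment
```

For example, with three tasks, two people, parts {task 0, task 1} and {task 2}, and a limit of 1 on each part, `solve_assignment([[5, 1], [4, 3], [2, 6]], [0, 0, 1], [1, 1])` picks task 0 for person 0 and task 2 for person 1.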
Given a complete bipartite graph G = (V1, V2; E), |V1|=|V2|=n, and a non-negative cost for each edge, the min cost bipartite matching problem finds a partition of G into n pairs of vertices connected by an edge, such that the total sum of the edge costs is minimized.
This problem can be solved using the min cost flow algorithm, by adding source and sink vertices, each connected to every vertex of one of the two groups by an edge of weight 0 and capacity 1.
But what if instead we get as an input a number m < n and want to find a partition of m pairs such that the total cost is minimized?
At first I thought we can just add another vertex at the beginning which is connected to the original source with weight 0 and capacity m and call it the new source, that way the maximum flow would be m and it should choose only m pairs.
However, when I ran this algorithm using Boost's min cost flow function, there were often 2 big problems:
1) The flow in an edge wasn't always an integer (i.e. instead of 0 or 1 the flow was 0.5 for example).
2) There were many possible (non-integer) solutions so even for the same input with different order the algorithm outputted different results.
The moment I set m to be n both of these problems were resolved.
So my question is: is there a way to solve these problems, and if not, is there another algorithm that can solve the min cost bipartite matching with outliers problem?
I just found out that the algorithm I described in the question, which I said didn't work, actually does work. The problems were caused by floating-point error inside Boost's min cost flow function: when I multiplied all the costs by 10000, all the problems were resolved.
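The fix described above amounts to converting the costs to integers before they reach the solver. Here is a minimal sketch, assuming the costs have at most four decimal digits. Both `scale_costs` and `best_m_matching` are hypothetical helpers: the brute-force matcher is only a stand-in for the flow solver, useful for checking small inputs, and is not Boost's API.

```python
from itertools import combinations, permutations

def scale_costs(costs, factor=10000):
    """Turn float costs into integers so the solver does exact arithmetic."""
    return [[round(c * factor) for c in row] for row in costs]

def best_m_matching(costs, m):
    """Cheapest set of m disjoint (row, column) pairs, by exhaustive search."""
    n = len(costs)
    best = None
    for rows in combinations(range(n), m):
        for cols in permutations(range(n), m):
            total = sum(costs[r][c] for r, c in zip(rows, cols))
            if best is None or total < best[0]:
                best = (total, list(zip(rows, cols)))
    return best

costs = [[0.1, 0.7, 0.3],
         [0.2, 0.2, 0.9],
         [0.5, 0.4, 0.6]]
scaled = scale_costs(costs)
result = best_m_matching(scaled, 2)
```

Because every scaled cost is an integer, ties and comparisons are exact, so the same input yields the same matching regardless of rounding noise.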
Let's say I have a set of destinations and a corresponding set of origins. I need to link each destination with one origin. A set of vehicles start from each origin towards their respective destination. The speed of every vehicle is provided.
In the network, no two vehicles moving in opposite directions are allowed on a particular road at the same time; in brief, there should not be any collisions on a road. If such a situation arises, either of the two vehicles that would collide can wait until the other vehicle has passed, or take another path to reach its destination.
The graph can be thought of as a road network where each edge represents a road and each vertex represents an intersection of roads.
The aim is to calculate the minimum time required for each vehicle to reach its destination and also the path taken by each vehicle to reach its destination satisfying all the above constraints.
Ideas on a way to tackle that?
This is NP-hard.
The problem of deciding whether all cars can complete their trips in at most some given number k of time units is NP-hard, even under any combination of the following simultaneous restrictions: all cars travel at unit speed, every edge has length 1, k = 3. A problem being NP-hard means there's almost certainly no polynomial-time algorithm that solves every instance. To show this I'll give a reduction from the NP-hard problem 3SAT: In this problem, we are given a Boolean expression in the form of a conjunction (AND) of n clauses, each of which is a disjunction (OR) of 3 literals, each of which is either a variable or its negation (NOT). There are m variables overall, each of which we can assign to either TRUE or FALSE; our task is to determine whether the overall expression is satisfiable -- that is, whether there exists any assignment of TRUE or FALSE values to the m variables that causes the overall expression to be TRUE.
Constructing an instance of your problem from an arbitrary 3SAT instance
Suppose we have an instance of 3SAT with n clauses and m variables. We can construct an instance of your problem in which each variable becomes an edge, with the direction of traffic (left-to-right or right-to-left) along that edge corresponding to the value (TRUE or FALSE) of the variable. Each clause becomes a gadget that connects to both ends of 3 of these variable-edges. Intuitively, each clause-gadget gives a vehicle starting at its start point (think of this as being on the left) one of 3 options to reach its corresponding end point (think of this as being on the right). Specifically:
First, delete any clause that contains both a variable and its negation as literals. This serves to remove length-2 trips from the graph without affecting satisfiability of the original expression, since such a clause is satisfied by every assignment.
For each variable x_i, create two vertices u_i and v_i, and the edge (u_i, v_i). All edges in this construction have weight (distance?) 1.
For all 1 <= j <= n, build a gadget corresponding to the jth clause as follows: Create a vertex s_j and a vertex t_j. Let the literals in the jth clause be a, b and c. Let x_i be the variable in literal a. If a is a positive literal, create the edges (s_j, u_i) and (v_i, t_j), otherwise (i.e., if it equals "NOT x_i") create the edges (s_j, v_i) and (u_i, t_j). Do likewise for literals b and c.
Finally, add (s_j, t_j) as a (source, destination) pair for each 1 <= j <= n. Give each such car unit speed.
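The construction steps above can be sketched as follows. The encoding is an assumption made for illustration: clauses are lists of non-zero integers (k stands for x_k, -k for NOT x_k), vertices are tagged tuples such as ("u", i), and `build_instance` is a hypothetical name. Every edge is undirected with length 1, and every car has unit speed.

```python
def build_instance(clauses, num_vars):
    """Return (edges, trips) for the road network built from a 3SAT instance."""
    # Drop clauses containing both a variable and its negation.
    kept = [c for c in clauses if not any(-lit in c for lit in c)]

    # One variable-edge (u_i, v_i) per variable x_i.
    edges = [(("u", i), ("v", i)) for i in range(1, num_vars + 1)]
    trips = []
    for j, clause in enumerate(kept, start=1):
        s, t = ("s", j), ("t", j)
        for lit in clause:
            i = abs(lit)
            # Positive literal: enter at u_i and leave from v_i; negated: the reverse.
            if lit > 0:
                near, far = ("u", i), ("v", i)
            else:
                near, far = ("v", i), ("u", i)
            edges.append((s, near))
            edges.append((far, t))
        trips.append((s, t))  # one unit-speed car per clause
    return edges, trips

edges, trips = build_instance([[1, -2, 3], [2, -2, 1]], num_vars=3)
```

In this example the second clause is discarded because it contains both x_2 and NOT x_2, leaving three variable-edges, six gadget edges and a single trip.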
I now claim that the original 3SAT instance is satisfiable if and only if the just-constructed instance of your problem has a solution with duration at most 3.
YES to 3SAT instance => YES to instance of your problem
First I'll show that if the 3SAT instance is satisfiable, then there exists a solution to the just-constructed instance of your problem with duration 3. In this case we can assume that a satisfying assignment Y exists, so for every 1 <= i <= m let y_i be the assignment to variable x_i in some such satisfying assignment. Now in the just-constructed instance, orient every edge incident on some s_j away from s_j, every edge incident on some t_j toward t_j, and each variable-edge as follows: If y_i = TRUE, then orient the edge (u_i, v_i) from u_i to v_i, while OTOH if y_i = FALSE, orient the edge from v_i to u_i. Since by assumption Y is a satisfying assignment, we know that every clause contains at least 1 literal that evaluates to TRUE: that is, in each clause there is at least 1 literal z containing a variable x_i such that either z is positive and x_i = TRUE, or z is negative and x_i = FALSE. This implies that, for every clause j, there is at least 1 path from s_j to t_j that agrees with the orientation of edges established above. Clearly, if cars only ever travel across edges in the direction given by the orientation above, there can never be two cars crossing an edge in opposite directions, so such car trips do not interfere with each other. Since these paths from s_j to t_j all have length 3 and no trips interfere with each other, all trips can be simultaneously completed in 3 time steps.
YES to instance of your problem => YES to 3SAT instance
Now I'll show that if the solution to the just-constructed instance of your problem has duration at most 3, then there exists a satisfying assignment to the original 3SAT instance. Assume that there is such a solution to the just-constructed instance: then clearly, every trip must be completed in at most 3 time units. For a car to get from s_j to t_j, it must use at least 1 of the 3 edges incident on s_j, and at least 1 of the 3 edges incident on t_j, so it must take at least 2 time units; furthermore, because we deleted any clauses containing both a variable and its negation, no vertex is adjacent to both an s_j and t_j for any j, so at least one more edge is required, meaning 3 time units is the shortest path we could hope for (since every edge takes 1 time unit). So every trip in the solution must take exactly 3 time units, along a 3-edge path that experiences no hold-ups due to cars coming the other way. Notice that the middle leg on such a path must be a single variable-edge, since the only other ways of getting from some u_i to v_i or vice versa involve "doubling back" via at least 2 more edges. In particular, for the trip starting at s_j, it must be one of the 3 distinct variable-edges corresponding to the 3 literals in clause j. Specifically, let the literals in the jth clause be a, b and c. Let x_i be the variable in literal a. If a is positive, then "from u_i to v_i" is one of the 3 permitted legs for the trip starting at s_j, otherwise (i.e., if a is negative) "from v_i to u_i" is. Likewise for the remaining literals b and c. So, thus far, we have established that:
For each 1 <= j <= n, a car can travel from s_j to t_j in at most 3 time units using one of the 3 middle legs corresponding to the literals in clause j.
We build a solution to the 3SAT instance as follows: For each variable x_i, if the edge (u_i, v_i) is crossed by one or more cars from u_i to v_i, assign TRUE to y_i; if it is crossed by one or more cars from v_i to u_i, assign FALSE to y_i; otherwise (if the edge is not crossed at all), arbitrarily assign either value to y_i. We need to show two things: that no variable is assigned both TRUE and FALSE, and that the assignment causes the expression to take the value TRUE.
First, the only condition under which a variable x_i could be assigned both TRUE and FALSE by the above rule is if at least one car traverses the edge (u_i, v_i) in each direction. Suppose towards contradiction that this is true: some variable-edge (u_i, v_i) is crossed in opposite directions by 2 different car trips in the solution. Then clearly at least one of the two cars must pause for at least 1 time step to let the other one through this edge. But then the solution would need at least 4 time steps, contradicting our assumption of a solution of duration at most 3 time steps, thus it must be that if any cars cross edge (u_i, v_i), then they all do so in the same direction, and thus each variable is assigned at most one of TRUE or FALSE.
Second, for each 1 <= j <= n, we can reinterpret the jth clause of the 3SAT instance as "A car can travel from s_j to t_j in at most 3 time units using one of the middle legs corresponding to the literals in clause j", where "corresponding" is used in the same sense as earlier. Observe that under this interpretation, the 3SAT instance is (a) equivalent to the statement in the bullet point above, which we have already established to be TRUE, and (b) still formally equivalent to the original 3SAT problem (since all we have done is given an interpretation to its variables and clauses).
It follows that the variable assignment for the 3SAT problem that we just built from the solution to the instance of your problem is free of contradictions and produces the value TRUE: i.e., the 3SAT formula is satisfiable.
Wrapping up
We have now established that a YES answer to the question "Does there exist a satisfying assignment for this 3SAT expression?" implies a YES answer to the question "Does there exist a way of getting all cars from their starting points to their destinations in 3 time steps or less?", and also that a YES answer to the latter question implies a YES answer to the former. Thus a NO answer to either question also implies a NO answer to the other: that is, the questions are equivalent. We constructed the instance of your problem in polynomial time from the given 3SAT instance, so if there was some algorithm that could solve your problem in polynomial time, it could also be used to solve any 3SAT instance in polynomial time -- by first constructing such an instance of your problem, calling this algorithm to solve that instance as a subroutine, and then returning the answer. Thus your problem is at least as hard as 3SAT, namely NP-hard.
I’m currently studying flow networks in the university, and my professor presented this theorem to us:
“Given a flow network, and a flow B in it, so that for each vertex, except the source and the sink: |∑(e:u→v) of B(e) - ∑(e':v→u) of B(e')|≤ε.
Note: this inequality holds for every vertex v that is not the source or the sink of the network. e:u→v means that I want the sum of B(e) over every edge that goes, in a cutset, from the set of u to the set of v; e':v→u then means that I want the sum of B(e') over every edge that goes, in the same cutset, from the set of v to the set of u.
There exists a new flow, F, such that for every edge in the graph, |F(e)-B(e)|<ε*N (where N is the number of vertices in the graph).”
He claimed that a proof exists, but I can’t get to the bottom of it. I was thinking about the fact that epsilon’s lower bound is the min cut of the graph, but all the other ideas I had were useless. I’d appreciate any help. I searched for the proof on the web but couldn’t find anything.
Thanks in advance,
Or
Given an antisymmetric assignment of quantities to edges, the excess of a vertex is the total quantity entering minus the total quantity exiting. For each vertex v with a negative excess -c, choose a path from the source s to v, multiply it by c, and add it to the assignment. For each vertex v with a positive excess c, choose a path from v to the sink t, multiply it by c, and add it to the assignment. It's straightforward to check that (1) all of the excesses are now zero, except at s and t (2) since every excess was less than epsilon in absolute value, the worst-case change for an edge is if it's involved in every path, for a total of less than epsilon times n, the number of vertices.
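The path-pushing argument above can be tried on a small example. This is a sketch under assumptions not stated in the answer: the assignment is stored as a dict from directed edges to values, paths are found by breadth-first search along directed edges only, every vertex is assumed reachable from s and able to reach t, and capacities are ignored. The names `bfs_path`, `excess` and `repair` are hypothetical.

```python
from collections import deque

def bfs_path(edges, start, goal):
    """Edges of a shortest directed path from start to goal."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            break
        for a, b in edges:
            if a == u and b not in prev:
                prev[b] = u
                queue.append(b)
    path, v = [], goal
    while prev[v] is not None:
        path.append((prev[v], v))
        v = prev[v]
    return path[::-1]

def excess(B, v):
    """Total quantity entering v minus total quantity leaving v."""
    entering = sum(x for (a, b), x in B.items() if b == v)
    leaving = sum(x for (a, b), x in B.items() if a == v)
    return entering - leaving

def repair(B, s, t):
    """Cancel every excess by pushing along a path from s or towards t."""
    F = dict(B)
    inner = {x for e in B for x in e} - {s, t}
    for v in inner:
        ex = excess(B, v)  # pushes through other vertices leave this unchanged
        if ex < 0:
            for e in bfs_path(list(B), s, v):
                F[e] += -ex
        elif ex > 0:
            for e in bfs_path(list(B), v, t):
                F[e] += ex
    return F

B = {("s", "a"): 10, ("a", "b"): 5, ("b", "t"): 12, ("s", "b"): 5,
     ("a", "t"): 6, ("s", "c"): 4, ("c", "t"): 3}
F = repair(B, "s", "t")
```

Here the excesses at a, b and c are -1, -2 and +1, so epsilon can be taken as 2; after the repair every inner vertex is balanced and no edge has changed by more than 2, well under epsilon times the 5 vertices.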
Consider a simple network flow model: G = (V,E), source node S, and sink node T. For each edge E[i], its capacity is C[i].
Then the flow F[i] on edge E[i] is constrained to be either C[i] or 0, that is, F[i] belongs to {0, C[i]}.
How to compute the maximum flow from S to T? Is this still a network flow problem?
The decision variant of your modified flow problem is NP-complete, as evidenced by the fact that the subset sum problem can be reduced to it: For given items w_1, ..., w_n and a sum W, just create a source S connected to every item i via an edge S -> i of capacity w_i. Then connect every item i to a sink t via another edge i -> t of capacity w_i. Add an edge t -> T of capacity W. There exists a subset of items with cumulative weight W if and only if the S-T max-flow in the graph is W with your modifications.
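To see the reduction at work on a tiny case, here is a sketch that builds the gadget and finds the best all-or-nothing flow by exhaustive search over which edges are switched on. The function names and vertex labels are hypothetical, and the search is exponential in the number of edges, so it is meant only for checking small instances.

```python
from itertools import product

def subset_sum_gadget(weights, W):
    """Edges (u, v, capacity) of the network described above."""
    edges = []
    for i, w in enumerate(weights):
        edges.append(("S", ("item", i), w))
        edges.append((("item", i), "t", w))
    edges.append(("t", "T", W))
    return edges

def max_all_or_nothing_flow(edges, source, sink):
    """Largest flow in which every edge carries either 0 or its full capacity."""
    inner = {x for u, v, _ in edges for x in (u, v)} - {source, sink}
    best = 0
    for switches in product((0, 1), repeat=len(edges)):
        flow = [cap * on for (_, _, cap), on in zip(edges, switches)]

        def net_in(node):
            gained = sum(f for (u, v, _), f in zip(edges, flow) if v == node)
            lost = sum(f for (u, v, _), f in zip(edges, flow) if u == node)
            return gained - lost

        # Keep only choices that conserve flow at every inner vertex.
        if all(net_in(x) == 0 for x in inner):
            best = max(best, net_in(sink))
    return best

reachable_sum = max_all_or_nothing_flow(subset_sum_gadget([3, 5, 7], 8), "S", "T")
unreachable_sum = max_all_or_nothing_flow(subset_sum_gadget([3, 5, 7], 9), "S", "T")
```

With items 3, 5 and 7, the target 8 is reachable (3 + 5), so the maximum flow equals 8; no subset sums to 9, so the edge into T must stay empty and the maximum flow is 0.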
That said, there is likely no algorithm that solves this problem efficiently in every case, but for instances not specifically designed to be hard, you can try an integer linear program formulation of the problem and use a general ILP solver to find a solution.
There might be a pseudopolynomial algorithm if your capacities are integers bounded by a value polynomial in the input size.
Um, no, it's no longer a well-defined flow problem, for the reason that Heuster gives, which is that given two edges connected through a node (with no other connections) the flow must be zero unless the two capacities equal each other. Most generic flow algorithms will fail as they cannot sequentially increase the flow.
Given the extreme restrictivity of this condition on a general graph, I would fall back on a game tree working backwards from the sink. Most nodes of the game tree will terminate quickly as there will be no combination of flows into a node that exactly match the needed outflows. With a reasonable heuristic you can probably find a reasonable search order and terminate the tree without having to search every branch.
In fact, you can probably exclude lots of nodes and remove lots of edges before you start, on the grounds that flows through certain nodes will be trivially impossible.
The DFA must have the following four properties:
The DFA has N nodes
Each node has 2 outgoing transitions.
Each node is reachable from every other node.
The DFA is chosen with perfectly uniform randomness from all possibilities
This is what I have so far:
1) Start with a collection of N nodes.
2) Choose a node that has not already been chosen.
3) Connect its output to 2 other randomly selected nodes.
4) Label one transition 1 and the other transition 0.
5) Go to 2, unless all nodes have been chosen.
6) Determine if there is a node with no incoming connections.
7) If so, steal an incoming connection from a node with more than 1 incoming connection.
8) Go to 6, unless there are no nodes with no incoming connections.
However, this algorithm is not correct. Consider the graph where node 1 has its two connections going to node 2 (and vice versa), while node 3 has its two connections going to node 4 (and vice versa). That is something like:
1 <==> 2
3 <==> 4
Where, by <==> I mean two outgoing connections both ways (so a total of 4 connections). This seems to form 2 cliques, which means that not every state is reachable from every other state.
Does anyone know how to complete the algorithm? Or, does anyone know another algorithm? I seem to vaguely recall that a binary tree can be used to construct this, but I am not sure about that.
Strong connectivity is a difficult constraint. Let's generate uniform random surjective transition functions and then test them with e.g. Tarjan's linear-time SCC algorithm until we get one that's strongly connected. This process has the right distribution, but it's not clear that it's efficient; my researcher's intuition is that the limiting probability of strong connectivity is less than 1 but greater than 0, which would imply only O(1) iterations are necessary in expectation.
Generating surjective transition functions is itself nontrivial. Unfortunately, without that constraint it is exponentially unlikely that every state has an incoming transition. Use the algorithm described in the answers to this question to sample a uniform random partition of {(1, a), (1, b), (2, a), (2, b), …, (N, a), (N, b)} with N parts. Permute the nodes randomly and assign them to parts.
For example, let N = 3 and suppose that the random partition is
{{(1, a), (2, a), (3, b)}, {(2, b)}, {(1, b), (3, a)}}.
We choose a random permutation 2, 3, 1 and derive a transition function
(1, a) |-> 2
(1, b) |-> 1
(2, a) |-> 2
(2, b) |-> 3
(3, a) |-> 1
(3, b) |-> 2
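The last step of the example, turning a partition plus a node permutation into a transition function, can be written down directly. This sketch does not sample the random partition itself; `transition_from_partition` is a hypothetical helper that only performs the assignment of nodes to parts.

```python
def transition_from_partition(parts, node_order):
    """Send every (state, symbol) pair in the k-th part to the k-th node."""
    delta = {}
    for part, node in zip(parts, node_order):
        for state_symbol in part:
            delta[state_symbol] = node
    return delta

parts = [[(1, "a"), (2, "a"), (3, "b")], [(2, "b")], [(1, "b"), (3, "a")]]
delta = transition_from_partition(parts, [2, 3, 1])
```

Since the partition has N non-empty parts and each part is mapped to a distinct node, every node receives at least one incoming transition, which is exactly the surjectivity requirement.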
In what follows I'll use the basic terminology of graph theory.
You could:
Start with a directed graph with N vertices and no arcs.
Generate a random permutation of the N vertices to produce a random Hamiltonian cycle, and add it to the graph.
For each vertex add one outgoing arc to a randomly chosen vertex.
The result will satisfy the first three requirements, although the DFA is not drawn uniformly from all possibilities.
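A minimal sketch of the steps above, with a few assumptions of my own: nodes are numbered from 0, the `seed` parameter exists only to make runs repeatable, and which of the two labels (0 or 1) goes on the cycle arc is chosen at random. The names `hamiltonian_dfa` and `reachable` are hypothetical.

```python
import random

def hamiltonian_dfa(n, seed=None):
    """Transition table delta[(node, label)] built from a random Hamiltonian cycle."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random cyclic order of the nodes
    delta = {}
    for pos, v in enumerate(order):
        labels = [0, 1]
        rng.shuffle(labels)
        delta[(v, labels[0])] = order[(pos + 1) % n]  # arc on the cycle
        delta[(v, labels[1])] = rng.randrange(n)      # extra random arc
    return delta

def reachable(delta, start):
    """Set of nodes reachable from start by following transitions."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for label in (0, 1):
            w = delta[(u, label)]
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

dfa = hamiltonian_dfa(6, seed=0)
```

Every node lies on the cycle, so each node can reach every other one regardless of where the extra arcs point.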
There is an algorithm with expected running time O(n^{3/2}).
If you generate a uniform random digraph with m vertices such that each vertex has k labelled out-arcs (a k-out digraph), then with high probability the largest SCC (strongly connected component) in this digraph is of size around c_k m, where c_k is a constant depending on k. Actually, there is about 1/\sqrt{m} probability that the size of this SCC is exactly c_k m (rounded to an integer).
So you can generate a uniform random 2-out digraph of size n/c_k, and check the size of the largest SCC. If its size is not exactly n, just try again until success. The expected number of trials needed is \sqrt{n}. And generating each digraph should be done in O(n) time. So in total the algorithm has expected running time O(n^{3/2}). See this paper for more details.
Just keep growing a set of nodes which are all reachable. Once they're all reachable, fill in the blanks.
Start with a set of N nodes called A.
Choose a node from A and put it in set B.
While there are nodes left in set A
    Choose a node x from set A
    Choose a node y from set B with less than two outgoing transitions.
    Choose a node z from set B
    Add a transition from y to x.
    Add a transition from x to z
    Move x to set B
For each node n in B
    While n has less than two outgoing transitions
        Choose a node m in B
        Add a transition from n to m
Choose a node to be the start node.
Choose some number of nodes to be accepting nodes.
Every node in set B can reach every node in set B. As long as a node can be reached from a node in set B and that node can reach a node in set B, it can be added to the set.
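The procedure above translates almost line for line into code. In this sketch the names `grow_dfa` and `reachable` are hypothetical, nodes are numbered from 0, N is assumed to be at least 1, and the first and second entries of each node's list are read as the transitions labelled 0 and 1. Choosing the start node and the accepting nodes is left out.

```python
import random

def grow_dfa(n, seed=None):
    """Return out[v] = list of the two targets of v's outgoing transitions."""
    rng = random.Random(seed)
    A = list(range(n))
    rng.shuffle(A)
    B = [A.pop()]
    out = {v: [] for v in range(n)}
    while A:
        x = A.pop()
        # y needs a free slot; B always has one, since it holds k + 1 nodes
        # but only 2k transitions after k additions.
        y = rng.choice([v for v in B if len(out[v]) < 2])
        z = rng.choice(B)
        out[y].append(x)
        out[x].append(z)
        B.append(x)
    for v in B:  # fill in the blanks
        while len(out[v]) < 2:
            out[v].append(rng.choice(B))
    return out

def reachable(out, start):
    """Set of nodes reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        for w in out[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

dfa = grow_dfa(8, seed=1)
```

The result is strongly connected by construction, but nothing in this procedure suggests the distribution over DFAs is uniform.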
The simplest way that I can think of is to (uniformly) generate a random DFA with N nodes and two outgoing edges per node, ignoring the other constraints, and then throw away any that are not strongly connected (which is easy to test using a strongly connected components algorithm). Generating uniform DFAs should be straightforward without the reachability constraint. The one thing that could be problematic performance-wise is how many DFAs you would need to skip before you found one with the reachability property. You should try this algorithm first, though, and see how long it ends up taking to generate an acceptable DFA.
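A minimal sketch of this generate-and-reject approach. Instead of a full strongly connected components algorithm it uses a simpler equivalent test (every node is reachable from node 0 in both the graph and its reverse). Nodes are numbered from 0, `delta[u]` holds the targets of the transitions labelled 0 and 1, and all function names are hypothetical.

```python
import random

def random_dfa(n, rng):
    """Uniform random transition table: two independent targets per node."""
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]

def is_strongly_connected(delta):
    n = len(delta)

    def covers_all(adj):
        seen, stack = {0}, [0]
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    reverse = [[] for _ in range(n)]
    for u, targets in enumerate(delta):
        for v in targets:
            reverse[v].append(u)
    # Node 0 reaches everything, and everything reaches node 0.
    return covers_all(delta) and covers_all(reverse)

def sample_strongly_connected_dfa(n, seed=None):
    """Rejection sampling: retry until the table is strongly connected."""
    rng = random.Random(seed)
    while True:
        delta = random_dfa(n, rng)
        if is_strongly_connected(delta):
            return delta

dfa = sample_strongly_connected_dfa(5, seed=0)
```

Because every transition table is equally likely before the test, the accepted ones are uniform over the strongly connected tables. The counterexample from the question, `[(1, 1), (0, 0), (3, 3), (2, 2)]` with nodes renumbered from 0, is rejected by `is_strongly_connected`.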
We can start with a random number of states N1 between N and 2N.
Take state number 1 as the initial state.
For each state, for each character in the input alphabet we generate a random transition (between 1 and N1).
We take the connected part of the automaton reachable from the initial state and check its number of states; after a few tries we get one with N states.
If we also want a minimal automaton, only the assignment of final states remains; there is a good chance that a random assignment yields a minimal automaton as well.
The following references seem to be relevant to your question:
F. Bassino, J. David and C. Nicaud, Enumeration and random generation of possibly incomplete deterministic automata, Pure Mathematics and Applications 19 (2-3) (2009) 1-16.
F. Bassino and C. Nicaud, Enumeration and Random Generation of Accessible Automata, Theor. Comp. Sc. 381 (2007) 86-104.