Balancing a disconnected bipartite graph

I have a bipartite graph G. I would like to find a fast algorithm (asymptotically) to find an assignment of all the vertices of G into two sets X and Y such that the complete bipartite graph formed by the sets X and Y has as many edges as possible.
Slightly longer explanation:
G is bipartite and consists of a set of connected components (each of which is bipartite, obviously). We want to decide on a positioning (for lack of a better word) of each component into X and Y. After deciding upon all the positionings, we complete the bipartite graph (i.e. we connect every vertex of X to every vertex of Y). We then count the total number of edges (including the original edges), and we want to maximize this count. Simple math shows that the number of edges is |X|*|Y|.
My thought process for a solution:
As the number of components increases, the number of choices for G increases exponentially. However, if we take the number of connected components of G to be equal to the number of nodes in G, then the solution is simple: split so that the numbers of nodes in X and Y are equal (or almost equal in the case of an odd number of nodes in G). This makes me want to generalize that the problem is the same as trying to minimize the difference in the cardinalities of X and Y (which can be solved as in this SO question). However, I have been unsuccessful in proving so.

Let's decompose the problem.
Your graph is actually a set of connected components, each connected
component is (U_i,V_i,E_i).
To maximize the number of edges, you need to maximize the value of
|X|*|Y|
To get the maximal value of |X|*|Y|, you obviously need to use all
vertices (otherwise, by adding another vertex, you get a bigger value).
Your freedom of choice is actually to choose, for each component i, whether to add U_i to X and V_i to Y, or vice versa.
So, what you are actually trying to do is:
maximize:
|X| * |Y| = sum { x_i * |U_i| + y_i * |V_i| : for each component i} * sum { y_i * |U_i| + x_i * |V_i| : for each component i}
subject to constraints:
x_i, y_i in {0,1} for all i
x_i + y_i = 1 for all i
The value we want to maximize behaves like the function f(x) = x(k-x), where x = |X| and k is the total number of vertices, because if we increase |X|, it comes at the expense of decreasing |Y| by the same amount. This function has a single maximum:
f(x) = xk - x^2
f'(x) = k - 2x = 0 ---> x = k/2
Meaning, we should distribute the nodes such that the cardinalities (sizes) of X and Y are as close as possible to each other (and use all the vertices).
This can be reduced to the Partition Problem:
Given U_1,V_1,U_2,V_2,...,U_k,V_k
Create an instance of the partition problem:
abs(|U_1| - |V_1|), abs(|U_2| - |V_2|), ... , abs(|U_k| - |V_k|)
Now an optimal solution to this partition problem (in its optimization form, which minimizes the difference between the two sums) can be translated directly into which of U_i, V_i to include in which set, and will make sure the difference in sizes is kept to a minimum.
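A minimal sketch of how the balancing could be carried out in practice, assuming each component is given only by its two side sizes (|U_i|, |V_i|). Instead of building the partition instance of differences explicitly, it uses the standard pseudo-polynomial subset-sum style table over reachable values of |X|, which is a swapped-in technique and not something the answer above prescribes. The function name best_balance and the return format are hypothetical.

```python
def best_balance(components):
    """components: list of (|U_i|, |V_i|) pairs, one per connected component.

    Returns (max |X|*|Y|, flipped), where flipped[i] is True when V_i is
    placed in X (and U_i in Y) and False when U_i is placed in X.
    """
    total = sum(u + v for u, v in components)
    # Map each reachable value of |X| to one list of choices that produces it.
    reachable = {0: []}
    for u, v in components:
        nxt = {}
        for size, choice in reachable.items():
            nxt.setdefault(size + u, choice + [False])  # U_i goes to X
            nxt.setdefault(size + v, choice + [True])   # V_i goes to X
        reachable = nxt
    # x * (total - x) is largest for the reachable x closest to total / 2.
    best = max(reachable, key=lambda s: s * (total - s))
    return best * (total - best), reachable[best]
```

The table has at most (number of vertices + 1) entries per step, so the running time is pseudo-polynomial in the number of vertices rather than exponential in the number of components.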

Related

finding largest region with small difference more efficiently

There is a grid consisting of h * w (h, w <= 200) pixels. Every pixel is represented by a value, and we want to find the largest continuous region.
A continuous region is defined in this way:
Given a point P(x, y), the connected region must include this point.
There exists a reference point R(x, y) of value v, and every point in the connected region must be connected to this point. Also, there is a value g_critical (g_critical <= 100000). Let the value of a point in the connected region be u; the difference between u and v must be smaller than or equal to g_critical.
The question is to find the size of the largest connected region.
For example, consider the grid below, with h = 5, w = 5, g_critical = 3, P(x, y) = (2, 4):
1 3 7 9 2
2 5 6 6 8
3 5 9 3 6
2 7 3 2 9
In this case, the bold region is the largest connected region. Notice that R(x, y) is chosen at (2, 3) or (2, 2) in this case. The size of the region is 14.
I have rephrased the question a bit so it is shorter. So if there is any ambiguity, please point it out in the comment. This question is also in our private judge so I am unable to share the problem source here.
My attempt
I have tried looping through every cell, treating it as the point R and using BFS to find the connected region attached to it, then checking whether P is contained in that region.
The complexity is O(h * h * w * w), which is too large. So any way to optimize it?
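For reference, the brute-force attempt described above can be sketched as follows. This is only an illustration under stated assumptions: cells are 4-connected, the difference is taken as an absolute difference, and P is passed as a 0-indexed (row, column) pair. The function name largest_region_bruteforce is hypothetical.

```python
from collections import deque


def largest_region_bruteforce(grid, p, g_critical):
    """Try every cell as R, flood-fill from it, keep regions that contain p."""
    h, w = len(grid), len(grid[0])
    best = 0
    for ry in range(h):
        for rx in range(w):
            v = grid[ry][rx]              # value of the candidate reference point R
            seen = {(ry, rx)}
            queue = deque([(ry, rx)])
            while queue:                  # BFS over cells within g_critical of v
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and (ny, nx) not in seen
                            and abs(grid[ny][nx] - v) <= g_critical):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if p in seen:                 # the region must include P
                best = max(best, len(seen))
    return best
```

Each of the h * w candidates triggers one BFS over up to h * w cells, which matches the quadratic cost quoted above.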
I am guessing that starting from P might help, but I am not sure how I should do it. Maybe there is some kind of flood fill algorithm that allows me to do it?
Thanks in advance.
There's an O(h w √(g_critical) α(h w))-time algorithm (where α is the inverse Ackermann function, constant for practical purposes) that uses a disjoint set data structure with an "undo" operation and a variant of Mo's trick. The idea is, decompose the interval [v − g_critical, v] into about √g_critical subintervals of length about √g_critical. For each subinterval [a, b], prepare a disjoint set data structure representing the components of the matrix where the allowed values are [b, a + 2 g_critical]. Then for each c in [a, b], extend the disjoint set with points whose values lie in [c, b) and (a + 2 g_critical, c + 2 g_critical] and report the number of nodes in the component of P(x,y), then undo these operations (keep a stack of the writes made to the structure, with original values; then pop each one, writing the original values).
There's also an O(h w log(h w))-time algorithm that you're not going to like because it uses dynamic trees. (Sleator–Tarjan's 1985 construction based on splay trees is the simplest and works OK here.) Posting it mainly in case it inspires a more practical approach.
The high-level idea is a kinetic algorithm that "slides" the interval of allowed values over the at most g_critical + 1 possibilities, repeatedly reporting and taking the max over the size of the connected component containing P.
To do this, we need to maintain the maximum spanning forest on a derived graph. Given a node-weighted undirected graph, construct a new, edge-weighted graph by subdividing each edge and setting the weight of each new edge to the weight of the old node to which it is incident. Deleting the lightest nodes in the graph is straightforward – all paths avoid these nodes if possible, so the new maximum spanning forest won't have any more edges. To add the lightest nodes not in the graph, try to link their incident edges one at a time. If a cycle would form, evert one endpoint, query the path minimum from the other endpoint, and unlink that minimum.
To report the size of the component containing P we need another decoration that captures the size of the concrete subtree (as opposed to the represented subtree) of each node. The details get a bit gnarly.
Here are some heuristics that might help:
First some pre-processing in O(h*w*log(h*w)):
Store matrix values in an array, sort it
Now every value in the array is a possible candidate for the point R
Also, the maximum component for a candidate R will consist of values in the range [R - g_critical, R + g_critical]
So we can estimate the size of the component (in the best case) with a simple binary search
Now some heuristics:
Sort array again this time by estimated component size in descending order
Now try BFS in this order if estimated size is bigger than currently found best size
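The pre-processing and ordering steps above could look roughly like this. It is a sketch, not a full solution: it only computes the best-case size for every candidate R and sorts the candidates, leaving the BFS itself out. The name estimate_order and the (estimate, (row, column)) output format are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def estimate_order(grid, g_critical):
    """Return [(upper_bound, (row, col)), ...] sorted by upper_bound, descending.

    upper_bound counts all cells whose value lies in
    [R - g_critical, R + g_critical], ignoring connectivity, so it can only
    overestimate the real region size for that choice of R.
    """
    values = sorted(v for row in grid for v in row)
    estimates = []
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            lo = bisect_left(values, v - g_critical)
            hi = bisect_right(values, v + g_critical)
            estimates.append((hi - lo, (y, x)))
    estimates.sort(key=lambda e: -e[0])   # stable sort, largest estimate first
    return estimates
```

A caller would walk this list in order and stop as soon as the next estimate is no larger than the best region size found so far, since no later candidate can beat it.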

NP-Complete reduction

The problem states that we want to show that Independent Set poly-time reduces to Relative Prime Sets, more formally Independent Set <p Relative Prime Sets.
I need to provide a reduction f from Independent Set to Relative Prime Sets, where
- input of f must be a Graph G and an integer k, where k denotes the size of an independent set.
- output of f must be a set S of integers and an integer t, where t denotes the number of pairwise relative prime numbers in the set S.
Definition of relative prime sets (decision version):
It takes a set P of n integers and an integer t from 1 to n.
It returns yes if there is a subset A of P with t pairwise relatively prime numbers, that is, for all distinct a, b in A, gcd(a, b) = 1.
It returns no otherwise.
So far I have come up with what I believe is a reduction, but I am not sure if it is valid and I want to double check it with someone who knows how to do this.
Reduction:
Let G be a graph. Let k indicate the size of an independent set. Then we
want to find out if there exists an independent set of size k in G.
Since this problem is NP-Complete, if we can solve another NP-Complete
problem in poly-time, we know that we can also solve Independent Set
in poly-time. So we chose to reduce independent set to Relative Prime
Sets.
We take the graph G and label its vertices from 1 to n as per the
definition of the input for relative prime sets. Then we find the gcd
of each node to every other node in G. We draw an edge between the
nodes that have gcd(a, b) = 1. When the graph is complete, we look at
the nodes and determine which nodes are not connected to each other
via an edge. We create sets for those nodes. We return the set
containing the most nodes along with an integer t denoting the number
of integers in the set. This is the set of the most relative prime
numbers in the graph G and also the greatest independent set of G.
Consider two graphs, each with four nodes. In graph one, the nodes are connected in a line, so the maximum independent set has size 2. Graph two is a complete graph (each node is connected to every other node), so the maximum independent set has size 1.
It sounds like your reduction would result in the same set for each graph, because the labels 1 to n, and hence the gcds, do not depend on the edges of G. That leads to an incorrect result for independent set.
The equation S = k*ln W: the discrete logarithm can't be broken because it is correlated with informational entropy.

Find sub-array of objects with maximum distance between elements

Let there be an array of objects [a, b, c, d, ...] and a function distance(x, y) that gives a numeric value showing how 'different' objects x and y are.
I'm looking for an algorithm to find the subset of the array of length n that maximizes the minimum distance between the elements of that subset.
Of course, I can't simply sort the array by the minimum of the differences with other elements and take the n highest entries, since removing an element can very well change the distances. For instance, if a=b, then removing a means the minimum distance of b with another element will change dramatically.
So far, the only solutions I could find were either to iteratively remove the element with the lowest minimum distance and recalculate the distances at each iteration until only n elements are left, or, vice versa, to iteratively pick new elements, recalculate the distances, and add the new pick or replace an existing one based on the distance minimums.
Does anybody know how I could get the same results without those iterations?
PS: here is an example, the matrix shows the 'distance' between each element...
a b c d
a - 1 3 2
b 1 - 4 2
c 3 4 - 5
d 2 2 5 -
If we'd keep only 2 elements, that would be c and d; if we'd keep 3, that would be a or b, c and d.
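To make the objective concrete, here is a minimal exhaustive sketch that checks every subset of size n against the example matrix. It is exponential in general and only meant to pin down what "maximize the minimum distance" means; the function name max_min_subset is hypothetical and n is assumed to be at least 2.

```python
from itertools import combinations


def max_min_subset(items, distance, n):
    """Return the size-n subset whose smallest pairwise distance is largest."""
    def score(subset):
        return min(distance(x, y) for x, y in combinations(subset, 2))
    # max() keeps the first subset reaching the best score when there are ties.
    return max(combinations(items, n), key=score)


# Distances from the example matrix above.
D = {frozenset("ab"): 1, frozenset("ac"): 3, frozenset("ad"): 2,
     frozenset("bc"): 4, frozenset("bd"): 2, frozenset("cd"): 5}


def dist(x, y):
    return D[frozenset((x, y))]
```

With this data, max_min_subset("abcd", dist, 2) picks c and d, and for n = 3 the subsets {a, c, d} and {b, c, d} tie with a minimum distance of 2.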
This problem is NP-hard, so no-one knows an efficient (polynomial time) algorithm for solving it.
Here's a quick sketch that it is NP-hard, by reduction from CLIQUE.
Suppose we have an instance of CLIQUE in the form of a graph G and a number n and we want to know whether there is a clique of size n in G. Construct a distance matrix d such that d(i, j) = 1 if vertices i and j are connected in G, or 0 if they are not. Then find a subset of the vertices of G of size n that maximizes the minimum distance between elements (your problem). If the minimum distance between vertices in this subset is 1, then G has a clique of size n; otherwise it does not.
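The reduction can be illustrated with a short sketch. It assumes vertices are numbered 0 to num_vertices - 1, that 2 <= n <= num_vertices, and it uses exhaustive search as a stand-in for whatever solver handles the max-min problem; the name has_clique is hypothetical.

```python
from itertools import combinations


def has_clique(num_vertices, edges, n):
    """Decide CLIQUE by solving the max-min distance problem on a 0/1 matrix."""
    edge_set = {frozenset(e) for e in edges}

    def d(i, j):
        # Distance 1 for adjacent vertices, 0 for non-adjacent ones.
        return 1 if frozenset((i, j)) in edge_set else 0

    # Stand-in solver: best achievable minimum distance over all size-n subsets.
    best = max(
        min(d(i, j) for i, j in combinations(subset, 2))
        for subset in combinations(range(num_vertices), n)
    )
    # A minimum distance of 1 means every pair in the subset is adjacent.
    return best == 1
```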
As Gareth said, this is an NP-hard problem. However, there has been a lot of research into solving these kinds of problems, and as such better methods than brute force have been found. Unfortunately this is such a large area that you could spend forever looking at the possible implementations of a solution.
However, if you are interested in a heuristic way of solving this, I would suggest looking into Ant Colony Optimization (ACO), which has proven fairly effective at finding good paths within graphs.

Find the maximum number of edges in the graph

There are 'n' vertices and 0 edges in an undirected graph. What is the maximum number of edges that we can draw such that the graph remains disconnected?
My solution is to exclude one vertex and find the maximum number of edges between the remaining n-1 vertices, so that the graph still remains disconnected.
The maximum number of edges is n(n-1)/2 for n vertices, so it will be (n-1)(n-2)/2 for n-1 vertices.
Can there be a better solution?
You can resolve this using analysis. Take your idea and generalize it: divide the n vertices into two groups, of size x and n-x.
Now the number of edges is a function of x, expressed by
f(x)= x(x-1)/2 + (n-x)(n-x-1)/2
f(x) = 1/2(2x^2 - 2nx +n^2 - n)
The value which maximizes this function is the partition size you want. If you do the calculation, you find that it decreases from x=0 to x=n/2, then increases to x=n. As x = 0 or x = n means the graph is connected, you take the next greatest value, which is at x=1 (or, symmetrically, x=n-1). So your intuition is optimal.
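A quick numerical check of this argument, as a sketch: it evaluates f(x) for every admissible split of n vertices into two non-empty groups and returns the best one. The function name max_edges_disconnected is hypothetical, and n is assumed to be at least 2.

```python
def max_edges_disconnected(n):
    """Return (best_x, f(best_x)) over splits into groups of size x and n - x."""
    def f(x):
        # Edges of two disjoint complete graphs on x and n - x vertices.
        return x * (x - 1) // 2 + (n - x) * (n - x - 1) // 2

    best_x = max(range(1, n), key=f)   # x = 0 and x = n are excluded
    return best_x, f(best_x)
```

For n = 5 this gives x = 1 with 6 edges, which equals (n-1)(n-2)/2.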
Your solution should be the best solution, because any new edge added must have the nth vertex at one end.
If the graph can have multi-edges, the answer is infinity for n>=3.
If it can also contain self-loops, the answer is infinity for n>=2.
If neither of those holds, your solution is correct.

How can I generate a random DFA with uniform distribution?

I need to generate a Deterministic Finite Automaton (DFA), selected from all possible DFAs that satisfy the properties below. The DFA must be selected with uniform distribution.
The DFA must have the following four properties:
The DFA has N nodes.
Each node has 2 outgoing transitions.
Each node is reachable from every other node.
The DFA is chosen with perfectly uniform randomness from all possibilities.
I am not considering labeling of nodes or transitions. If two DFAs have the same unlabeled directed graph they are considered the same.
Here are three algorithms that don't work:
Algorithm #1
1. Start with a set of N nodes called A.
2. Choose a node from A and put it in set B.
3. While there are nodes left in set A
- 3.1 Choose a node x from set A
- 3.2 Choose a node y from set B with less than two outgoing transitions.
- 3.3 Choose a node z from set B
- 3.4 Add a transition from y to x.
- 3.5 Add a transition from x to z
- 3.6 Move x to set B
4. For each node n in B
- 4.1 While n has less than two outgoing transitions
- 4.2 Choose a node m in B
- 4.3 Add a transition from n to m
Algorithm #2
Start with a directed graph with N vertices and no arcs.
Generate a random permutation of the N vertices to produce a random Hamiltonian cycle, and add it to the graph.
For each vertex add one outgoing arc to a randomly chosen vertex.
Algorithm #3
Start with a directed graph with N vertices and no arcs.
Generate a random directed cycle of some length between N and 2N and add it to the graph.
For each vertex add one outgoing arc to a randomly chosen vertex.
I created algorithm #3 based off of algorithm #2, however, I don't know how to select the random directed cycle to create a uniform distribution. I don't even know if it's possible.
Any help would be greatly appreciated.
If N is small (there are N^(2N) possible sets of arcs that meet the first two conditions, so you would want this to be less than the range of your random number generator) you can generate random DFAs and discard the ones that don't satisfy the reachability condition.
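A minimal sketch of this generate-and-discard idea, under stated assumptions: states are numbered 0 to N-1, a DFA is represented as a list of (target0, target1) pairs, and the function names are hypothetical. Note that this samples uniformly among labeled transition tables that are strongly connected; it does not correct for the fact that different tables can share the same unlabeled graph, which the question treats as identical.

```python
import random


def is_strongly_connected(delta):
    """delta[s] is the pair of targets of state s; check mutual reachability."""
    n = len(delta)
    reverse = [[] for _ in range(n)]
    for s, targets in enumerate(delta):
        for t in targets:
            reverse[t].append(s)

    def reaches_all(adj):
        seen, stack = {0}, [0]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return len(seen) == n

    # State 0 reaches everything, and everything reaches state 0.
    return reaches_all(delta) and reaches_all(reverse)


def random_strongly_connected_dfa(n, rng=None, max_tries=100000):
    """Rejection sampling: draw random tables until one is strongly connected."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        delta = [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
        if is_strongly_connected(delta):
            return delta
    raise RuntimeError("no strongly connected DFA found within max_tries")
```

Because every accepted table is drawn from the same uniform distribution over all N^(2N) tables, each strongly connected table is equally likely to be returned.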

Resources