Algorithm/approximation for combined independent set/Hamming distance

Input: Graph G
Output: several independent sets, so that the membership of a node to all independent sets is unique. A node therefore has no connections to any node in its own set. Here is an example path.
Since clarification was requested, here is another way to phrase it:
Divide a given graph into sets so that
I can tell a node from all others by its membership in sets, e.g. if node i is present only in set A, no other node should be present only in set A.
If node j is present in sets A and B, then no other node should be present in exactly sets A and B. If the membership of nodes is coded by a bit pattern, then these bit patterns have Hamming distance at least one.
If two nodes are adjacent in the graph, they should not be present in the same set; hence every set must be an independent set.
B has no adjacent nodes
D=>A, A=>D
A B /
/ B D
A has bit pattern 10 and no adjacent node in its set. B has bit pattern 11 and no adjacent node, D has 01
Therefore all nodes have Hamming distance at least 1 and no adjacent nodes => correct
Wrong, because D and A are connected:
/ D B
A has bit pattern 10 and D in its set, they are adjacent. B has bit pattern 11 and no adjacent node, D has 11 as has B, so there are two errors in this solution and therefore it is not accepted.
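The two examples above can be checked mechanically. Here is a minimal sketch, assuming the sets are given as a dictionary from set name to members and the graph as a list of undirected edges; the function name `check_assignment` and the set names `S1`/`S2` are made up for illustration.

```python
def check_assignment(edges, sets):
    """Return a list of violations; an empty list means the assignment is accepted."""
    names = sorted(sets)
    nodes = sorted({n for members in sets.values() for n in members}
                   | {n for edge in edges for n in edge})
    # Bit pattern of a node: one bit per set, 1 if the node is a member.
    pattern = {n: "".join("1" if n in sets[s] else "0" for s in names)
               for n in nodes}
    problems = []
    first_owner = {}
    for n in nodes:
        if pattern[n] in first_owner:
            problems.append(f"{n} and {first_owner[pattern[n]]} share pattern {pattern[n]}")
        else:
            first_owner[pattern[n]] = n
    # Adjacent nodes must never sit in the same set.
    for u, v in edges:
        for s in names:
            if u in sets[s] and v in sets[s]:
                problems.append(f"adjacent nodes {u} and {v} are both in {s}")
    return problems


edges = [("A", "D")]
print(check_assignment(edges, {"S1": {"A", "B"}, "S2": {"B", "D"}}))
print(check_assignment(edges, {"S1": {"A", "B", "D"}, "S2": {"B", "D"}}))
```

The first call corresponds to the accepted solution, the second to the rejected one with its two errors.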
Of course this should be extended to more Sets as the number of Nodes in the Graph increases, since you need at least log(n) sets.
I already wrote a transformation into MAX-SAT, to use a SAT solver for this, but the number of clauses is just too big. A more direct approach would be nice. So far I have an approximation, but I would like an exact solution or at least a better approximation.
I have tried an approach where I used a particle swarm to optimize from an arbitrary solution towards a better one. However, the running time is pretty awful and the results are far from great. I am looking for a dynamic algorithm or something, however I cannot fathom how to divide and conquer this problem.

Not a complete answer, and I don't know how useful it will be to you. But here goes:
The hamming distance strikes me as a red herring. Your problem statement says it must be at least 1 but it could be 1000. It suffices to say the bit encoding for each node's set memberships is unique.
Your problem statement doesn't spell it out, but your solution above suggests every node must be a member of at least 1 set, i.e. a bit encoding of all 0's is not allowed for any node's set memberships.
Ignoring connected nodes for a moment, disjoint nodes are easy: Simply number them sequentially with an unused bit encoding. Save those for last.
Your example above uses directed edges, but again, that strikes me as a red herring. If A cannot be in the same set as D because A=>D, D cannot be in the same set as A regardless whether D=>A.
You mention needing at least log(N) sets. You will also have at most N sets. A fully connected graph (with (N^2-N)/2 undirected edges) will require N sets each containing a single node.
In fact, if your graph contains a fully connected simplex of M dimensions (M in 1..N-1) with M+1 vertices and (M^2+M)/2 undirected edges, you will require at least M+1 sets.
In your example above, you have one such simplex (M=1) with 2 vertices {A,D} and 1 (undirected) edge {(A,D)}.
It would seem that your problem boils down to finding the largest fully connected simplexes in your graph. Stated differently, you have a routing problem: How many dimensions do you need to route your edges so none cross? It doesn't sound like a very scalable problem.
The first large simplex found is easy. Every vertex node gets a new set with its own bit.
The disjoint nodes are easy. Once the connected nodes are dealt with, simply number the disjoint nodes sequentially skipping any previously used bit patterns. From your example above, since A and D take 01 and 10, the next available bit pattern for B is 11.
The tricky part then becomes how to fold any remaining simplexes as much as possible into the existing range before creating any new sets with new bits. When folding, one must use 2 or more bits (sets) for each node, and the bits (sets) must not intersect with the bits (sets) for any adjacent node.
Consider what happens to your example above when one adds another node, C, to the example:
If C connects directly to both A and D, then the initial problem becomes finding the 2-simplex with 3 vertices {A,C,D} and 3 edges {(A,C),(A,D),(C,D)}. Once A, C and D take the bit patterns 001, 010 and 100, the lowest available bit pattern for disjoint B is 011.
If, on the other hand, C connects directly to A or D but not both, the graph has two 1-simplexes. Supposing we find the 1-simplex with vertices {A,D} first, giving them the bit patterns 01 and 10, the problem then becomes how to fold C into that range. The only bit pattern with at least 2 bits is 11, but that intersects with whichever node C connects to, so we have to create a new set and put C in it. At this point, the solution is similar to the one above.
If C is disjoint, either B or C will get the bit pattern 11 and the remaining one will need a new set and get the bit pattern 100.
Suppose C connects to B but not to A or D. Again, the graph has two 1-simplexes but this time disjoint. Suppose {A,D} is found first as above giving A and D the bit patterns 10 and 01. We can fold B or C into the existing range. The only available bit pattern in the range is 11 and either B or C could get that pattern as neither is adjacent to A or D. Once 11 is used, no bit patterns with 2 or more bits set remain, and we will have to create a new set for the remaining node giving it the bit pattern 100.
Suppose C connects to all 3 A, B and D. In this case, the graph has a 2-simplex with 3 vertexes {A,C,D} and a 1-simplex with 2 vertexes {B, C}. Proceeding as above, after processing the largest simplex, A, C and D will have bit patterns 001, 010, 100. For folding B into this range, the available bit patterns with 2 or more bits set are: 011, 101, 110 and 111. All of these except 101 intersect with C so B would get the bit pattern 101.
The question then becomes: How efficiently can you find the largest fully-connected simplexes?
If finding the largest fully connected simplex is too expensive, one could put an approximate upper bound on potential fully connected simplexes by finding maximal minimums in terms of connections:
1. Sweep through the edges, updating the vertices with a count of the connecting edges.
2. For each connected node, create an array of Cn counts, initially zero, where Cn is the count of edges connected to the node n.
3. Sweep through the edges again. For the connected nodes n1 and n2, increment the count in n1 corresponding to Cn2 and vice versa. If Cn2 > Cn1, update the last count in the n1 array and vice versa.
4. Sweep through the connected nodes again, calculating an upper bound on the largest simplex each node could be a part of. You could build a pigeon-hole array with a list of vertices for each upper bound as you sweep through the nodes.
5. Work through the pigeon-hole array from largest to smallest, extracting and folding nodes into unique sets.
If your nodes are in a set N and your edges in a set E, the complexity will be:
O(|N|+|E|+O(Step 5))
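A rough sketch of the first four sweeps, under one possible reading of the bound: a node can only belong to a fully connected group of k+1 nodes if it has at least k neighbours whose degree (capped at the node's own degree) is at least k. The function name `clique_upper_bounds` is hypothetical, and for brevity the sketch sorts the capped degrees instead of keeping the count arrays, so it is not strictly linear.

```python
from collections import defaultdict


def clique_upper_bounds(edges):
    """Return (bound per node, pigeon-hole dict: bound -> list of nodes)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nb) for n, nb in adj.items()}

    bound = {}
    for n, neighbours in adj.items():
        # Neighbour degrees, capped at the node's own degree, largest first.
        caps = sorted((min(degree[m], degree[n]) for m in neighbours),
                      reverse=True)
        k = 0
        # Largest k such that at least k neighbours have capped degree >= k.
        while k < len(caps) and caps[k] >= k + 1:
            k += 1
        bound[n] = k + 1                # the node itself plus k neighbours

    buckets = defaultdict(list)
    for n, b in bound.items():
        buckets[b].append(n)
    return bound, dict(buckets)


# Example from above where C connects to A, B and D, and A connects to D.
bound, buckets = clique_upper_bounds(
    [("A", "D"), ("A", "C"), ("C", "D"), ("B", "C")])
print(bound)
```

This is only an upper bound: a node can have many high-degree neighbours that are not adjacent to each other.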
If the above approximation suffices, the question becomes: How efficiently can you fold nodes into existing ranges given the requirements?

This may not be the answer you expect, but I can't find a place to add a comment, so I am typing it directly here. I can't fully understand your question. Or does it need specific knowledge to understand? What is this independent set? As far as I know, a node in an independent set of a directed graph has a two-way path to any other node in this set. Is your notion the same?
If this problem is like what I assume, independent sets can be found by this algorithm:
1. Do a depth-first search on the directed graph, recording the time at which the traversal of the tree rooted at each node finishes.
2. Then reverse all the edges in this graph.
3. Do a depth-first search again on the modified graph, starting from the nodes in decreasing order of the finishing times recorded in step 1.
The algorithm is explained precisely in the book "Introduction to Algorithms".
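For reference, the three steps above are the usual two-pass procedure for strongly connected components (groups of nodes with a two-way path between every pair), which is a different notion from the independent sets in the question. A minimal recursive sketch, assuming the graph is a dictionary from node to list of successors; the function name is made up, and recursion depth limits it to small graphs.

```python
def strongly_connected_components(graph):
    """graph: dict node -> list of successor nodes. Returns a list of components."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}

    # Pass 1: depth-first search, recording nodes in order of finishing time.
    finish_order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in graph.get(u, ()):
            if v not in seen:
                visit(v)
        finish_order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)

    # Reverse all edges.
    reverse = {u: [] for u in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 2: search the reversed graph in decreasing finishing time.
    components, assigned = [], set()

    def collect(u, component):
        assigned.add(u)
        component.append(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(finish_order):
        if u not in assigned:
            component = []
            collect(u, component)
            components.append(component)
    return components


example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
print(strongly_connected_components(example))
```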


What algorithm should I use to get all possible paths in a directed weighted graph, with positive weights?

I have a directed weighted graph, with positive weights, which looks something like this :-
What I am trying to do is:-
Find all possible paths between two nodes.
Arrange the paths in ascending order, based on their path length (as given by the edge weights), say the top 5 at least.
Use an optimal way to do so, so that even in cases of larger number of nodes, the program won't take much time computing.
E.g.:- Say my initial node is d, and final node is c.
So the output should be something like
d to c = 11
d to e to c = 17
d to b to c = 25
d to b to a to c = 31
d to b to a to f to c = 38
How can I achieve this?
The best approach would be to use Dijkstra's shortest path algorithm, with which we can get a shortest path in O(E + V log V) time.
Take this basic approach to help you find the shortest path possible:
1. Look at all nodes directly adjacent to the starting node. Record the values carried by the connecting edges on those nodes, overwriting infinity. These are tentative distances; at this point only the starting node (distance 0) can be crossed off, meaning that its shortest path has been found.
2. Among the nodes not yet crossed off, select the one with the smallest recorded distance; we'll call this our pivot. Its recorded distance is now final, so cross it off. Look at the nodes adjacent to it (we'll call these our destination nodes) and the distances separating them. For every destination node: if the value in the pivot plus the edge value connecting it totals less than the destination node's value, then update its value, as a new shorter path has been found.
3. Repeat step 2 until all nodes have been crossed off. We now have a graph where the values held in any node will be the shortest distance to it from the start node.
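The steps above can be sketched with a binary heap from Python's `heapq` module to pick the closest node that is not yet crossed off. The edge weights below are hypothetical: the original picture is not available, so they were chosen only to be consistent with the five path lengths listed in the question. With a binary heap the running time is O((V + E) log V) rather than the O(E + V log V) bound, which needs a Fibonacci heap.

```python
import heapq


def dijkstra(graph, source):
    """graph: dict node -> dict of successor -> positive weight.
    Returns the shortest distance from source to every reachable node."""
    dist = {source: 0}
    crossed_off = set()
    heap = [(0, source)]
    while heap:
        d, pivot = heapq.heappop(heap)
        if pivot in crossed_off:
            continue                    # stale heap entry
        crossed_off.add(pivot)
        for dest, weight in graph.get(pivot, {}).items():
            candidate = d + weight
            if candidate < dist.get(dest, float("inf")):
                dist[dest] = candidate  # a shorter path has been found
                heapq.heappush(heap, (candidate, dest))
    return dist


# Hypothetical weights, consistent with the path lengths in the question.
graph = {
    "d": {"c": 11, "e": 7, "b": 10},
    "e": {"c": 10},
    "b": {"c": 15, "a": 5},
    "a": {"c": 16, "f": 8},
    "f": {"c": 15},
    "c": {},
}
print(dijkstra(graph, "d"))
```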
Find all possible paths between two nodes
You could use brute force here, but it is possible that you get a lot of paths, and it will really take years for bigger graphs (>100 nodes, depending on a lot of factors).
Arrange the paths in ascending order, based on their path length (as given by the edge weights), say the top 5 at least.
Simply sort them, and take the 5 first. (You could use a combination of a list of edges and an integer/double for the length of the path).
Use an optimal way to do so, so that even in cases of larger number of nodes, the program won't take much time computing.
Even finding all possible paths between two nodes is NP-Hard (Source, it's for undirected graphs, but is still valid). You will have to use heuristics.
What do you mean with a larger number of nodes? Do you mean 100 or 100 million? It depends on your context.
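If only the first few paths are needed, one option is to avoid enumerating everything and instead expand partial paths in order of length with a priority queue. Because the weights are positive, complete paths then come out in ascending order. This is a simple best-first sketch, not a tuned k-shortest-paths algorithm, and it can still take exponential time in the worst case. The function name and the edge weights are hypothetical (the weights were picked to match the lengths listed in the question).

```python
import heapq


def k_shortest_simple_paths(graph, source, target, k):
    """Return up to k loop-free paths from source to target as (length, path),
    in ascending order of length. graph: dict node -> dict successor -> weight."""
    heap = [(0, [source])]
    results = []
    while heap and len(results) < k:
        length, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            results.append((length, path))
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:         # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (length + weight, path + [nxt]))
    return results


graph = {
    "d": {"c": 11, "e": 7, "b": 10},
    "e": {"c": 10},
    "b": {"c": 15, "a": 5},
    "a": {"c": 16, "f": 8},
    "f": {"c": 15},
    "c": {},
}
for length, path in k_shortest_simple_paths(graph, "d", "c", 5):
    print(" to ".join(path), "=", length)
```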

Find all cycles based on label of edge in a disconnected labeled Un-directed graph?

1. The condition for a cycle to be valid in this graph is that the edges forming the cycle must have at least one label common to all of them.
2. Loops are not considered cycles.
3. The graph may have many disconnected components.
Consider the following graph
The valid cycles are
1. C,D,E(since T3 is common among them).
2. F,G,H (T4 is common among them).
The invalid cycles are
1. A (loops are not considered as cycles)
2. A,B,C (As no common labels are found).
The goal is to find those valid cycles and store the vertices along with the common labels that formed the cycle separately (maybe in a hash table with the vertices of the cycle as key and the common labels as values).
What would be the best cycle detection algorithm for this kind of problem?
Thanks in advance.
The simplest solution that I can think of is to separate the graph based on the edge labels.
For example you have this graph:
A --T1-- B
B --T2,T1-- C
C --T3,T1-- A
As there are three labels, create three graphs based on the edge.
A --T1-- B
B --T1-- C
C --T1-- A
B --T2-- C
C --T3-- A
After that you can run a Depth First Search (DFS) on each graph to find the cycles.
I would suggest to use the algorithm described in this article about finding all cycles in an undirected graph.
However, the small modification needed is that you must run the algorithm k times, where k is the number of distinct labels. Additionally, each time you run the algorithm, only consider edges with the label T_i.
The running time with the modification is O(k(M + N)). Note that this side-steps the need to "create" k separate graphs as described by malioboro.
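A small sketch of the per-label idea, assuming edges are given as `(u, v, labels)` triples and that there is at most one edge between any two vertices. For brevity it reports only one cycle per label via DFS rather than all of them; the function names are made up for illustration.

```python
from collections import defaultdict


def find_cycle(adj):
    """Return one cycle (list of vertices) in an undirected graph, or None."""
    parent = {}

    def dfs(u, came_from):
        parent[u] = came_from
        for v in adj[u]:
            if v == came_from:
                continue
            if v in parent:
                # v is an ancestor of u: walk back up the DFS tree to v.
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                return cycle
            found = dfs(v, u)
            if found:
                return found
        return None

    for root in adj:
        if root not in parent:
            found = dfs(root, None)
            if found:
                return found
    return None


def cycles_by_label(edges):
    """edges: iterable of (u, v, labels). Returns {label: one cycle}."""
    per_label = defaultdict(lambda: defaultdict(set))
    for u, v, labels in edges:
        if u == v:
            continue                    # loops are not considered cycles
        for label in labels:
            per_label[label][u].add(v)
            per_label[label][v].add(u)
    result = {}
    for label, adj in per_label.items():
        cycle = find_cycle(dict(adj))
        if cycle:
            result[label] = cycle
    return result


edges = [("A", "B", ["T1"]), ("B", "C", ["T2", "T1"]), ("C", "A", ["T3", "T1"])]
print(cycles_by_label(edges))
```

The result is keyed by label here; swapping it around to use the vertex set of the cycle as the key, as the question suggests, is a one-line change.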

Use O(n^2) time to fix a mistake in bipartite matching

This is a problem from Algorithm Design book.
Given a bipartite graph G=(V,E) where V=(A,B) such that |A|=|B|=n.
We manage to perfectly match n-2 nodes in A to n-2 nodes in B. However, for the remaining two nodes in A we map them both to a certain node in B (not one of the n-2 nodes in B that are already matched to.)
Given the information from the "matching" above, how to use O(n^2) time to decide whether a perfect matching between A and B actually exists? A hint is fine. Thank you.
Let's have u and v be the two nodes in A that match to the same node x in B. Pick one of those two nodes - call it u - and remove the edge to x from the matching. You are now left with a graph where you have a matching between n - 1 of the nodes from A and n - 1 of the nodes from B. The question now is whether you can extend this matching to make it even bigger.
There's a really nice way to do this using Berge's theorem, which says that a matching in a graph is maximum if and only if there is no alternating path between two unmatched nodes. (An alternating path is one that alternates between using edges not included in the matching and edges included in the matching.) After removing the edge from u to x, the two unmatched nodes are u and the one node in B that nothing was mapped to; call it y. You can find a path like this by starting from the node u and trying to find a path to y by doing a modified breadth-first search, where when you go from A to B you only follow unmatched edges and when you go from B back to A you only follow matched edges. If an alternating path exists from u to y, then you'll be sure to find it this way, and if no such path exists, then you can be certain of that as well.
If you do find an alternating path from u to y, you can "flip" it to increase the size of the matching by one. Specifically, take all the edges in the path that aren't in the matching and add them in, and take all the edges that were in the matching and delete them. The result is still a valid matching that has one more edge in it than what you started with (if you don't see why this is, play around with some examples and see what you find, or look at the proof of Berge's theorem).
Overall, this approach will require time O(m + n), where m is the number of edges in the graph and n is the number of nodes. The number of edges m is at most O(n^2) in a bipartite graph, so this matches your time bound (and, in fact, is actually a bit tighter!)
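Here is a minimal sketch of the search-and-flip step, assuming the matching of size n - 1 is stored in two dictionaries (`match_a` from A to B and `match_b` from B to A) and `adj` lists the B-neighbours of each A node. The function name `augment` and the node names are hypothetical.

```python
from collections import deque


def augment(adj, match_a, match_b, u):
    """Search for an alternating path from the unmatched A node u to an
    unmatched B node. If one exists, flip it in place and return True."""
    reached_from = {}                   # B node -> A node it was reached from
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:                # A -> B: only edges not in the matching
            if b in reached_from:       # (a's own matched partner is already here)
                continue
            reached_from[b] = a
            if b not in match_b:
                # Unmatched B node found: flip the path back towards u.
                while b is not None:
                    a = reached_from[b]
                    previous_partner = match_a.get(a)
                    match_a[a] = b
                    match_b[b] = a
                    b = previous_partner
                return True
            queue.append(match_b[b])    # B -> A: only the matched edge
    return False


adj = {"a1": ["b2", "b3"], "a2": ["b1", "b2"], "a3": ["b1"]}
# a2 and a3 were both mapped to b1; the edge a3-b1 has been dropped.
match_a = {"a1": "b2", "a2": "b1"}
match_b = {"b2": "a1", "b1": "a2"}
print(augment(adj, match_a, match_b, "a3"), match_a)
```

Each node and edge is looked at a constant number of times, which is where the O(m + n) bound comes from.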
Transform this problem to the max flow min cut problem by adding a source s which is connected to A by unit capacity edges and a sink t to which B is connected by unit capacity edges.
As templatetypedef said in their answer, we already have a flow of size n-1 on this network.
The problem is now to determine whether the size of the flow can be increased to n. This can be achieved by running one round of Edmonds-Karp, which takes O(E)=O(n^2) time (i.e. find the shortest path in the residual graph of the flow of size n-1 above and look for the bottleneck edge).

What is meant by the set of all possible configuration in a given graph G

I'm trying to understand Solved Exercise 2, Chapter 3 of Algorithm Design by Kleinberg and Tardos.
But I'm not getting the idea of the answer.
In short, the question is:
We are given two robots located at node a & node b. The robots need to travel to node c and d respectively. The problem arises if the robots get close to each other. "Let's assume the distance is r <= 1, so that if they come within one node or less of each other" they will have an interference problem, so they won't be able to transmit data to the base station.
The answer is quite long and it does not make any sense to me or I'm not getting its idea.
Anyway I was thinking can't we just perform DFS/BFS to find a path from node a to c, & from b to d. then we modify the DFS/BFS Algorithm so that we keep checking at every movement if the robots are getting close to each other?
Since it's required to solve this problem in polynomial time, I don't think this modification to any of the algorithm "BFS/DFS" will consume a lot of time.
The solution (from the book):
This problem can be tricky to think about if we view things at the level of the underlying graph G: for a given configuration of the robots—that is, the current location of each one—it’s not clear what rule we should be using to decide how to move one of the robots next. So instead we apply an idea that can be very useful for situations in which we’re trying to perform this type of search. We observe that our problem looks a lot like a path-finding problem, not in the original graph G but in the space of all possible configurations.
Let us define the following (larger) graph H. The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G. We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.
Why is the larger graph H needed?
What does he mean by: The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G.
And what does he mean by: We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.?
I do not have the book, but it seems from their answer that at each step they move one robot or the other. Assuming that, H consists of all possible pairs of nodes that are more than distance r apart. The nodes in H are adjacent if they can be reached by moving one robot or the other.
There are not enough details in your proposed algorithm to say anything about it.
Anyway I was thinking can't we just perform DFS/BFS to find a path from node a to c, & from b to d. then we modify the DFS/BFS Algorithm so that we keep checking at every movement if the robots are getting close to each other?
I don't think this would be possible. What you're proposing is to calculate the full path, and afterwards check if the given path could work. If not, how would you handle the situation so that when you rerun the algorithm, it won't find that pathological path? You could exclude that from the set of possible options, but I don't think that'd be a good approach.
Suppose a path of length n, and now suppose that the pathology resides in the first step of the given path. Suppose now that this happens every time you recalculate the path. You would have to recalculate the path a lot of times just because the algorithm itself isn't aware of the restrictions needed to get to the right answer.
I think this is the point: the algorithm itself doesn't consider the problem's restrictions, and that is the main problem, because there's no easy way of correcting the given (wrong) solution.
What does he mean by: The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G.
What they mean by that is that each node in H represents each possible position of the two robots, which is the same as "all possible pairs of nodes in G".
E.g.: graph G has nodes A, B, C, D, E. H will have nodes AB, AC, AD, AE, BC, BD, BE, CD, CE, DE (consider AB = BA for further analysis).
Let the two robots be named r1 and r2, they start at nodes A and B (given info in the question), so the path will start in node AB in graph H. Next, the possibilities are:
r1 moves to a neighbor node from A
r2 moves to a neighbor node from B
(...repeat for each step until r1 and r2 each reach their destination).
All these possible positions of the two robots at the same time are the configurations the answer talks about.
And what does he mean by: We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.?
Let's look at the possibilities from what they state here:
(u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.
The possibilities are:
(u,v) and (u,w) such that (v,w) is an edge in E. In this case r2 moves to one of the neighbors of its current node.
(u,v) and (w,v) such that (u,w) is an edge in E. In this case r1 moves to one of the neighbors of its current node.
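To make the configuration graph concrete, here is a small sketch that never builds H explicitly: it runs a breadth-first search over pairs (position of r1, position of r2), moves one robot at a time, and skips any configuration where the robots are within distance r of each other. The function names `bfs_dist` and `schedule`, and the example graphs, are assumptions for illustration and not from the book.

```python
from collections import deque


def bfs_dist(adj, source):
    """Hop distances from source in the undirected graph adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def schedule(adj, start, goal, r):
    """Shortest list of configurations from start to goal, or None.
    start and goal are pairs (node of r1, node of r2)."""
    dist = {u: bfs_dist(adj, u) for u in adj}

    def allowed(u, v):
        return dist[u].get(v, float("inf")) > r

    if not allowed(*start):
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if config == goal:
            path = []
            while config is not None:
                path.append(config)
                config = parent[config]
            return path[::-1]
        u, v = config
        # Edges of H: exactly one robot moves along an edge of G.
        moves = [(w, v) for w in adj[u]] + [(u, w) for w in adj[v]]
        for nxt in moves:
            if nxt not in parent and allowed(*nxt):
                parent[nxt] = config
                queue.append(nxt)
    return None


# A ring of six nodes: the robots can swap places by circling around.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(schedule(ring, (0, 3), (3, 0), r=1))
```

H has at most n^2 nodes for a graph G with n nodes, so the search stays polynomial in the size of G.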
This solution was a bit tricky to me too at first. But after reading it several times and drawing some examples, when I finally bumped into your question, the way you separated each part of the problem then helped me to fully understand each part of the solution. So, a big thanks to you for this question!
Hope it's clearer now for anyone stuck with this problem!

What is the algorithm for generating a random Deterministic Finite Automata?

The DFA must have the following four properties:
The DFA has N nodes
Each node has 2 outgoing transitions.
Each node is reachable from every other node.
The DFA is chosen with perfectly uniform randomness from all possibilities
This is what I have so far:
1. Start with a collection of N nodes.
2. Choose a node that has not already been chosen.
3. Connect its output to 2 other randomly selected nodes.
4. Label one transition 1 and the other transition 0.
5. Go to 2, unless all nodes have been chosen.
6. Determine if there is a node with no incoming connections.
7. If so, steal an incoming connection from a node with more than 1 incoming connection.
8. Go to 6, unless there are no nodes with no incoming connections.
However, this algorithm is not correct. Consider the graph where node 1 has its two connections going to node 2 (and vice versa), while node 3 has its two connections going to node 4 (and vice versa). That is something like:
1 <==> 2
3 <==> 4
Where, by <==> I mean two outgoing connections both ways (so a total of 4 connections). This seems to form 2 cliques, which means that not every state is reachable from every other state.
Does anyone know how to complete the algorithm? Or, does anyone know another algorithm? I seem to vaguely recall that a binary tree can be used to construct this, but I am not sure about that.
Strong connectivity is a difficult constraint. Let's generate uniform random surjective transition functions and then test them with e.g. Tarjan's linear-time SCC algorithm until we get one that's strongly connected. This process has the right distribution, but it's not clear that it's efficient; my researcher's intuition is that the limiting probability of strong connectivity is less than 1 but greater than 0, which would imply only O(1) iterations are necessary in expectation.
Generating surjective transition functions is itself nontrivial. Unfortunately, without that constraint it is exponentially unlikely that every state has an incoming transition. Use the algorithm described in the answers to this question to sample a uniform random partition of {(1, a), (1, b), (2, a), (2, b), …, (N, a), (N, b)} with N parts. Permute the nodes randomly and assign them to parts.
For example, let N = 3 and suppose that the random partition is
{{(1, a), (2, a), (3, b)}, {(2, b)}, {(1, b), (3, a)}}.
We choose a random permutation 2, 3, 1 and derive a transition function
(1, a) |-> 2
(1, b) |-> 1
(2, a) |-> 2
(2, b) |-> 3
(3, a) |-> 1
(3, b) |-> 2
In what follows I'll use the basic terminology of graph theory.
You could:
Start with a directed graph with N vertices and no arcs.
Generate a random permutation of the N vertices to produce a random Hamiltonian cycle, and add it to the graph.
For each vertex add one outgoing arc to a randomly chosen vertex.
The result will satisfy all three requirements.
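A short sketch of this construction, with states numbered 0..N-1 and transitions labelled "0" and "1" (the function name and the random choice of which label follows the cycle are assumptions). Note that every automaton produced this way contains a Hamiltonian cycle, so the construction covers the first three properties but does not sample uniformly from all strongly connected DFAs.

```python
import random


def hamiltonian_dfa(n, rng=None):
    """Return delta: state -> {"0": target, "1": target}, strongly connected."""
    rng = rng or random.Random()
    order = list(range(n))
    rng.shuffle(order)                  # random Hamiltonian cycle
    delta = {}
    for i, state in enumerate(order):
        cycle_next = order[(i + 1) % n]
        extra = rng.randrange(n)        # one more arc to a random vertex
        if rng.random() < 0.5:
            delta[state] = {"0": cycle_next, "1": extra}
        else:
            delta[state] = {"0": extra, "1": cycle_next}
    return delta


print(hamiltonian_dfa(4, random.Random(0)))
```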
There is an algorithm with expected running time O(n^{3/2}).
If you generate a uniform random digraph with m vertices such that each vertex has k labelled out-arcs (a k-out digraph), then with high probability the largest SCC (strongly connected component) in this digraph is of size around c_k m, where c_k is a constant depending on k. Actually, there is about 1/\sqrt{m} probability that the size of this SCC is exactly c_k m (rounded to an integer).
So you can generate a uniform random 2-out digraph of size n/c_k, and check the size of the largest SCC. If its size is not exactly n, just try again until success. The expected number of trials needed is \sqrt{n}. And generating each digraph should be done in O(n) time. So in total the algorithm has expected running time O(n^{3/2}). See this paper for more details.
Just keep growing a set of nodes which are all reachable. Once they're all reachable, fill in the blanks.
1. Start with a set of N nodes called A.
2. Choose a node from A and put it in set B.
3. While there are nodes left in set A:
   - Choose a node x from set A.
   - Choose a node y from set B with fewer than two outgoing transitions.
   - Choose a node z from set B.
   - Add a transition from y to x.
   - Add a transition from x to z.
   - Move x to set B.
4. For each node n in B:
   - While n has fewer than two outgoing transitions:
     - Choose a node m in B.
     - Add a transition from n to m.
5. Choose a node to be the start node.
6. Choose some number of nodes to be accepting nodes.
Every node in set B can reach every node in set B. As long as a node can be reached from a node in set B and that node can reach a node in set B, it can be added to the set.
The simplest way that I can think of is to (uniformly) generate a random DFA with N nodes and two outgoing edges per node, ignoring the other constraints, and then throw away any that are not strongly connected (which is easy to test using a strongly connected components algorithm). Generating uniform DFAs should be straightforward without the reachability constraint. The one thing that could be problematic performance-wise is how many DFAs you would need to skip before you found one with the reachability property. You should try this algorithm first, though, and see how long it ends up taking to generate an acceptable DFA.
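This generate-and-test idea can be sketched as follows, assuming states 0..N-1 and storing each state's two targets as a pair (target on 0, target on 1). Strong connectivity is tested here with one forward and one backward search from state 0 instead of a full SCC algorithm; the function names and the `max_tries` cap are assumptions.

```python
import random


def _reachable(successors, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in successors[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(delta):
    """delta: state -> (target on 0, target on 1), states 0..n-1."""
    n = len(delta)
    forward = {s: list(targets) for s, targets in delta.items()}
    backward = {s: [] for s in delta}
    for s, targets in delta.items():
        for t in targets:
            backward[t].append(s)
    # State 0 reaches everything and everything reaches state 0.
    return len(_reachable(forward, 0)) == n and len(_reachable(backward, 0)) == n


def sample_dfa(n, rng=None, max_tries=100000):
    """Draw uniform random transition tables until one is strongly connected."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        delta = {s: (rng.randrange(n), rng.randrange(n)) for s in range(n)}
        if is_strongly_connected(delta):
            return delta
    raise RuntimeError("no strongly connected DFA found within max_tries")


print(sample_dfa(5, random.Random(1)))
```

Because every transition table is equally likely before the test, the accepted ones are uniform among the strongly connected ones; the open question is only how many draws get rejected as N grows.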
We can start with a random number of states N1 between N and 2N.
Take state number 1 as the initial state.
For each state and each character in the input alphabet, we generate a random transition (to a state between 1 and N1).
We take the connected part of the automaton reachable from the initial state and check its number of states; after a few tries we get one with N states.
If we want a minimal automaton too, only the assignment of final states remains; however, there is a good chance that a random assignment yields a minimal automaton as well.
The following references seem to be relevant to your question:
F. Bassino, J. David and C. Nicaud, Enumeration and random generation of possibly incomplete deterministic automata, Pure Mathematics and Applications 19 (2-3) (2009) 1-16.
F. Bassino and C. Nicaud. Enumeration and Random Generation of Accessible Automata. Theor. Comp. Sc.. 381 (2007) 86-104.
