Graph and permutation problem - algorithm

I have a graph (with nodes and edges) containing symmetry and a group of permutations to label the nodes so no edges are changed (automorphisms). Now I would like to determine for which nodes a permutation exchanges two equivalent (i.e. nodes with the same color or symmetry class) neighboring nodes.
When the nodes with equivalent neighbors stay the same, simply checking if the neighbors are exchanged in the permutation is enough. However, when the nodes with equivalent neighbors are also permuted (i.e. there are multiple nodes with the same color/symmetry class with the same equivalent neighbors), the problem becomes more complex.
Is there any known algorithm for such a problem?
Some remarks:
The graph has no coordinates, it's a topology only
An example:
Identity permutation: http://imagebin.ca/view/2vAOi0I.html
There are 384 automorphic permutations: http://pastebin.org/157954
Simple example where the permutation inverts nodes 5 & 23: http://imagebin.ca/view/myQCvZnp.html
As long as 5 and 23 remain in the same place it is easy to determine if they are inverted (compared to the identity permutation). However, when these points are also interchanged it becomes more difficult.
Difficult example, permutation 67: http://imagebin.ca/view/9gl-Wmzt.html
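For the easy case described above, where the node whose neighbours are exchanged is itself fixed by the permutation, the check can be sketched in Python. This is a minimal sketch under assumed representations: the graph is an adjacency dict, the automorphism is a dict `perm` mapping each node to its image, and `color` gives each node's symmetry class. The function name `swapped_neighbours` is hypothetical. It deliberately does not handle the difficult case where the centre node moves as well.

```python
def swapped_neighbours(adj, perm, color):
    """For every node fixed by perm, list pairs of same-colour neighbours that perm swaps.

    adj:   dict node -> iterable of neighbours (undirected graph)
    perm:  dict node -> image of that node under the automorphism
    color: dict node -> symmetry class
    """
    result = {}
    for v, nbrs in adj.items():
        if perm[v] != v:
            continue  # the hard case (v itself is moved) is not handled here
        pairs = []
        for u in nbrs:
            w = perm[u]
            # u and w are exchanged when perm maps u -> w and w -> u;
            # u < w reports each pair once and skips neighbours that stay put
            if u < w and perm[w] == u and color[u] == color[w]:
                pairs.append((u, w))
        if pairs:
            result[v] = pairs
    return result
```

Because `perm` is an automorphism that fixes v, `perm[u]` is always another neighbour of v, so only v's own adjacency list needs to be scanned.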

I don't think your problem is well-defined. Imagine the following graph:
      1
     / \
    /   \
   2     3
  / \   / \
 4   5 6   7
Now consider the two automorphisms that swap the two subtrees of 1.
automorphism A: 1<->1, 2<->3, 4<->6, and 5<->7
automorphism B: 1<->1, 2<->3, 4<->7, and 5<->6
Which one of these "inverts" the children of 2? Because 2 gets mapped to 3, we have to decide whether the natural correspondence is 4-6 and 5-7, or 4-7 and 5-6. But we have no information we can use to decide this fact.
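To check the counterexample mechanically, a small Python sketch can confirm that A and B really are both automorphisms of the tree. The edge list is transcribed from the drawing; `is_automorphism` is a hypothetical helper that assumes the mapping is a bijection on the nodes.

```python
def is_automorphism(edges, perm):
    """True if the node mapping perm (assumed bijective) maps the undirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    mapped = {frozenset(perm[x] for x in e) for e in edge_set}
    return mapped == edge_set

# The tree from the drawing: 1 is the root, 2 and 3 are its children.
tree = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
A = {1: 1, 2: 3, 3: 2, 4: 6, 6: 4, 5: 7, 7: 5}
B = {1: 1, 2: 3, 3: 2, 4: 7, 7: 4, 5: 6, 6: 5}
print(is_automorphism(tree, A), is_automorphism(tree, B))  # True True
```

Both calls print True, so the graph alone gives no reason to prefer one pairing of the children of 2 and 3 over the other.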


Algorithm for node assignment in graph

There are N nodes (1 ≤ N ≤ 2⋅10^5) and M (1 ≤ M ≤ 2⋅10^5) directed edges in a graph. Every node has an assigned number (an integer in the range 1...N) that we are trying to determine.
All nodes with a certain assigned number will have directed edges leading to other nodes with another certain assigned number. This also implies that if one node has multiple directed edges coming out of it, then the nodes that it leads to all have the same assigned number. We have to use this information to determine an assignment of numbers such that the number of distinct numbers among all nodes is maximized.
Because there are multiple possible answers, the output should be the assignment that minimizes the numbers assigned to nodes 1…N, in that order. Essentially the answer is the lexicographically smallest one.
Example:
In a graph of 9 nodes and 12 edges, here are the edges. For the two integers i and j on each line, there is a directed edge from i to j.
3 4
6 9
4 2
2 9
8 3
7 1
3 5
5 8
1 2
4 6
8 7
9 4
The correct assignment is that nodes 1, 4, 5 have the assigned number 1; nodes 2, 6, 8 have the assigned number 2; and nodes 3, 7, 9 have the assigned number 3. This makes sense because nodes 1, 4, 5 lead to nodes 2, 6, 8, which lead to nodes 3, 7, 9, which in turn lead back to nodes 1, 4, 5.
To solve this problem, I thought that you could create a graph with disconnected subgraphs each representing a group of nodes that have the same assigned number. To do this, I could simply scan through all the nodes, and if a node has multiple directed edges to other nodes, you should add them to your graph as a connected component. If some of the nodes were already in the graph, you could simply add edges in between the current components.
Then, for the rest of the nodes, you could find which nodes they have directed edges to, and somehow use that information to add them to your new graph.
Would this strategy work? If so, how can I properly implement the second portion of my algorithm?
EDIT 1: Earlier I interpreted the problem statement incorrectly; I have now posted the correct interpretation and my new way of approaching the problem.
EDIT 2: So once I go through all the nodes once, adding edges in the way I described above, I would determine the components for each node. Then I would iterate through the nodes again, this time making sure to add the rest of the edges into the graph recursively. For example, if a node with an assigned number has a directed edge to a node that hasn't been assigned a number, I can add that node to its designated component. I can also use Union Find to maintain the components.
While this will be fast enough, I'm worried that there may be errors - for example, when I do this recursive solution, it is possible that when a node is assigned a number, other nodes with assigned numbers that are connected to that node may not work with it. Basically, there would be a contradiction. I would have to come up with a solution for that.
For each node, print rand() % rand() + 1 and pray. With dedication, you might pass all cases.
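One way to use union-find here without ever running into the contradictions mentioned in EDIT 2 is to only merge, never assign speculatively: all targets of one group must share a number, so merge them, and whenever a merge gives a group more than one outgoing target, those targets must be merged too. The Python below is a minimal sketch of that idea (a known technique, not necessarily what the poster had in mind), using small-to-large merging of the out-lists; `assign_numbers` is a hypothetical name. The result is the finest valid grouping, which maximizes the number of distinct numbers, and numbering groups in order of their smallest node gives the lexicographically smallest labelling.

```python
def assign_numbers(n, edges):
    """Finest valid grouping of nodes 1..n, labelled lexicographically smallest.

    edges: list of (i, j) directed edges. Entry k-1 of the result is node k's number.
    """
    parent = list(range(n + 1))
    out = [[] for _ in range(n + 1)]  # out[r]: targets of the group whose root is r
    for a, b in edges:
        out[a].append(b)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pending = [v for v in range(1, n + 1) if len(out[v]) > 1]
    while pending:
        r = find(pending.pop())
        # All targets of group r must share one number: merge them pairwise.
        while len(out[r]) > 1:
            a = find(out[r].pop())
            b = find(out[r][-1])
            if a == b:
                continue
            if len(out[a]) < len(out[b]):
                a, b = b, a  # small-to-large: move the shorter out-list
            parent[b] = a
            out[a].extend(out[b])
            out[b] = []
            if len(out[a]) > 1:
                pending.append(a)  # the merged group may now need merging too

    label, result = {}, []
    for v in range(1, n + 1):
        r = find(v)
        if r not in label:
            label[r] = len(label) + 1  # groups numbered by their smallest node
        result.append(label[r])
    return result
```

On the 9-node example above this returns [1, 2, 3, 1, 1, 2, 3, 2, 3], matching the stated assignment.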

Storing data for trees and graphs?

In most tree or graph problems I have tried to solve, the input is the entire tree or graph structure, given in a node -> leaves or node -> adjacent nodes format.
Is there a list of commonly used structures for holding this data in memory in a way that helps the intended algorithm later? For example:
Say I have a list of graph nodes like:
1 3 8 2 4 ...   # 1 is connected to nodes 3, 8, 2, 4, ...
2 5 1 3 ...     # 2 is connected to nodes 5, 1, 3, ...
3 1 2 ...       # likewise
...
8 ...
Suppose I want to use the random contraction algorithm, in which I have to contract edges (say I contract 1 and 8). I use a multi-linked list structure in which each node in an adjacency list points to its corresponding row, i.e. the 8 in the first line points to node 8's row.
Now the question: why did I choose this structure to store the data?
Contracting effectively makes 1 and 8 a single entity,
so I read 1's adjacency list starting from 3, go to 3's adjacency list and change 1 to 8, then in 8's row change 1 to 8, then go to 2's list and change 1 to 8, and so on. Finally I append 1's list to 8's list and remove the resulting self-loops (8 appearing in its own list). So after contracting 1 and 8, node 1 is deleted from the graph.
I want to know all the commonly and rarely used structures for storing trees and graphs and, where a structure is associated with particular algorithms, the algorithm names as well. Thank you.
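The contraction step can also be done with a plain dict of lists instead of a multi-linked structure. This is a minimal Python sketch under these assumptions: the input is an undirected, connected graph in the "node neighbour neighbour ..." row format, with each edge listed in both rows; `parse_adjacency`, `contract` and `karger_min_cut` are hypothetical names. It removes only self-loops and keeps parallel edges, because the final cut is the number of edges, counted with multiplicity, between the last two supernodes.

```python
import random

def parse_adjacency(lines):
    """'1 3 8 2 4' -> {1: [3, 8, 2, 4]}; each undirected edge appears in both rows."""
    adj = {}
    for line in lines:
        nums = [int(tok) for tok in line.split()]
        if nums:
            adj[nums[0]] = nums[1:]
    return adj

def contract(adj, u, v):
    """Merge v into u in place; parallel edges are kept, self-loops dropped."""
    for w in set(adj[v]):
        # every row that mentions v now mentions u instead
        adj[w] = [u if x == v else x for x in adj[w]]
    adj[u].extend(adj.pop(v))
    adj[u] = [x for x in adj[u] if x != u]

def karger_min_cut(adj, trials, rng=random):
    best = None
    for _ in range(trials):
        g = {k: list(vs) for k, vs in adj.items()}  # fresh copy for every trial
        while len(g) > 2:
            # each undirected edge is listed twice, so this picks an edge uniformly
            u, v = rng.choice([(a, b) for a, ws in g.items() for b in ws])
            contract(g, u, v)
        cut = len(next(iter(g.values())))
        best = cut if best is None else min(best, cut)
    return best
```

A single trial finds a minimum cut with probability at least 2/(n(n-1)), which is why the contraction is repeated many times and the smallest cut is kept.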
One common way to store graphs is an n-by-n matrix (an adjacency matrix), where n is the number of vertices in the graph. If you simply want to store adjacency, with X as the matrix, X[i][j] = 1 if there is an edge from vertex i to vertex j, and 0 otherwise. You can also store edge costs or edge capacities this way. The disadvantage, of course, is memory: O(n^2) instead of O(n+m), where m is the number of edges. The advantage is O(1) lookup for every possible vertex pair.
Floyd's algorithm for solving the All Pairs Shortest Paths problem can naturally make use of such a matrix, as well as more complex sub-cubic algorithms for solving various graph paths problems that utilize faster matrix multiplication over a ring.
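As an illustration of the matrix representation, here is a minimal Python sketch of the Floyd-Warshall algorithm using a list-of-lists matrix. Vertices are numbered 0..n-1, a missing edge is stored as infinity rather than 0 so that the matrix can hold edge costs directly, and the graph is assumed to have no negative cycles. `floyd_warshall` is a hypothetical name.

```python
import math

def floyd_warshall(n, weighted_edges):
    """All-pairs shortest paths on an n-by-n matrix (vertices 0..n-1)."""
    # dist[i][j]: weight of edge i->j, math.inf if there is none, 0 on the diagonal
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge
    for k in range(n):  # now allow k as an intermediate vertex
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue
            row_i = dist[i]
            for j in range(n):
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]
    return dist
```

The triple loop is O(n^3) time on top of the O(n^2) memory, and every lookup `dist[i][j]` is O(1), which is exactly the trade-off the matrix representation makes.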

Using disjoint-set data structure, graph theory

I'm practicing solving programming problems in free time. This problem I spotted some time ago and still don't know how to solve it:
For a given undirected graph with n vertices and m edges (both less than 2 × 10^6), I need to split its vertices into as many groups as possible, but with one condition: each pair of vertices from different groups are connected by an edge. Each vertex is in exactly one group. At the end I need to know the size of each group.
I was proud when I came up with this solution: take the complement of the original graph and use a disjoint-set data structure on it. It gives the right answer (not difficult to prove). But it's only a theoretical solution: the complement can have up to about n^2/2 edges, which is far too many for the given constraints. I believe this approach can somehow be fixed cleverly. But how?
Can anyone help?
EDIT: for a graph with vertices from 1 to 7 and 16 edges:
1 3
1 4
1 5
2 3
3 4
4 5
4 7
4 6
5 6
6 7
2 4
2 7
2 5
3 5
3 7
1 7
we have 3 groups with sizes: 1, 2 and 4.
These groups are: {4}, {5,7}, {1,2,3,6} respectively. There are edges connecting each pair of vertices from different groups and we can't create more groups.
I think the only ingredient you're missing is how to deal with sparse graphs.
Let's think about this in terms of finding the biggest possible complete graph where the only operation I can do is group a set of nodes (say v_1, ..., v_k) together and give the new supernode edges only to those nodes u that were connected to all of v_1, ..., v_k.
If your graph has fewer than n^2/4 edges, randomly sample n node pairs, noting which pairs are not joined by an edge. Union-find is an easy way to code this up. Now rebuild the graph using as groups the sets you found by this random sampling. Recurse on this reduced graph. (I'm not quite sure how to analyse this step, but I believe each sample-rebuild cycle reduces the graph size by at least a constant factor with high probability, so this whole process takes near-linear time.)
Once you have a fairly dense graph (at least n^2/4 edges), you can convert to an adjacency matrix representation and do exactly what you were suggesting: check all node pairs, do a union whenever you see that two nodes aren't joined by an edge, and read off the sets.
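For comparison, there is also a well-known deterministic way to get the connected components of the complement without ever building it: a breadth-first search that keeps the set of still-unvisited vertices and, from each dequeued vertex, moves every unvisited vertex it is not adjacent to into the queue. Each test either removes a vertex from the unvisited set (at most n times overall) or is charged to an edge of the original graph (at most 2m times), so the whole search runs in O(n + m) expected time with hash sets. This is a different technique from the sampling approach in the answer; the Python below is a minimal sketch with a hypothetical function name, vertices numbered 1..n.

```python
from collections import deque

def complement_components(n, edges):
    """Connected components of the complement of an undirected graph on 1..n."""
    adj = [set() for _ in range(n + 1)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    unvisited = set(range(1, n + 1))
    groups = []
    while unvisited:
        start = unvisited.pop()
        queue, group = deque([start]), [start]
        while queue:
            u = queue.popleft()
            # complement neighbours of u = unvisited vertices with no edge to u;
            # the vertices skipped here are adjacent to u, so the scan is paid for by edges
            found = [w for w in unvisited if w not in adj[u]]
            for w in found:
                unvisited.remove(w)
            queue.extend(found)
            group.extend(found)
        groups.append(sorted(group))
    return groups
```

On the 7-vertex example from the question this yields the groups {4}, {5, 7} and {1, 2, 3, 6}, i.e. sizes 1, 2 and 4.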

What is the algorithm for generating a random Deterministic Finite Automata?

The DFA must have the following four properties:
The DFA has N nodes
Each node has 2 outgoing transitions.
Each node is reachable from every other node.
The DFA is chosen with perfectly uniform randomness from all possibilities
This is what I have so far:
1. Start with a collection of N nodes.
2. Choose a node that has not already been chosen.
3. Connect its output to 2 other randomly selected nodes.
4. Label one transition 1 and the other transition 0.
5. Go to 2, unless all nodes have been chosen.
6. Determine if there is a node with no incoming connections.
7. If so, steal an incoming connection from a node with more than 1 incoming connection.
8. Go to 6, unless there are no nodes with no incoming connections.
However, this algorithm is not correct. Consider the graph where node 1 has both of its connections going to node 2 (and vice versa), while node 3 has both of its connections going to node 4 (and vice versa). That is something like:
1 <==> 2
3 <==> 4
Here, by <==> I mean two outgoing connections in each direction (so a total of 4 connections for each pair). This forms two separate strongly connected components, {1, 2} and {3, 4}, which means that not every state is reachable from every other state.
Does anyone know how to complete the algorithm? Or, does anyone know another algorithm? I seem to vaguely recall that a binary tree can be used to construct this, but I am not sure about that.
Strong connectivity is a difficult constraint. Let's generate uniform random surjective transition functions and then test them with e.g. Tarjan's linear-time SCC algorithm until we get one that's strongly connected. This process has the right distribution, but it's not clear that it's efficient; my researcher's intuition is that the limiting probability of strong connectivity is less than 1 but greater than 0, which would imply only O(1) iterations are necessary in expectation.
Generating surjective transition functions is itself nontrivial. Unfortunately, without that constraint it is exponentially unlikely that every state has an incoming transition. Use the algorithm described in the answers to this question to sample a uniform random partition of {(1, a), (1, b), (2, a), (2, b), …, (N, a), (N, b)} with N parts. Permute the nodes randomly and assign them to parts.
For example, let N = 3 and suppose that the random partition is
{{(1, a), (2, a), (3, b)}, {(2, b)}, {(1, b), (3, a)}}.
We choose a random permutation 2, 3, 1 and derive a transition function
(1, a) |-> 2
(1, b) |-> 1
(2, a) |-> 2
(2, b) |-> 3
(3, a) |-> 1
(3, b) |-> 2
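The partition-then-permute step can be sketched in Python. The linked question's algorithm is not reproduced here; as a stand-in, this sketch samples a uniform set partition into exactly N blocks using the standard Stirling-number recurrence S(i, j) = j·S(i-1, j) + S(i-1, j-1), which involves big integers (fine in Python). The letters a and b, nodes 1..N and all function names are assumptions for illustration.

```python
import random

def random_set_partition(items, k, rng=random):
    """Uniformly random partition of items into exactly k non-empty blocks."""
    n = len(items)
    # S[i][j] = Stirling number of the second kind: partitions of i items into j blocks
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    if S[n][k] == 0:
        raise ValueError("need 1 <= k <= len(items)")
    # Top-down: item i is alone in its block (S[i-1][j-1] cases) or joins one of
    # the j blocks of a partition of the first i-1 items (j * S[i-1][j] cases).
    choice, j = [None] * (n + 1), k
    for i in range(n, 0, -1):
        if rng.randrange(S[i][j]) < S[i - 1][j - 1]:
            j -= 1  # choice[i] stays None: item i opens a new block
        else:
            choice[i] = rng.randrange(j)
    # Bottom-up replay; blocks are kept in order of their first item.
    blocks = []
    for i in range(1, n + 1):
        if choice[i] is None:
            blocks.append([items[i - 1]])
        else:
            blocks[choice[i]].append(items[i - 1])
    return blocks

def transitions_from_partition(blocks, targets):
    """Every (state, letter) pair in blocks[i] is sent to state targets[i]."""
    return {pair: t for block, t in zip(blocks, targets) for pair in block}

def random_surjective_transitions(n, rng=random):
    pairs = [(s, c) for s in range(1, n + 1) for c in "ab"]
    targets = list(range(1, n + 1))
    rng.shuffle(targets)  # random assignment of nodes to parts
    return transitions_from_partition(random_set_partition(pairs, n, rng), targets)
```

Feeding the example partition and the permutation 2, 3, 1 into `transitions_from_partition` reproduces the transition function listed above. Each surjective transition function arises from exactly one (partition, assignment) pair, which is why the combination is uniform.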
In what follows I'll use the basic terminology of graph theory.
You could:
Start with a directed graph with N vertices and no arcs.
Generate a random permutation of the N vertices to produce a random Hamiltonian cycle, and add it to the graph.
For each vertex add one outgoing arc to a randomly chosen vertex.
The result will satisfy the first three requirements, but not the fourth: the DFAs it produces are not uniformly distributed.
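The Hamiltonian-cycle construction is short in Python. In this sketch, nodes 0..n-1, letters 0 and 1, and the name `hamiltonian_dfa` are assumptions. Which letter carries the cycle arc is chosen at random per node, since the answer does not say how the two transitions are labelled.

```python
import random

def hamiltonian_dfa(n, rng=random):
    """2-out DFA on nodes 0..n-1 that contains a random Hamiltonian cycle."""
    order = list(range(n))
    rng.shuffle(order)  # random permutation -> random Hamiltonian cycle
    delta = {}
    for i, v in enumerate(order):
        c = rng.randrange(2)  # letter used for the cycle arc
        delta[(v, c)] = order[(i + 1) % n]  # cycle arc: guarantees strong connectivity
        delta[(v, 1 - c)] = rng.randrange(n)  # the other arc goes anywhere
    return delta
```

Every state reaches every other state along the cycle, so strong connectivity holds no matter where the second arcs point.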
There is an algorithm with expected running time O(n^{3/2}).
If you generate a uniform random digraph with m vertices such that each vertex has k labelled out-arcs (a k-out digraph), then with high probability the largest SCC (strongly connected component) in this digraph is of size around c_k m, where c_k is a constant depending on k. Actually, there is about 1/\sqrt{m} probability that the size of this SCC is exactly c_k m (rounded to an integer).
So you can generate a uniform random 2-out digraph with about n/c_k vertices (k = 2 here) and check the size of its largest SCC. If that size is not exactly n, try again until it is. The expected number of trials needed is on the order of \sqrt{n}, and each digraph can be generated in O(n) time, so in total the algorithm has expected running time O(n^{3/2}). See this paper for more details.
Just keep growing a set of nodes which are all reachable. Once they're all reachable, fill in the blanks.
Start with a set of N nodes called A.
Choose a node from A and put it in set B.
While there are nodes left in set A:
    Choose a node x from set A.
    Choose a node y from set B with fewer than two outgoing transitions.
    Choose a node z from set B.
    Add a transition from y to x.
    Add a transition from x to z.
    Move x to set B.
For each node n in B:
    While n has fewer than two outgoing transitions:
        Choose a node m in B.
        Add a transition from n to m.
Choose a node to be the start node.
Choose some number of nodes to be accepting nodes.
Every node in set B can reach every node in set B. As long as a node can be reached from a node in set B and that node can reach a node in set B, it can be added to the set.
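The grow-a-reachable-set procedure translates almost line for line into Python. In this sketch nodes are 0..n-1, `out[v][0]` and `out[v][1]` are the transitions on the two letters, and each node is made accepting with probability 1/2; those choices and the function name are assumptions. A node y with a free slot always exists: the first node has two slots, and each added node brings two slots while using up two (its own transition to z and y's transition to x), so B keeps exactly two free slots while A is non-empty.

```python
import random

def grow_strongly_connected(n, rng=random):
    """2-out DFA on nodes 0..n-1 built by growing a strongly connected set B."""
    out = {v: [] for v in range(n)}
    A = list(range(n))
    rng.shuffle(A)  # popping from a shuffled list = choosing a random node of A
    B = [A.pop()]
    while A:
        x = A.pop()
        y = rng.choice([b for b in B if len(out[b]) < 2])  # never empty (see above)
        z = rng.choice(B)
        out[y].append(x)  # x is reachable from B ...
        out[x].append(z)  # ... and B is reachable from x
        B.append(x)
    for v in B:  # fill in the blanks
        while len(out[v]) < 2:
            out[v].append(rng.choice(B))
    start = rng.choice(B)
    accepting = {v for v in B if rng.random() < 0.5}
    return out, start, accepting
```

Building the candidate list for y costs O(|B|) per step, so this sketch is O(n^2) overall; keeping a separate list of nodes with free slots would make it linear.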
The simplest way that I can think of is to (uniformly) generate a random DFA with N nodes and two outgoing edges per node, ignoring the other constraints, and then throw away any that are not strongly connected (which is easy to test using a strongly connected components algorithm). Generating uniform DFAs should be straightforward without the reachability constraint. The one thing that could be problematic performance-wise is how many DFAs you would need to skip before you found one with the reachability property. You should try this algorithm first, though, and see how long it ends up taking to generate an acceptable DFA.
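The generate-and-reject approach can be sketched in Python as follows. Instead of a full SCC algorithm it uses the simpler test that a directed graph is strongly connected exactly when node 0 reaches every node and every node reaches node 0 (checked with a search on the reversed graph). Because it rejects from the uniform distribution, the accepted DFAs are uniform over the strongly connected ones. The acceptance rate does fall quickly as N grows, since uniform transitions rarely give every state an incoming transition. Names and the `max_tries` cap are assumptions.

```python
import random

def reaches_all(n, succ, src=0):
    """True if every node 0..n-1 is reachable from src via succ."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def random_strongly_connected_dfa(n, rng=random, max_tries=100000):
    """Uniform random 2-out DFA on 0..n-1, conditioned on strong connectivity."""
    for _ in range(max_tries):
        delta = [[rng.randrange(n), rng.randrange(n)] for _ in range(n)]
        pred = [[] for _ in range(n)]
        for u, targets in enumerate(delta):
            for w in targets:
                pred[w].append(u)
        # strongly connected <=> 0 reaches everything and everything reaches 0
        if reaches_all(n, delta) and reaches_all(n, pred):
            return delta
    raise RuntimeError("no strongly connected DFA found; n may be too large for plain rejection")
```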
We can start with a random number of states N1 between N and 2N.
Take the initial state to be state number 1.
For each state, for each character in the input alphabet we generate a random transition (between 1 and N1).
We take the accessible part of the automaton (the states reachable from the initial state) and check its number of states; after a few tries we get one with exactly N states.
If we also want a minimal automaton, only the choice of final states remains, and there is a good chance that a random assignment gives a minimal automaton as well.
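The random-accessible-automaton approach can be sketched in Python. State 0 stands in for "state number 1", the alphabet size defaults to 2, and the function names are assumptions. Note that this produces an accessible automaton (every state reachable from the initial state), which is a weaker property than the strong connectivity the question asks for. The sketch makes no claim about the resulting distribution and leaves out the choice of final states.

```python
import random

def accessible_part(delta, start):
    """Restrict delta to the states reachable from start, relabelled 0..k-1 in BFS order."""
    order, index = [start], {start: 0}
    i = 0
    while i < len(order):
        for w in delta[order[i]]:
            if w not in index:
                index[w] = len(order)
                order.append(w)
        i += 1
    return [[index[w] for w in delta[v]] for v in order]

def random_accessible_dfa(n, alphabet_size=2, rng=random, max_tries=100000):
    for _ in range(max_tries):
        n1 = rng.randint(n, 2 * n)  # N1 between N and 2N (inclusive)
        delta = [[rng.randrange(n1) for _ in range(alphabet_size)] for _ in range(n1)]
        acc = accessible_part(delta, 0)  # state 0 is the initial state
        if len(acc) == n:
            return acc
    raise RuntimeError("no accessible automaton of the requested size found")
```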
The following references seem to be relevant to your question:
F. Bassino, J. David and C. Nicaud, Enumeration and random generation of possibly incomplete deterministic automata, Pure Mathematics and Applications 19 (2-3) (2009) 1-16.
F. Bassino and C. Nicaud, Enumeration and random generation of accessible automata, Theoretical Computer Science 381 (2007) 86-104.

Algorithm/approximation for combined independent set/hamming distance

Input: Graph G
Output: several independent sets, such that each node's pattern of membership across all the sets is unique. A node therefore has no connections to any node in its own set. Here is an example path.
Since clarification was requested, here is another rephrasing:
Divide a given graph into sets so that:
I can tell each node apart from all others by its set memberships, e.g. if node i is present only in set A, no other node should be present in set A only;
if node j is present in sets A and B, then no other node should be present in exactly sets A and B. If the membership of nodes is coded as a bit pattern, these bit patterns have a Hamming distance of at least one;
if two nodes are adjacent in the graph, they should not be present in the same set, so each set is an independent set.
Example:
B has no adjacent nodes
D=>A, A=>D
Solution:
A B /
/ B D
A has bit pattern 10 and no adjacent node in its set. B has bit pattern 11 and no adjacent node, D has 01
therefore all nodes have Hamming distance at least 1 and no adjacent nodes => correct
Wrong, because D and A are connected:
A D B
/ D B
A has bit pattern 10 and D is in its set, but they are adjacent. B has bit pattern 11 and no adjacent node, but D also has 11, the same as B. So there are two errors in this solution, and therefore it is not accepted.
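Both rules can be checked mechanically, which helps when comparing candidate solutions. This Python sketch treats edges as undirected pairs, each listed once; `check_solution` is a hypothetical name. It returns a list of problems, so an empty list means the sets are valid.

```python
def check_solution(nodes, sets, edges):
    """Return a list of problems; an empty list means the sets are a valid answer."""
    problems = []
    for i, s in enumerate(sets):
        for a, b in edges:
            if a in s and b in s:
                problems.append(f"set {i} contains adjacent nodes {a} and {b}")
    seen = {}
    for v in nodes:
        # bit i of the pattern is 1 when v belongs to sets[i]
        pattern = "".join("1" if v in s else "0" for s in sets)
        if pattern in seen:
            problems.append(f"{v} and {seen[pattern]} share bit pattern {pattern}")
        else:
            seen[pattern] = v
    return problems
```

For the example, the correct solution `[{"A", "B"}, {"B", "D"}]` yields no problems, while the wrong one `[{"A", "D", "B"}, {"D", "B"}]` yields exactly the two errors described above.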
Of course this should be extended to more sets as the number of nodes in the graph increases, since you need at least log2(n) sets.
I already wrote a transformation into MAX-SAT in order to use a SAT solver for this, but the number of clauses is just too big. A more direct approach would be nice. So far I have an approximation, but I would like an exact solution or at least a better approximation.
I have tried using a particle swarm to optimize from an arbitrary solution towards a better one. However, the running time is pretty awful and the results are far from great. I am looking for a dynamic programming algorithm or something similar, but I cannot see how to divide and conquer this problem.
Not a complete answer, and I don't know how useful it will be to you. But here goes:
The hamming distance strikes me as a red herring. Your problem statement says it must be at least 1 but it could be 1000. It suffices to say the bit encoding for each node's set memberships is unique.
Your problem statement doesn't spell it out, but your solution above suggests every node must be a member of at least 1 set, i.e. an all-zeros bit encoding is not allowed for any node's set memberships.
Ignoring connected nodes for a moment, disjoint nodes are easy: Simply number them sequentially with an unused bit encoding. Save those for last.
Your example above uses directed edges, but again, that strikes me as a red herring. If A cannot be in the same set as D because A=>D, D cannot be in the same set as A regardless whether D=>A.
You mention needing at least log(N) sets. You will also have at most N sets. A fully connected graph (with (N^2-N)/2 undirected edges) will require N sets each containing a single node.
In fact, if your graph contains a fully connected simplex of M dimensions (M in 1..N-1) with M+1 vertices and (M^2+M)/2 undirected edges, you will require at least M+1 sets.
In your example above, you have one such simplex (M=1) with 2 vertices {A,D} and 1 (undirected) edge {(A,D)}.
It would seem that your problem boils down to finding the largest fully connected simplexes in your graph. Stated differently, you have a routing problem: How many dimensions do you need to route your edges so none cross? It doesn't sound like a very scalable problem.
The first large simplex found is easy. Every vertex node gets a new set with its own bit.
The disjoint nodes are easy. Once the connected nodes are dealt with, simply number the disjoint nodes sequentially skipping any previously used bit patterns. From your example above, since A and D take 01 and 10, the next available bit pattern for B is 11.
The tricky part then becomes how to fold any remaining simplexes as much as possible into the existing range before creating any new sets with new bits. When folding, one must use 2 or more bits (sets) for each node, and the bits (sets) must not intersect with the bits (sets) for any adjacent node.
Consider what happens to your example above when one adds another node, C, to the example:
If C connects directly to both A and D, then the initial problem becomes finding the 2-simplex with 3 vertices {A,C,D} and 3 edges {(A,C),(A,D),(C,D)}. Once A, C and D take the bit patterns 001, 010 and 100, the lowest available bit pattern for disjoint B is 011.
If, on the other hand, C connects directly to A or D but not both, the graph has two 1-simplexes. Supposing we find the 1-simplex with vertices {A,D} first, giving them the bit patterns 01 and 10, the problem then becomes how to fold C into that range. The only bit pattern with at least 2 bits is 11, but that intersects with whichever node C connects to, so we have to create a new set and put C in it. At this point, the solution is similar to the one above.
If C is disjoint, either B or C will get the bit pattern 11 and the remaining one will need a new set and get the bit pattern 100.
Suppose C connects to B but not to A or D. Again, the graph has two 1-simplexes but this time disjoint. Suppose {A,D} is found first as above giving A and D the bit patterns 10 and 01. We can fold B or C into the existing range. The only available bit pattern in the range is 11 and either B or C could get that pattern as neither is adjacent to A or D. Once 11 is used, no bit patterns with 2 or more bits set remain, and we will have to create a new set for the remaining node giving it the bit pattern 100.
Suppose C connects to all three of A, B and D. In this case, the graph has a 2-simplex with 3 vertices {A,C,D} and a 1-simplex with 2 vertices {B,C}. Proceeding as above, after processing the largest simplex, A, C and D will have bit patterns 001, 010, 100. For folding B into this range, the available bit patterns with 2 or more bits set are: 011, 101, 110 and 111. All of these except 101 intersect with C, so B would get the bit pattern 101.
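For small graphs, the fewest sets can be found exactly by brute force, which is handy for checking worked cases like these or measuring an approximation. This Python sketch assumes that bit patterns must be distinct and non-zero and that adjacent nodes must not share a set (their bitmasks have no common bit). It tries k = 1, 2, ... sets and backtracks over bitmask assignments, most-connected nodes first. It is exponential in the worst case, and `min_sets` is a hypothetical name.

```python
from itertools import count

def min_sets(nodes, edges):
    """Exact (exponential-time) search for the fewest sets; for small graphs only.

    Returns (k, {node: bitmask}) with distinct non-zero masks and
    no common bit between the masks of adjacent nodes.
    """
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    order = sorted(nodes, key=lambda v: -len(adj[v]))  # most constrained first
    masks, used = {}, set()

    def place(i, k):
        if i == len(order):
            return True
        v = order[i]
        for m in range(1, 1 << k):
            if m in used or any(m & masks[u] for u in adj[v] if u in masks):
                continue
            masks[v] = m
            used.add(m)
            if place(i + 1, k):
                return True
            del masks[v]  # undo and try the next mask
            used.discard(m)
        return False

    for k in count(1):  # terminates: k = len(nodes) always admits one bit per node
        if (1 << k) - 1 >= len(nodes) and place(0, k):
            return k, dict(masks)
```

For the last case (C adjacent to A, B and D) this finds that 3 sets are needed, in agreement with the 001/010/100/101 assignment above.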
The question then becomes: How efficiently can you find the largest fully-connected simplexes?
If finding the largest fully connected simplex is too expensive, one could put an approximate upper bound on potential fully connected simplexes by finding maximal minimums in terms of connections:
1. Sweep through the edges, updating the vertices with a count of the connecting edges.
2. For each connected node, create an array of Cn counts, initially zero, where Cn is the count of edges connected to the node n.
3. Sweep through the edges again. For the connected nodes n1 and n2, increment the count in n1 corresponding to Cn2, and vice versa. If Cn2 > Cn1, update the last count in the n1 array, and vice versa.
4. Sweep through the connected nodes again, calculating an upper bound on the largest simplex each node could be a part of. You could build a pigeon-hole array with a list of vertices for each upper bound as you sweep through the nodes.
5. Work through the pigeon-hole array from largest to smallest, extracting and folding nodes into unique sets.
If your nodes are in a set N and your edges in a set E, the complexity will be:
O(|N|+|E|+O(Step 5))
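Steps 1 to 4 can be read as an h-index-style bound: a node that lies in a fully connected simplex with s vertices needs at least s-1 neighbours whose own degree is at least s-1. The Python below is a sketch of that reading (the exact rule in step 3 is our interpretation, and the names are hypothetical). It returns each node's bound together with the pigeon-hole array as a dict from bound to nodes; step 5, the folding itself, is not included.

```python
from collections import defaultdict

def simplex_upper_bounds(nodes, edges):
    """Per-node upper bound on the size of the largest simplex (clique) containing it."""
    deg = {v: 0 for v in nodes}
    for a, b in edges:  # step 1: count each node's edges
        deg[a] += 1
        deg[b] += 1
    # steps 2-3: counts[v][d] = neighbours of v with degree d, capped at deg[v]
    counts = {v: [0] * (deg[v] + 1) for v in nodes}
    for a, b in edges:
        counts[a][min(deg[b], deg[a])] += 1
        counts[b][min(deg[a], deg[b])] += 1
    bound = {}
    buckets = defaultdict(list)  # step 4: pigeon-hole array, bound -> nodes
    for v in nodes:
        h, at_least = 0, 0
        for d in range(deg[v], 0, -1):
            at_least += counts[v][d]  # neighbours with (capped) degree >= d
            if at_least >= d:
                h = d  # largest d with at least d neighbours of degree >= d
                break
        bound[v] = h + 1  # v itself plus h neighbours
        buckets[bound[v]].append(v)
    return bound, dict(buckets)
```

Every pass is linear in |N| + |E|, consistent with the complexity given above. On the last example (C adjacent to A, B and D) it gives A, C and D a bound of 3 and B a bound of 2.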
If the above approximation suffices, the question becomes: How efficiently can you fold nodes into existing ranges given the requirements?
This may not be the answer you expect, but I can't find a place to add a comment, so I'm typing it here. I can't fully understand your question. Does it need specific knowledge to understand? What is this independent set? As far as I know, every node in an independent set of a directed graph has a path in both directions to every other node in the set (this is what is usually called a strongly connected component). Is your notion the same?
If the problem is what I assume, these sets can be found with this algorithm:
1. Do a depth-first search on the directed graph, recording the time at which the tree rooted at each node finishes being traversed.
2. Reverse all the edges in the graph.
3. Do a depth-first search again on the modified graph, starting each new tree from the unvisited node with the latest finishing time from step 1; each tree found is one set.
This algorithm is explained in detail in the book "Introduction to Algorithms".
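The three steps are Kosaraju's algorithm for strongly connected components. The second search must start its trees from nodes in decreasing order of finishing time from the first search. Here is a minimal iterative Python sketch; the node numbering 0..n-1 and the function name are assumptions.

```python
def kosaraju_scc(n, edges):
    """Strongly connected components of a directed graph on nodes 0..n-1 (iterative DFS)."""
    g = [[] for _ in range(n)]
    rg = [[] for _ in range(n)]  # reversed graph (step 2)
    for a, b in edges:
        g[a].append(b)
        rg[b].append(a)

    # Step 1: DFS on g, recording nodes in order of finishing time.
    visited = [False] * n
    finish = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(g[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(g[w])))
                    break
            else:  # all successors of v explored: v finishes
                stack.pop()
                finish.append(v)

    # Step 3: DFS on the reversed graph in decreasing finishing time; each tree is one SCC.
    comp = [-1] * n
    sccs = []
    for s in reversed(finish):
        if comp[s] != -1:
            continue
        comp[s] = len(sccs)
        members, stack = [s], [s]
        while stack:
            v = stack.pop()
            for w in rg[v]:
                if comp[w] == -1:
                    comp[w] = len(sccs)
                    members.append(w)
                    stack.append(w)
        sccs.append(sorted(members))
    return sccs
```

The iterative version avoids Python's recursion limit on large graphs; both passes run in O(|N| + |E|) time.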
