Matching non-isomorphic graphs

Matching non-isomorphic graphs - algorithm

I have two graphs G1 and G2, which are not isomorphic. I need to make a new graph G1' such that, with the minimum changes to G1, it has the nodes of both G1 and G2. For example, say there is a node n1 in G1 with three neighboring nodes n11, n12, n13. If a 'corresponding' node n2 in G2 has five neighboring nodes n21, n22, n23, n24, n25, then n1' in G1' also needs to have five neighbors n11', n12', n13', n14', n15'. The first three are copied from G1, and the two extra nodes take the value of the last of the three. The tree emanating from the extra nodes will either be entirely newly created or will comprise some extra nodes from G1 that don't have equivalent nodes in G2 (are not 'exhausted' in some sense).
The problems are 1) finding the most suitable seed as the starting point, so that the starting views are as similar as possible, and 2) building the tree from the extra nodes while keeping the added node count to a minimum.
Edit:
I will try to explain this further with the help of an illustration. My knowledge of graph theory is very superficial, so please excuse me if something sounds silly.
I broadly want to obtain a graph which, with the minimum number of node manipulations, can take the form of one of the two non-isomorphic source graphs.
In the example above, graph G' can take two forms, G or H, with some amount of node shuffling.
1) To make it G, we keep all the orange nodes at their position. The dotted nodes will 'merge' into their neighboring nodes. So B21' will have the value of A21 and will be at the same position (dissolving the corresponding edges). The same will happen with the pairs B31'-A31, B14'-A15, B25'-B23, A32'-A22 and A23'-A32. With this configuration the graph would resemble G completely, without any edges 'sticking out'.
2) To make it isomorphic with H, A11 and A12 will take the values of A13, A32 and A32' that of A23, and A23' that of A22. The dotted nodes will 'come out' of their merged positions.
The problem is to find G'. Maybe there is no ready graph operation or the solution is impossible, but any pointer to achieve this with any degree of approximation and efficiency is most welcome.
NB: The starting nodes A1 and B1 are arbitrary. The first half of the problem is identifying these nodes so that the views are as similar as possible.

This is at least as hard as the graph isomorphism problem which is currently not known to be solvable in polynomial time. As such, you should not expect to be able to find an efficient algorithm for your problem.
The correspondence is straightforward to see because if G1 and G2 were in fact isomorphic, you would have G1' = G1, so an algorithm which solves this problem could be used to solve the graph isomorphism problem.

Related

Graph theory: best algorithm to find combination of edges “directions”, where each node has at most one edge directed to it

I’m dealing with a graph where there are a certain number of nodes, and there are predefined connections between them which don’t have “directions” yet.
The problem is to give all the edges a direction (e.g. if there’s a connection between A and B, give this edge the A->B direction, or B->A), in a way that no node is at the receiving end of more than one edge.
Examples:
For this model (A-B-C), A->B->C works, but A->B<-C does not work, as B is at the receiving end of more than one connection. Although A<-B->C works, as B is on the giving end of both of its connections.
I’ve tried loop detection, but because these nodes can be arbitrarily connected to one another, there can be numerous loops which may or may not be directly attached to each other, and I could not find a way to make use of that information.
The number of nodes can be north of thousands, and connections can be many hundreds in my case. This also rules out brute force.
It is not guaranteed that there will be a definite solution. The aim of the algorithm is to find a combination with the least number of connections causing nodes to have more than one edge pointing to them.

Not a complete algorithm, but given your description of the problem in the comments, I feel like these steps will probably bring the problem back into the brute-forcible range.
First, you should "trim" your graph. Any nodes of degree one should be pruned, with their connected edge being directed at the pruned node. Since no other edge can point to that node, we know that this choice is optimal. Rinse and repeat until all nodes remaining have two or more edges.
Next, as you mentioned, you should exclude any isolated nodes. You can actually extend this up to connected components of size <= 3. This is because for up to three nodes, your number of edges cannot exceed the number of nodes, so you can randomly assign one edge, and the rest will fall into place.
Now, what will remain are a bunch of large, highly-connected, connected components. You could actually do one more check and see if any of these form a single cycle (all nodes degree two) and then assign one edge randomly, but this is probably a fairly rare case. You'll probably just want to start brute forcing each of these independently. It'd probably be best to start from the nodes with the smallest number of edges first, updating the degree of nodes as you assign edges (and also pruning any degree-one nodes as before), backtracking as necessary.
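The trimming step described above can be sketched in a few lines. This is a minimal illustration, assuming a simple undirected graph given as a list of (u, v) pairs; the function name `prune_leaves` and the return format are made up for this example and do not come from the answer.

```python
from collections import defaultdict, deque

def prune_leaves(edges):
    """Repeatedly remove degree-one nodes, directing each removed edge at the leaf.

    Returns (directed, remaining): the pruned edges as (tail, head) pairs, and
    the undirected edges left once every remaining node has degree >= 2.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    queue = deque(n for n in adj if len(adj[n]) == 1)
    directed = []
    while queue:
        leaf = queue.popleft()
        if len(adj[leaf]) != 1:          # degree changed since it was queued
            continue
        (parent,) = adj[leaf]
        directed.append((parent, leaf))  # point the edge at the pruned leaf
        adj[leaf].clear()
        adj[parent].discard(leaf)
        if len(adj[parent]) == 1:        # pruning may expose a new leaf
            queue.append(parent)

    remaining = {frozenset((u, v)) for u in adj for v in adj[u]}
    return directed, sorted(tuple(sorted(e)) for e in remaining)
```

For a triangle 1-2-3 with a tail 3-4-5, this directs 4->5 and then 3->4, and leaves the three triangle edges for the later steps.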

This is a continuation of the answer by Dillon Davis.
After tree-like branches are removed, and simple cycles are resolved, the remaining graph has nodes of degree 2 or more. I propose that (for the purposes of analyzing the graph) all of the nodes of degree 2 can be removed.
Allow me to explain by example. In this example, when a node is represented by a number, that number is the degree of the node. When a node is represented by a letter, that node has degree 2. So the graph
3 - A - B - C - 4
represents a node of degree 3, connected to a chain of nodes of degree 2, connected to a node of degree 4.
The two ideal choices for this section of the graph are
3 -> A -> B -> C -> 4
3 <- A <- B <- C <- 4
These are ideal in the sense that each lettered node has exactly one incoming edge. I propose that these aren't just ideal choices, they are the only choices. Consider the first ideal solution
3 -> A -> B -> C -> 4
If node 4 has too many incoming edges, we can reduce its count by reversing the edge to C, giving
3 -> A -> B -> C <- 4
But that hasn't improved the situation, it trades "too many edges into 4" with "too many edges into C". Subsequently reversing the edge between C and B resolves C, but breaks B. Keep reversing along the chain and eventually the connection between A and 3 is reversed, and we've arrived at the second ideal solution.
Which leads me to conclude that (for the purposes of analysis)
3 - A - B - C - 4
is equivalent to
3 - 4
So how is this useful in simplifying the problem? Consider the following graph:
When nodes A and B are removed, the remaining edge connects the top node 3 to itself, so that edge can be removed. Likewise for C and D. Which leaves a graph with a single edge. Choose either direction for that edge. Then complete the solution by choosing a direction for the simple cycle A-B-3, and independently choose a direction for the simple cycle C-D-3.
Here's another example:
In this case, removing A and B creates redundant edges between the remaining nodes. After removing the redundant edges, choose either direction for the edge. The direction of that edge determines the direction of the cycle 3-A-3, and cycle 3-B-3.

I wasn't sure about adding another answer, but the answer by user3386109 gave me insight into what I believe is the complete solution, and I felt that it differs too drastically from the spirit of my original answer to include as an edit.
To recap, we have a few tools under our belt:
We can prune nodes with a single edge optimally, repeating the process to completion
We can assign a direction to any edge in a simple cycle (connected components with only nodes of degree 2) and the rest will follow (optimally).
Nodes with two edges in more complex cycles can be temporarily ignored, as their edge directions will be assigned by higher degree nodes.
After reading the last point, the problem itself becomes a bit more clear. Once we have pruned the degree one nodes in bullet one, all remaining nodes have at least two edges. We can say for certain in the optimal graph that each of these nodes will have at least one directional edge pointing to them. As proof, since each node has at least two edges, but the connected component is not a simple cycle (else it would be eliminated in bullet 2), we have more edges than nodes. If any node has zero edges directed towards it, one of those edges could be reversed to reduce the number of conflicting edges, or to "free up" another node to have zero inward edges, to then do the same.
Armed with this knowledge, we know that the minimal number of conflicts (extra edges directed at nodes that already have an edge directed at them) equals the number of edges minus the number of vertices in our pruned graph. We can also conclude that as long as we manage to direct at least one edge to each node, we'll have an optimal graph, regardless of how we scatter the conflicting edges.
Originally I tried to draft an algorithm based on bullet three to accomplish this assignment, but it turns out the answer is actually a lot simpler than that. The only way we can accidentally create a node with no edges directed toward it is by actively directing all edges away from that node. The solution is to pick a single edge in the connected component, and assign it a direction at random. Then, do a search (DFS, BFS, anything) outward from the node it's directed at, assigning directions to the edges as you go, in the direction that you traverse them. Any node you reach will have an edge directed at it (the edge you took to reach it), and the root node has the edge you manually assigned to it.
In the end, this will produce a graph with the minimal number of extra edges directed at nodes. If you instead wish to minimize the number of nodes containing conflicting edges, solve the problem as stated above, and then form a subgraph of the nodes of degree three or more and their connecting edges. Solve for the minimal vertex cover of this subgraph, and then reverse the direction of the edges connecting nodes not in the minimal vertex cover yet containing conflicting edges, with those of the corresponding node in the minimal vertex cover.
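A minimal sketch of the seed-edge-plus-search idea follows. One detail here is my own assumption rather than something the answer states: the seed edge is taken from a cycle (any non-tree edge of a spanning tree), and the search never walks back across the seed, so that the seed's tail is still reached through some other edge. The name `orient_edges` and the edge-list format are illustrative only.

```python
from collections import defaultdict, deque

def orient_edges(edges):
    """Orient each undirected (u, v) edge; returns (tail, head) pairs in input order."""
    adj = defaultdict(list)               # node -> [(neighbor, edge index)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))

    head = [None] * len(edges)            # head[i] = node that edge i points at
    done = set()
    for start in list(adj):
        if start in done:
            continue
        # Pass 1: BFS spanning tree of the component; remember one non-tree edge.
        tree_edges, cycle_edge = set(), None
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt, i in adj[node]:
                if nxt not in comp:
                    comp.add(nxt)
                    tree_edges.add(i)
                    queue.append(nxt)
                elif i not in tree_edges and cycle_edge is None:
                    cycle_edge = i
        done |= comp

        # Pass 2: seed with the cycle edge (if any), then search from its head.
        root = start
        if cycle_edge is not None:
            root = edges[cycle_edge][1]
            head[cycle_edge] = root
        reached, queue = {root}, deque([root])
        while queue:
            node = queue.popleft()
            for nxt, i in adj[node]:
                if head[i] is not None:   # already directed (includes the seed)
                    continue
                head[i] = nxt             # direct the edge the way we traverse it
                if nxt not in reached:
                    reached.add(nxt)
                    queue.append(nxt)
    return [(u if head[i] == v else v, head[i]) for i, (u, v) in enumerate(edges)]
```

In a component that contains a cycle, every node ends up with at least one incoming edge, so the number of extra incoming edges is edges minus nodes, which matches the bound argued above. A tree component simply gets its edges directed away from an arbitrary root, with no conflicts.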

Which algorithm should match this specific Graph

Specific question here. Suppose you have a graph where each vertex specifies how many connections it must have to other vertices, and the following rules/properties apply:
1- The graph can be incomplete (there is no need for every vertex to have a connection with every other).
2- There can be two connections between two vertices only if they are in opposite directions (e.g. A points to B, B points to A).
3- Suppose they are on a 2D plane; there can be no crossing of connections (not even tangents).
4- There's no interest in the shortest path, just in respecting the properties and knowing if the solution is unique or not.
5- There may be no possible solution.
EDIT: Alright guys, sorry for not being specific. I'll try to clarify my point here: what I want to do is, given a number of vertices, know if a graph is connected (if all the points have at least a connection to the graph). It may be impossible to make a graph out of the given vertices, so I want to know if there is a solution, whether the solution is unique or not, or (worst case scenario) if there is no possible solution. I think that clarifies points 4 and 5. The graph is undirected, and the connections cannot curve, only straight lines. The nodes (vertices) are fixed; we have their position from or W/E input. I wanted to know the best approach. I've been researching and it is a connectivity problem, though maybe some specific algorithm may be more efficient at this task. That's all, sorry for the late reply.
EDIT2: Alright guys, would the problem be different if we think of each vertex as being on a row and column of a plane matrix, able to connect only with other vertices on the same column or row? So it would be just 90/180/270/360 straight connections. This would hugely shorten the possibilities, right?

I am going to assume that the question is: Given the degree of each vertex, work out a graph that passes all the constraints given.
I think you can reduce this to a very large integer programming problem - linear constraints, but with the variables required to be integers (in fact either 0 or 1), which makes the problem much more difficult than ordinary linear programming.
Let the unknowns be of the form Xij, where Xij is 1 if there is an edge from node i to node j, and 0 otherwise. The requirements on the number of connections then amount to requirements of the form SUM_{all i}Xij = K for some K dependent on the requirement. The requirement that the graph is planar reduces to the requirement that the graph not contain either of two known graphs (K5 and K3,3) as a minor - https://en.wikipedia.org/wiki/Graph_minor. Each possible subgraph then produces a constraint such as X01 + X02 + ... < 5. There will be a huge number of these constraints, so many that for a large number of nodes simply producing all the constraints may be too expensive to be practical, let alone solving them. The number of constraints goes up as at least the 6th power of the number of nodes. However, this is polynomial, so it is theoretically practical to write down the MIP to be solved, and perhaps this is better than no algorithm at all.

Assuming that you are asking us to:
Find out if it is possible to generate one-or-more directed planar graphs such that each vertex has a given out-degree (not necessarily the same out-degree per vertex).
Let's also assume that you want the graph to be connected.
If there are n vertices and the vertices have degrees d_1 ... d_n then for vertex i there are C(n-1,d_i) = (n-1)!/((d_i)!*(n-1-d_i)!) possible combinations of out-edges from that vertex. Taking the product of all these combinations over all the vertices will give you the upper bound on the number of possible graphs.
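To get a feel for how quickly this bound grows, the product can be evaluated directly. This is only a sketch of the counting formula above; the function name `out_edge_upper_bound` is hypothetical, and it assumes `degrees[i]` is the required out-degree of vertex i.

```python
from math import comb, prod

def out_edge_upper_bound(degrees):
    """Product over all vertices of C(n-1, d_i), as in the formula above."""
    n = len(degrees)
    return prod(comb(n - 1, d) for d in degrees)

# Three vertices, each with out-degree 1: C(2,1)^3 = 8 candidate graphs.
print(out_edge_upper_bound([1, 1, 1]))
```

Most of the candidates counted this way will fail the connectivity or planarity checks, so the figure is a ceiling on the search space rather than a count of solutions.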
The naive approach is:
Generate all possible graphs.
Filter the graphs to only have connected graphs.
Run a planarity test on the graph to determine if it is planar (you can consider the graph to be undirected in this step); discard if it isn't.
Profit!

Graph Theory Algorithm

The given problem is
Given a forest with n vertices, add edges to make it into a tree with minimal diameter.
I tried many approaches but none of them passed system test cases. Please suggest some algorithm to solve this problem.
This is the link of the editorial ncpc.idi.ntnu.no/ncpc2015/ncpc2015slides.pdf The problem name is Adjoin the Networks. I am not able to understand the solution provided in the editorial
Update:
https://www.quora.com/What-is-the-solution-for-Dreaming-on-IOI-2013
This link provides the best explanation for the solution mentioned in the editorial

The eccentricity of a vertex v, denoted ecc(v), is defined as ecc(v) := max_u d(u,v), i.e. as the distance to a most distant vertex in the graph. A center of a graph G is any vertex v for which ecc(v) = min_w ecc(w), i.e. a center is a vertex that minimizes the eccentricity. The radius rad(G) is the eccentricity of a center.
If you merge two trees (from different connected components), T1 and T2, by putting an edge between their centers c1 and c2, you get a tree T with diam T = max(diam T1, diam T2, 1+rad(T1)+rad(T2)).
The correctness of the approach below should be evident from these properties.
Here's one idea for the algorithm, off the top of my head:
let T1, T2, ..., Tk be the trees comprising the forest.
compute a center vertex ci for each of the trees Ti.
connect components by putting edges between centers in an intelligent way.
Of course, the problem is now how to cleverly solve the last bullet. Intuitively I'd suggest you connect the trees with the largest diameters first (and then update the diameter of the new tree and compute a center of the new tree). Perhaps something like this:
while the priority queue contains more than one tree do
    let T1 and T2 be the trees with the largest diameters; let c1 and c2 be their centers;
    connect c1 and c2 to form a new tree T;
    compute a new center c of T, compute diam T and put T back into priority queue (which can be a max-heap that uses diameter as the key).
done
Update. I'm not sure whether to join largest-diameter trees first or the other way around (i.e. smallest-diameter trees first). But it's now very easy to do a sketch of a proof (once you figure out which way to go) that this is the right way to go.
Update. The math certainly goes through if you connect largest first (as suggested in the PDF).
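The loop above can be sketched without building the merged trees explicitly, because only the diameters are needed. This is a rough sketch under two assumptions that are mine, not the answer's: vertices are numbered 0..n-1 with n >= 1, and the radius of a tree is taken as ceil(diam / 2), which holds for trees. The function names are illustrative.

```python
import heapq
from collections import defaultdict, deque

def _farthest(adj, start):
    """BFS from start; return (a farthest node, its distance, set of visited nodes)."""
    dist = {start: 0}
    queue = deque([start])
    far = start
    while queue:
        node = queue.popleft()
        if dist[node] > dist[far]:
            far = node
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return far, dist[far], set(dist)

def min_joined_diameter(n, edges):
    """Diameter after joining the forest's trees center to center, largest first."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen, heap = set(), []
    for v in range(n):
        if v in seen:
            continue
        end, _, comp = _farthest(adj, v)   # double BFS gives the tree's diameter
        _, diam, _ = _farthest(adj, end)
        seen |= comp
        heapq.heappush(heap, -diam)        # negate: heapq is a min-heap

    while len(heap) > 1:
        d1 = -heapq.heappop(heap)
        d2 = -heapq.heappop(heap)
        r1, r2 = (d1 + 1) // 2, (d2 + 1) // 2
        heapq.heappush(heap, -max(d1, d2, 1 + r1 + r2))
    return -heap[0]
```

For the forest made of the path 0-1-2 and the edge 3-4, `min_joined_diameter(5, [(0, 1), (1, 2), (3, 4)])` returns 3, which corresponds to adding the edge 1-3.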

What is meant by the set of all possible configuration in a given graph G

I'm trying to understand Solved Exercise 2, Chapter 3 of Algorithm Design by Kleinberg and Tardos.
But I'm not getting the idea of the answer.
In short the question is
We are given two robots located at node a and node b. The robots need to travel to node c and node d respectively. The problem arises if the robots get close to each other: "Let's assume the distance is r <= 1 so that if they become close to each other by one node or less" they will have an interference problem, so they won't be able to transmit data to the base station.
The answer is quite long and it does not make any sense to me or I'm not getting its idea.
Anyway I was thinking can't we just perform DFS/BFS to find a path from node a to c, & from b to d. then we modify the DFS/BFS Algorithm so that we keep checking at every movement if the robots are getting close to each other?
Since it's required to solve this problem in polynomial time, I don't think this modification to any of the algorithm "BFS/DFS" will consume a lot of time.
The solution is "From the book"
This problem can be tricky to think about if we view things at the level of the underlying graph G: for a given configuration of the robots—that is, the current location of each one—it’s not clear what rule we should be using to decide how to move one of the robots next. So instead we apply an idea that can be very useful for situations in which we’re trying to perform this type of search. We observe that our problem looks a lot like a path-finding problem, not in the original graph G but in the space of all possible configurations.
Let us define the following (larger) graph H. The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G. We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.
Why the need for larger graph H?
What does he mean by: The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G.
And what does he mean by: We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.?

I do not have the book, but it seems from their answer that at each step they move one robot or the other. Assuming that, H consists of all possible pairs of nodes that are more than distance r apart. The nodes in H are adjacent if they can be reached by moving one robot or the other.
There are not enough details in your proposed algorithm to say anything about it.

Anyway I was thinking can't we just perform DFS/BFS to find a path from node a to c, & from b to d. then we modify the DFS/BFS Algorithm so that we keep checking at every movement if the robots are getting close to each other?
I don't think this would be possible. What you're proposing is to calculate the full path, and afterwards check if the given path could work. If not, how would you handle the situation so that when you rerun the algorithm, it won't find that pathological path? You could exclude that from the set of possible options, but I don't think that'd be a good approach.
Suppose a path of length n, and now suppose that the pathology resides in the first step of the given path. Suppose now that this happens every time you recalculate the path. You would have to recalculate the path a lot of times just because the algorithm itself isn't aware of the restrictions needed to get to the right answer.
I think this is the point: the algorithm itself doesn't consider the problem's restrictions, and that is the main problem, because there's no easy way of correcting the given (wrong) solution.
What does he mean by: The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G.
What they mean by that is that each node in H represents each possible position of the two robots, which is the same as "all possible pairs of nodes in G".
E.g.: graph G has nodes A, B, C, D, E. H will have nodes AB, AC, AD, AE, BC, BD, BE, CD, CE, DE (consider AB = BA for further analysis).
Let the two robots be named r1 and r2, they start at nodes A and B (given info in the question), so the path will start in node AB in graph H. Next, the possibilities are:
r1 moves to a neighbor node from A
r2 moves to a neighbor node from B
(...repeat for each step until r1 and r2 each reach their destination).
All these possible positions of the two robots at the same time are the configurations the answer talks about.
And what does he mean by: We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.?
Let's look at the possibilities from what they state here:
(u,v) and (u′,v′) will be joined by an edge in H if one of the pairs u,u′ or v,v′ are equal, and the other pair corresponds to an edge in G.
The possibilities are:
(u,v) and (u,w), where (v,w) is an edge in E. In this case r2 moves to one of the neighbors of its current node.
(u,v) and (w,v), where (u,w) is an edge in E. In this case r1 moves to one of the neighbors of its current node.
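The two move types above are all that is needed to search H without ever building it in full. Below is a minimal sketch of a breadth-first search over configurations. It assumes G is an adjacency dict, that one robot moves per step, and that a configuration is allowed only while the robots are more than r edges apart (the reading given in another answer); the names `robot_schedule` and `bfs_distances` are made up for this example.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node of G."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def robot_schedule(graph, a, b, c, d, r):
    """Return a list of configurations from (a, b) to (c, d), or None if impossible."""
    dist = {u: bfs_distances(graph, u) for u in graph}

    def allowed(u, v):
        # Unreachable pairs have no recorded distance and count as far apart.
        return dist[u].get(v, float("inf")) > r

    start, goal = (a, b), (c, d)
    if not (allowed(*start) and allowed(*goal)):
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if config == goal:
            path = []
            while config is not None:
                path.append(config)
                config = parent[config]
            return path[::-1]
        u, v = config
        # Neighbors in H: r1 moves along an edge of G, or r2 does.
        moves = [(w, v) for w in graph[u]] + [(u, w) for w in graph[v]]
        for nxt in moves:
            if nxt not in parent and allowed(*nxt):
                parent[nxt] = config
                queue.append(nxt)
    return None
```

H has at most n^2 nodes, so the search stays polynomial in the size of G. On a path graph two robots can never swap ends without passing each other, and the function returns None in that case.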
This solution was a bit tricky to me too at first. But after reading it several times and drawing some examples, when I finally bumped into your question, the way you separated each part of the problem then helped me to fully understand each part of the solution. So, a big thanks to you for this question!
Hope it's clearer now for anyone stuck with this problem!

How can I find Maximum Common Subgraph of two graphs?

Hi, I need help finding a graph algorithm.
I'm working on the following equation related to distance functions:
d(g1, g2) = 1 - |mcs(g1, g2)| / (|g1| + |g2| - |mcs(g1, g2)|)
where
d(g1, g2) is a distance function based on the maximum common subgraph,
g1, g2 are two graphs,
mcs(g1, g2) is the maximum common subgraph of the two graphs g1, g2, i.e. the largest graph (by some measure involving the number of nodes and edges) contained in both subject graphs,
|g1| is the cardinality of the graph g1,
|g2| is the cardinality of the graph g2.
My question: how can I calculate the MCS?
I searched the internet, but most of the algorithms are complicated. Does anyone know where I can get a simple algorithm to program this equation in MATLAB?

The problem is NP-Complete (1).
The reduction is from the Clique Problem (2). Given an instance of the Clique Problem, a graph G=(V,E), create a complete clique G'=(V,E') such that E' = {(u,v) | u != v, for each u,v in V}.
The solution to the maximal clique problem is the same as the solution to the maximal subgraph problem for G and G'. Since the clique problem is NP-Hard, so is this problem.
Thus, there is no known polynomial-time solution to this problem.
If you are looking for an exact algorithm, you could try exhaustive search approach and/or a branch & bound approach to solve it. Sorry for the bad news, but at least you know not to look for something that (probably) doesn't exist (unless P=NP, of course)
EDIT: exponential brute force solution to the problem:
You can just check all possible subsets, and check if it is a feasible solution.
Pseudo Code:
findMCS(vertices, G1, G2, currentSubset):
    if vertices is empty: // base clause, no more candidates to check
        if isCommonSubgraph(G1, G2, currentSubset):
            return clone(currentSubset)
        else:
            return {}
    v <- vertices.pop() // take a look at the first element
    cand1 <- findMCS(vertices, G1, G2, currentSubset) // find MCS if v is NOT in the subset
    cand2 <- {} // stays empty if adding v breaks the common-subgraph property
    currentSubset.append(v)
    if isCommonSubgraph(G1, G2, currentSubset): // find MCS if v is in the subset
        cand2 <- findMCS(vertices, G1, G2, currentSubset)
    currentSubset.remove(v) // clean up environment before getting back from recursive call
    vertices.push(v) // restore the candidate list for the caller
    return (|cand1| > |cand2| ? cand1 : cand2) // return the maximal subset from all candidates
Complexity of the above is O(2^n) (checking all possible subsets), and invoke it with: findMCS(G1.vertices, G1, G2, []) (where [] is an empty list).
Note:
isCommonSubgraph(G1,G2,currentSubset) is an easy-to-calculate method that answers true if and only if currentSubset is a common subgraph of G1 and G2.
|cand1| and |cand2| are the sizes of these lists.
(1)Assuming that Maximum sub graph is a subset U in V such that for each u1,u2 in U (u1,u2) is in E1 if and only if (u1,u2) is in E2 (intuitively, a maximal subset of the vertices that share the exact same edges in the two graphs)
(2) Clique Problem: Given an instance of G=(V,E) find maximal subset U in V such that for each u1,u2 in U : u1 = u2 or (u1,u2) is in E.
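For readers who want something runnable, here is a Python sketch of the same exhaustive search under the definition in footnote (1): both graphs share one vertex set, and a subset is accepted when every pair inside it is an edge in both graphs or in neither. The function names, the edge-list format, and the simple size bound used for pruning are assumptions of this example, as is measuring |g| by vertex count in the distance helper.

```python
def find_mcs(vertices, edges1, edges2):
    """Largest vertex subset on which the two edge sets agree (exponential time)."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    vertices = list(vertices)
    best = []

    def agrees(u, v):
        pair = frozenset((u, v))
        return (pair in e1) == (pair in e2)

    def search(index, current):
        nonlocal best
        # Bound: even taking every remaining vertex cannot beat the best so far.
        if len(current) + len(vertices) - index <= len(best):
            return
        if index == len(vertices):
            best = list(current)
            return
        v = vertices[index]
        if all(agrees(u, v) for u in current):   # branch 1: v is in the subset
            current.append(v)
            search(index + 1, current)
            current.pop()
        search(index + 1, current)               # branch 2: v is not in the subset

    search(0, [])
    return best

def mcs_distance(size1, size2, mcs_size):
    """d(g1, g2) = 1 - |mcs| / (|g1| + |g2| - |mcs|), sizes supplied by the caller."""
    return 1 - mcs_size / (size1 + size2 - mcs_size)

common = find_mcs([1, 2, 3, 4],
                  [(1, 2), (2, 3), (3, 4)],
                  [(1, 2), (2, 3), (1, 4)])
print(common, mcs_distance(4, 4, len(common)))
```

Checking only the newly added vertex against the current subset is equivalent to re-running the full common-subgraph test at each step, and it keeps the per-node work linear in the subset size. The worst case is still exponential in the number of vertices.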

The backtracking search algorithm proposed by James J. McGregor may be utilized to identify the MCS between two graphs.

You can't even efficiently check if one graph is a subgraph of the other one: that is the subgraph isomorphism problem, which is known to be NP-complete. Therefore you can't find the maximal subgraph, because you can't check the isomorphism property (in polynomial time).

The main problem is finding a correspondence between nodes in the original graphs (essentially a renumbering of the vertices). For instance, if we have node p in graph g1 and node q in graph g2 where p and q are equivalent, we'd like to map them to a node s in the common subgraph, c.
The reason that the Clique Problem is so difficult is that, without any way of checking whether two nodes in different graphs actually refer to the same node, we have to try all possible combinations of pairs of nodes and check if each pair is consistent and represents the "best" correspondence.
Since the nodes in these graphs represent geographic locations, we should be able to come up with a reasonable distance metric that tells us how likely it is that a node in one graph is the same as any node in the other graph. Since the GPS coordinates of the two nodes are probably not identical, we need to make some assumptions based on the problem.
If we have a map of the region in which the data points occur, represented as a graph m, we can renumber or rename the nodes in g1 and g2 to correspond to their closest equivalent in m.
Distance (either between the original graphs and m or between points in g1 and g2) can either be the Euclidean distance or the Manhattan distance, depending on what makes more sense for your graphs.
You'll have to be careful in deciding how far apart two nodes can be and still be considered equivalent. Too small and you won't get any matches; too large and your entire graph could be condensed into one node.
Two or more nodes in an original graph could possibly all map to the same node in c, for instance if the location data is updated frequently in relation to the distance between nodes.
Conversely, an edge between a pair of successive nodes in an original graph could also map to a path containing multiple edges if the update frequency is low in relation to the distances. So you'll have to figure out whether it makes sense to introduce these intermediate nodes into the common graph or treat the whole path as a single edge.
Once you've got the renumbering of the nodes you can use the method that Jens suggests to find the intersection of the renumbered graphs. This is all very general since I don't have a lot of details about your specific problem, but hopefully it's enough to get you started.
