Finding equal subgraphs - algorithm

Given:
a directed Graph
Nodes have labels
the same label can appear more than once
edges don't have labels
I want to find the set of largest (connected) subgraphs which are equal taking the labels of the nodes into account.
The graph could be huge (millions of nodes). Does anyone know an efficient solution for this?
I'm looking for an algorithm and, ideally, a Java implementation.
Update: Since this problem is most likely NP-complete, I would also be interested in an algorithm that produces an approximate solution.
This seems to be close at least:
Frequent Subgraphs

I strongly suspect that's NP-hard.
Even if all the labels are the same that's at least as hard as graph isomorphism. (Join the two graphs together as a single disconnected graph; are the largest equal subgraphs the two original graphs?)
If identical labels are relatively rare it might be tractable.

Related

How to build a Minimum Spanning Tree given a list of 200 000 nodes?

Problem
I have a list of approximately 200000 nodes that represent lat/lon positions in a city, and I have to compute the Minimum Spanning Tree. I know that I need to use Prim's algorithm, but first of all I need a connected graph. (We can assume that those nodes are in a Euclidean plane.)
To build this connected graph, I first thought of computing the complete graph, but 205000*(205000-1)/2 is around 21 billion edges and I can't handle that.
Options
Then I came across Delaunay triangulation: if I build this "Delaunay graph", it contains a subgraph that is the Minimum Spanning Tree, and it has a total of around 600000 edges, since according to Wikipedia [..]it has at most 3n-6 edges. So it may be a good starting point for a Minimum Spanning Tree algorithm.
Another option is to build an approximately connected graph, but with that I may miss important edges that would influence my Minimum Spanning Tree.
My question
Is Delaunay a reliable solution in this case? If so, is there any other reliable solution to this problem besides Delaunay triangulation?
Further information: this problem has to be solved in C.
The Delaunay triangulation of a point set is always a superset of the Euclidean minimum spanning tree (EMST) of these points. So it is absolutely "reliable"*. And recommended, as it has a size linear in the number of points and can be built efficiently.
*When there are cocircular point quadruples, neither the triangulation nor the EMST is uniquely defined, but this is usually harmless.
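Once the Delaunay edges are available, any MST algorithm for sparse graphs can run on them. The following is a minimal sketch, assuming the triangulation has already been computed elsewhere and is given as a list of index pairs. It uses Kruskal's algorithm with union-find, swapped in for Prim only because it is shorter to write. The names `kruskal_mst`, `points` and `candidate_edges` are illustrative, and the same structure carries over to C.

```python
import math


def kruskal_mst(points, edges):
    """Return the MST edges (index pairs) built from the candidate edges only."""
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def length(edge):
        (x1, y1), (x2, y2) = points[edge[0]], points[edge[1]]
        return math.hypot(x1 - x2, y1 - y2)

    tree = []
    for u, v in sorted(edges, key=length):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different components
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical input: a unit square plus one diagonal, as a triangulation would give.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
candidate_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
mst = kruskal_mst(points, candidate_edges)
```

With at most 3n-6 candidate edges, the sort dominates the running time, so the whole step is O(n log n).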
There's a big question here of what libraries you have access to and how much you trust yourself as a coder. (I'm assuming the fact that you're new on SO should not be taken as a measure of your overall experience as a programmer - if it is, well, RIP.)
If we assume you don't have access to Delaunay and can't implement it yourself, minimum spanning tree algorithms that presuppose a graph aren't necessarily off limits to you. You can have the complete graph conceptually but not actually. Kruskal's algorithm, for instance, assumes you have a sorted list of all edges in your graph; most of your edges will not be near the minimum, and you do not have to compare all n^2 of them to find the minimum.
You can find minimum edges quickly by estimations that give you a reduced set, then refinement. For instance, if you divide your graph into a 100*100 grid, for any point p in the graph, points in the same grid square as p are guaranteed to be closer than points three or more squares away. This gives a much smaller set of points you have to compare to safely know you've found the closest.
It still won't be easy, but that might be easier than Delaunay.
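The grid idea can be sketched as follows. This is an illustration under assumptions, not a full MST algorithm: it only finds the nearest neighbour of one point, and it generalises the "three or more squares away" rule by scanning rings of squares outward until no unvisited square can hold a closer point. The names `build_grid`, `nearest_neighbour` and the cell size are hypothetical.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket point indices by the grid square that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    return grid


def nearest_neighbour(i, points, grid, cell):
    """Index of the point closest to points[i]; None if there is no other point."""
    if len(points) < 2:
        return None
    x, y = points[i]
    cx, cy = math.floor(x / cell), math.floor(y / cell)
    best, best_d = None, math.inf
    ring = 0
    while True:
        # Visit only the squares whose Chebyshev distance from (cx, cy) is `ring`.
        for gx in range(cx - ring, cx + ring + 1):
            for gy in range(cy - ring, cy + ring + 1):
                if max(abs(gx - cx), abs(gy - cy)) != ring:
                    continue
                for j in grid.get((gx, gy), ()):
                    if j == i:
                        continue
                    d = math.dist(points[i], points[j])
                    if d < best_d:
                        best, best_d = j, d
        # Every unvisited square is at least `ring` whole squares away,
        # so no unvisited point can be closer than ring * cell.
        if best_d <= ring * cell:
            return best
        ring += 1


points = [(0.5, 0.5), (0.6, 0.5), (5.0, 5.0), (9.0, 9.0), (-3.2, 4.1)]
grid = build_grid(points, 1.0)
closest_to_first = nearest_neighbour(0, points, grid, 1.0)
```

The cell size is a tuning choice: squares holding a handful of points each keep both the number of rings and the number of comparisons per ring small.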

Which algorithm should match this specific Graph

A specific question here. Suppose you have a graph where each vertex specifies how many connections it must have to other vertices, and the following rules/properties apply:
1- The graph can be incomplete (there is no need for every vertex to have a connection with every other).
2- There can be two connections between two vertices only if they are in opposite directions (e.g. A points to B, B points to A).
3- Suppose they are on a 2D plane; there can be no crossing of connections (not even tangents).
4- There's no interest in the shortest path, just in respecting the properties and knowing whether the solution is unique or not.
5- There may be no possible solution.
EDIT: Alright guys, sorry for not being specific. I'll try to clarify my point here: what I want to do is, given a number of vertices, know if a graph is connected (if all the points have at least one connection to the graph). It may be impossible to make a graph out of the given vertices, so I want to know if there is a solution, if the solution is unique or not, or (worst case scenario) if there is no possible solution. I think that clarifies points 4 and 5. The graph is undirected, and the connections cannot curve, only straight lines. The nodes (vertices) are fixed; we have their positions from the input. I wanted to know the best approach. I've been researching and it is a connectivity problem, though maybe some specific algorithm may be more efficient at this task. That's all, sorry for the late reply.
EDIT2: Alright guys, would the problem be different if we think of each vertex as being on a row and column of a planar matrix, able to connect only with other vertices on the same column or row? So there would be only straight connections at 90/180/270/360 degrees. This would hugely reduce the possibilities, right?
I am going to assume that the question is: Given the degree of each vertex, work out a graph that passes all the constraints given.
I think you can reduce this to a very large integer programming problem - linear constraints, but with the variables required to be integers (in fact either 0 or 1), which makes the problem much more difficult than ordinary linear programming.
Let the unknowns be of the form Xij, where Xij is 1 if there is an edge from node i to node j, and 0 otherwise. The requirements on the number of connections then amount to requirements of the form SUM_{all i}Xij = K for some K dependent on the requirement. The requirement that the graph is planar reduces to the requirement that the graph not contain two known graphs as subgraphs - https://en.wikipedia.org/wiki/Graph_minor. Each possible subgraph then produces a constraint such as X01 + X02 + ... < 5. There will be a huge number of these constraints, so many that for a large number of nodes simply producing all the constraints may be too expensive to be practical, let alone solving them. The number of constraints goes up as at least the 6th power of the number of nodes. However, this is polynomial, so it is theoretically practical to write down the MIP to be solved, so perhaps this is better than no algorithm at all.
Assuming that you are asking us to:
Find out if it is possible to generate one-or-more directed planar graphs such that each vertex has a given out-degree (not necessarily the same out-degree per vertex).
Let's also assume that you want the graph to be connected.
If there are n vertices and the vertices have degrees d_1 ... d_n then for vertex i there are C(n-1,d_i) = (n-1)!/((d_i)!*(n-1-d_i)!) possible combinations of out-edges from that vertex. Taking the product of all these combinations over all the vertices will give you the upper bound on the number of possible graphs.
The naive approach is:
Generate all possible graphs.
Filter the graphs to keep only the connected ones.
Run a planarity test on the graph to determine if it is planar (you can consider the graph to be undirected in this step); discard if it isn't.
Profit!
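The first two steps of the naive approach can be sketched as below, for tiny inputs only. This is a minimal illustration under assumptions: vertices are numbered 0..n-1, connectivity is checked on the underlying undirected graph, and the planarity test is left out because it needs either a library or a segment-crossing check against the fixed positions. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb, prod


def candidate_graphs(out_degrees):
    """Yield every edge list in which vertex u has exactly out_degrees[u] out-edges."""
    n = len(out_degrees)
    choices = [
        list(combinations([v for v in range(n) if v != u], d))
        for u, d in enumerate(out_degrees)
    ]
    for targets in product(*choices):
        yield [(u, v) for u, vs in enumerate(targets) for v in vs]


def is_weakly_connected(n, edges):
    """True if the graph is connected once edge directions are ignored."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == n


degrees = [1, 1, 1, 0]
all_graphs = list(candidate_graphs(degrees))
upper_bound = prod(comb(len(degrees) - 1, d) for d in degrees)
connected = [g for g in all_graphs if is_weakly_connected(len(degrees), g)]
```

Here `len(all_graphs)` equals the product of binomial coefficients given above, which is why this enumeration only works for very small n.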

Rearranging a graph so that certain nodes are not adjacent?

EDIT: Precisely, I am trying to find two disjoint independent sets of known size in a graph shaped like a triangular grid, which may have holes and has a variable perimeter shape.
I'm not very well versed in graph theory, so I'm not sure if there exists an efficient solution for this problem. Consider the following graphs:
The colors of any two nodes can be swapped. The goal is to ensure that no two red nodes are adjacent, and no two green nodes are adjacent. The edges marked with exclamation points are invalid. Basically, I need to write two algorithms:
Determine that the nodes in a given graph can be arranged so that red and green nodes are not adjacent to nodes of the same color.
Actually rearrange the nodes.
I'm a little lost on how to implement this. It's not too difficult to separate the nodes of one color, but repeating the process for the second color may mess up the first color. Without a way to determine whether the graph can actually be arranged properly, this process could loop forever.
Is there some kind of algorithm that I can use/write for this? I'm mainly interested in the first image's graph (a triangular grid), but a generic algorithm would work as well.
First, let's note that the problem is a variant of graph coloring.
Now, if you are only dealing with 2 colors (red, green): coloring a graph with 2 colors is fairly easy, and is basically done by finding out if the graph is bipartite and coloring each "side" of the graph in one color. Finding out if a graph is bipartite is fairly simple.
However, if you want more than two colors, the problem becomes NP-Complete, and is actually a variant of the Graph Coloring Problem.
Graph Coloring Problem:
Given a graph G=(V,E) and a number k, determine if there is a
function c: V -> {1, 2, ..., k} such that c(v) = c(u) -> (v,u) is not an
edge.
Informally, you can color the graph in k colors, and you need to determine if there is some coloring such that you never color 2 nodes that share an edge with the same color.
Note that while it seems your problem is slightly easier, since you already know what is the number of nodes in each color, it doesn't really make a difference.
Assume you have a polynomial time algorithm A that solves your problem.
Now, given an instance (G,k) of graph coloring, there are only O(n^3) possibilities for (#color1, #color2, #color3), so by examining each of these and invoking A on it, you can find a polynomial time solution to Graph-Coloring. This would mean P=NP, which is most likely (according to most CS researchers) not the case.
tl;dr:
For 2 colors: find out if the graph is bipartite - and give one color to each side of the graph.
For 3 or more colors: There is no known efficient solution, and the general belief is one does not exist.
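For the 2-color case, the bipartite test can be sketched with a breadth-first search that alternates colors level by level. This is a minimal sketch, assuming the graph is given as an adjacency dictionary; `two_colour` and the sample graphs are illustrative names, and the colors 0 and 1 stand in for red and green.

```python
from collections import deque


def two_colour(adj):
    """Return a dict vertex -> 0/1 with no monochromatic edge, or None if impossible."""
    colour = {}
    for start in adj:  # loop over all vertices so disconnected parts are covered
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None  # odd cycle found: the graph is not bipartite
    return colour


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
square_colouring = two_colour(square)
triangle_colouring = two_colour(triangle)
```

The square gets a valid coloring, while the triangle returns `None`, since any odd cycle rules out a 2-coloring of all vertices.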
I thought this problem would be easier for planar graphs, but unfortunately that's not the case. The best matches for this problem I was able to find are minimum sum coloring and largest bipartite subgraph.
For largest bipartite subgraph, assume that the number of reds plus the number of greens exactly matches the size of the largest bipartite subgraph. Then this problem is equivalent to yours. The paper claims that it's still NP-hard even for planar graphs.
For minimum sum coloring, assume that the red color has weight 1, the green color has weight 2, and we have infinitely many blue* colors with some weight greater than the graph size. Then if the answer is exactly the minimal sum coloring, there is no polynomial algorithm to find it (although the paper refers to such an algorithm for chordal graphs).
Anyway, it seems that the closer your red+green count is to the size of the subgraph that is "optimal" in some sense, the more difficult the problem is.
If you can afford an inexact solution, or a relaxed solution where you only separate, say, the reds, you have an option. As I said in a comment, find an approximate solution of the maximum independent set problem for planar graphs. Then color this set with the red and blue colors, if it is large enough.
If you know that red+green is much less than the total number of vertices, another approximation can work. Look at the introduction of this article. It claims that:
For graphs which are promised to have small chromatic number, a better
guarantee is known: given a k-colorable graph, an algorithm due to
Karger et al. [12] uses semidefinite programming (SDP) to find an
independent set of size about Ω(n/∆^(1−2/k)).
Your graph is for sure 4-colorable, so you can count on a large enough independent set. The same article states that a greedy solution can already find a large enough independent set.
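A greedy independent set can be sketched as follows. This is one common variant (repeatedly take a vertex of minimum remaining degree and discard its neighbours); whether it is the exact variant analysed in the article is an assumption, and the function name is illustrative.

```python
def greedy_independent_set(adj):
    """Pick vertices of minimum remaining degree until no vertex is left."""
    remaining = {u: set(vs) for u, vs in adj.items()}
    chosen = []
    while remaining:
        # Ties are broken by vertex label to keep the result deterministic.
        u = min(remaining, key=lambda x: (len(remaining[x]), x))
        chosen.append(u)
        removed = remaining[u] | {u}
        for w in removed:
            remaining.pop(w, None)
        for neighbours in remaining.values():
            neighbours -= removed  # forget edges to vertices that are gone
    return chosen


# Hypothetical example: a path on five vertices, 0 - 1 - 2 - 3 - 4.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
independent = greedy_independent_set(path_graph)
```

The chosen set could then be split between the two colors, provided it has at least as many vertices as red and green nodes combined.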

Looking for an algorithm by possibly Dijkstra

I am looking for an algorithm that distributes nodes on a plane such that the edges are all the same size. I think it is by Dijkstra, but I cannot remember.
Anyone heard of this algorithm?
In general this will be impossible. Effectively you want something similar to the finite pictures in tilings of the plane.
There are some simple cases - regular polygons and a few graphs which include joined polygons, but even something as simple as the complete graph for 4 points (tetrahedron) is impossible.
If you want something that tries to balance the impossible constraints, try graphviz and its neato program.
Well, if you want to create any graph with this property, then there are a number of graphs that may help you with that, for instance a line, a ring, a tree, etc. But here, you are the one who decides which edges to include or exclude.
If you have a given graph and you want all edges to be the same size, then this is impossible in general (because of some cases), such as: a complete graph of more than 3 nodes, or a star topology with one master and more than 5 slaves where slaves that are directly next to each other are neighbors. [I believe the cases in the other posts tell you more]
A special case: given a graph $G(V,E)$, draw $G$ such that the length of each edge $e \in E$ is less than a unit. This is an NP-hard problem. [That is, you cannot efficiently decide whether an arbitrary graph $G$ is a unit disk graph, unless P=NP]

Graph Algorithm To Find All Paths Between N Arbitrary Vertices

I have a graph with the following attributes:
Undirected
Not weighted
Each vertex has a minimum of 2 and maximum of 6 edges connected to it.
Vertex count will be < 100
Graph is static and no vertices/edges can be added/removed or edited.
I'm looking for paths between a random subset of the vertices (at least 2). The paths should be simple paths that only go through any vertex once.
My end goal is to have a set of routes so that you can start at one of the subset vertices and reach any of the other subset vertices. It's not necessary to pass through all the subset nodes when following a route.
All of the algorithms I've found (Dijkstra,Depth first search etc.) seem to be dealing with paths between two vertices and shortest paths.
Is there a known algorithm that will give me all the paths (I suppose these are subgraphs) that connect these subset of vertices?
edit:
I've created a (warning! programmer art) animated gif to illustrate what I'm trying to achieve: http://imgur.com/mGVlX.gif
There are two stages: pre-process and runtime.
pre-process
I have a graph and a subset of the vertices (blue nodes)
I generate all the possible routes that connect all the blue nodes
runtime
I can start at any blue node select any of the generated routes and travel along it to reach my destination blue node.
So my task is more about creating all of the subgraphs (routes) that connect all blue nodes, rather than creating a path from A->B.
There are so many ways to approach this, and in order not to confuse things, here's a separate answer that addresses the description of your core problem:
Finding ALL possible subgraphs that connect your blue vertices is probably overkill if you're only going to use one at a time anyway. I would rather use an algorithm that finds a single one, but randomly (so not any shortest path algorithm or such, since it will always be the same).
If you want to save one of these subgraphs, you simply have to save the seed you used for the random number generator and you'll be able to produce the same subgraph again.
Also, if you really want to find a bunch of subgraphs, a randomized algorithm is still a good choice since you can run it several times with different seeds.
The only real downside is that you will never know if you've found every single one of the possible subgraphs, but it doesn't really sound like that's a requirement for your application.
So, on to the algorithm: Depending on the properties of your graph(s), the optimal algorithm might vary, but you could always start off with a simple random walk, starting from one blue node and walking to another blue one (while making sure you're not walking in your own old footsteps). Then choose a random node on that path and start walking to the next blue one from there, and so on.
For certain graphs, this has very bad worst-case complexity but might suffice for your case. There are of course more intelligent ways to find random paths, but I'd start out easy and see if it's good enough. As they say, premature optimization is evil ;)
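The seeded random path between two blue nodes can be sketched as below. Note the swap: instead of a pure random walk, which can trap itself, this uses a depth-first search that visits neighbours in random order and backtracks from dead ends. It covers only the step between one pair of blue nodes, not the chaining to further ones. The function name and the sample graph are hypothetical.

```python
import random


def random_simple_path(adj, source, target, seed):
    """Return one simple path from source to target, or None if unreachable."""
    rng = random.Random(seed)  # the same seed reproduces the same path
    visited = {source}
    path = [source]

    def extend(u):
        if u == target:
            return True
        neighbours = list(adj[u])
        rng.shuffle(neighbours)
        for v in neighbours:
            if v in visited:
                continue
            visited.add(v)
            path.append(v)
            if extend(v):
                return True
            path.pop()  # dead end: back up, but keep v marked as visited
        return False

    return path if extend(source) else None


# Hypothetical 3x3 grid graph, vertices numbered row by row from 0 to 8.
grid = {
    0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
    3: [0, 4, 6], 4: [1, 3, 5, 7], 5: [2, 4, 8],
    6: [3, 7], 7: [4, 6, 8], 8: [5, 7],
}
route = random_simple_path(grid, 0, 8, seed=42)
```

Storing only the seed is enough to regenerate a route later, as long as the graph and the neighbour order in `adj` stay the same.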
A simple breadth-first search will give you the shortest paths from one source vertex to all other vertices. So you can perform a BFS starting from each vertex in the subset you're interested in, to get the distances to all other vertices.
Note that in some places, BFS will be described as giving the path between a pair of vertices, but this is not necessary: You can keep running it until it has visited all nodes in the graph.
This algorithm is similar to Johnson's algorithm, but greatly simplified thanks to the fact that your graph is unweighted.
Time complexity: Since there is a constant number of edges per vertex, each BFS will take O(n), and the total will take O(kn), where n is the number of vertices and k is the size of the subset. As a comparison, the Floyd-Warshall algorithm will take O(n^3).
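Running one BFS per subset vertex can be sketched like this. It is a minimal sketch, assuming the graph is an adjacency dictionary; the function names and the sample cycle graph are illustrative. It returns distances only, though storing a predecessor per vertex would recover the actual shortest paths.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def subset_distances(adj, subset):
    """One BFS per subset vertex: O(k * n) when vertex degrees are bounded."""
    return {s: bfs_distances(adj, s) for s in subset}


# Hypothetical example: a cycle on six vertices with two "blue" vertices.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
distances = subset_distances(cycle, [0, 3])
```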
What you're searching for is (if I understand it correctly) not really all paths, but rather all spanning trees. Read the Wikipedia article about spanning trees to determine if those are what you're looking for. If they are, there is a paper you would probably want to read:
Gabow, Harold N.; Myers, Eugene W. (1978). "Finding All Spanning Trees of Directed and Undirected Graphs". SIAM J. Comput. 7 (280).

Resources