I'm just wondering if, like for strings where we have the Levenshtein distance (or edit distance) between two strings, is there something similar for graphs?
I mean, a scalar measure that identifies the number of atomic operations (node and edges insertion/deletion) to transform a graph G1 to a graph G2.
I think graph edit distance is the measure that you were looking for.
Graph edit distance measures the minimum number of graph edit operations to transform one graph to another, and the allowed graph edit operations includes:
Insert/delete an isolated vertex
Insert/delete an edge
Change the label of a vertex/edge (if labeled graphs)
However, computing the graph edit distance between two graphs is NP-hard. The most efficient algorithm for computing this is an A*-based algorithm, and there are other sub-optimal algorithms.
You should look at the paper A survey of graph edit distance
For a general graph it is a NP-complete problem as others mentioned in their answer. But for tree graph there are well known polynomial algorithms. May be most famous of them is "Zhang Shasha" algorithm which was published in 1989.
Note:
The Levenshtein distance (or edit distance) is between two strings
But in Graph you should search between at least N! position that you find Identity of each edge and vertex.
You can compare between two graph by unique index easily,But
The master question is define identity for each vertex and edge.this question (find identity for each vertex and edge in two graph that they can to transform ) is very hard and was called isomorphism problem (NP-Complete).
You can search about isomorphism graph.
Related
Problem
I have a list of approximatly 200000 nodes that represent lat/lon position in a city and I have to compute the Minimum Spanning Tree. I know that I need to use Prim algorithm but first of all I need a connected graph. (We can assume that those nodes are in a Euclidian plan)
To build this connected graph I thought firstly to compute the complete graph but (205000*(205000-1)/2 is around 19 billions edges and I can't handle that.
Options
Then I came across to Delaunay triangulation: with the fact that if I build this "Delauney graph", it contains a sub graph that is the Minimum Spanning Tree according and I have a total of around 600000 edges according to Wikipedia [..]it has at most 3n-6 edges. So it may be a good starting point for a Minimum Spanning Tree algorithm.
Another options is to build an approximately connected graph but with that I will maybe miss important edges that will influence my Minimum Spanning Tree.
My question
Is Delaunay a reliable solution in this case? If so, is there any other reliable solution than delaunay triangulation to this problem ?
Further information: this problem has to be solved in C.
The Delaunay triangulation of a point set is always a superset of the EMST of these points. So it is absolutely "reliable"*. And recommended, as it has a size linear in the number of points and can be efficiently built.
*When there are cocircular point quadruples, neither the triangulation nor the EMST are uniquely defined, but this is usually harmless.
There's a big question here of what libraries you have access to and how much you trust yourself as a coder. (I'm assuming the fact that you're new on SO should not be taken as a measure of your overall experience as a programmer - if it is, well, RIP.)
If we assume you don't have access to Delaunay and can't implement it yourself, minimum spanning trees algorithms that pre-suppose a graph aren't necessarily off limits to you. You can have the complete graph conceptually but not actually. Kruskal's algorithm, for instance, assumes you have a sorted list of all edges in your graph; most of your edges will not be near the minimum, and you do not have to compare all n^2 to find the minimum.
You can find minimum edges quickly by estimations that give you a reduced set, then refinement. For instance, if you divide your graph into a 100*100 grid, for any point p in the graph, points in the same grid square as p are guaranteed to be closer than points three or more squares away. This gives a much smaller set of points you have to compare to safely know you've found the closest.
It still won't be easy, but that might be easier than Delaunay.
I've got a DAG of around 3.300 vertices which can be laid out quite successfully by dot as a more or less simple tree (things get complicated because vertices can have more than one predecessor from a whole different rank, so crossovers are frequent). Each vertex in the graph came into being at a specific time in the original process and I want one axis in the layout to represent time: An edge relation like a -> v, b -> v means that a and b came into being at some specific time before v.
Is there a layout algorithm for DAGs which would allow me to specify the positions (or at least the distances) on one axis and come up with an optimal layout regarding edge crossovers on the other?
You can make a topological sorting of the DAG to have the vertices sorted in a way that for every edge x->y, vertex x comes before than y.
Therefore, if you have a -> v, b -> v, you will get something like a, b, v or b, a, v.
Using this you can easily represents DAGs like this:
Yes, as #Arturo-Menchaca said a topological sorting may help to reduce overlapping count of edges. But it may be not optimal. There is no good algorithm for edge crossing minimization. Problem for crossing minimization is NP-complete. The heuristics are applied for solving this problem.
This StackOverflow link may help you: Drawing Directed Acyclic Graphs: Minimizing edge crossing?
I suppose your problem is related to an aesthetically pleasing way of the graph layout. Some heuristics are described in the articles Overview of algorithms for graph drawing, Force-Directed Drawing Algorithms. May be information about planar graph or almost planar graph can help you also.
Some review of the algorithms for checking and drawing planar graphs are described in the Wiki pages Planar graph, Crossing number (graph theory). The libraries and algorithms for planar graph drawing are described in the StackOverflow question How to check if a Graph is a Planar Graph or not? For example the author in the article GA for straight-line grid drawings of maximal planar graphs uses genetic algorithms for straight-line grid drawing.
Good descriptions for almost planar graphs are given in the articles Straight-Line Drawability of a Planar Graph Plus an Edge, On the Crossing Number of Almost Planar Graphs.
Try to modify the original algoritms using your condition with one axis alignment.
If I understood you correctly then you want to minimize the number of edge-crossings in your graph layout. If so, then the answer is "No", because this problem is proved to be NP-complete in the general case. See this, "Crossing Number is NP-Complete, Garey, Johnson".
If you need a not an optimal but just good enough solution, there are multiple articles on this topic because it is heavily related with circuit layouts. Probably googling "crossing number heuristics" and looking through the abstracts of some papers will solve your task better then me trying to guess blindly your requirements.
I am designing a system to find shortest route covering least number of cells. Suppose the plane is divided into rectangular cells. What will be the best suited algorithm for this. I am just looking for the head-start and not the proper code or implementation.
You are dealing with shortest path problem, in an unweighted graph (vertices are the cells in your grid, and edges are possible moves from one cell to the other)
The simplest approach is a simple BFS - that finds the shortest
path from a source to all targets (in unweighted graphs).
This algorithm is fairly simple and is iteratively "discovering" the closest nodes, all nodes of distance 1, then of distance 2, ....
An optimization is using a bi-directional search. Here you can take advantage of having single source and single target by doing BFS from both sides, effectively reducing the number of nodes you need to develop dratically.
One more optimization could be to use A* Search Algorithm with
an admissible heuristic function, such as manhattan distances.
In A* Search, you take advantage that the graph is not some arbitrary graph - but a grid, and you can estimate distance from one node to the other, based on their locations on the grid. The algorithm will uses these estimations to find the optimal path quicker.
Note - all algorithms I suggested find the shortest path, difference is in the time it will take them to find it.
I have a set of points and a distance function applicable to each pair of points. I would like to connect ALL the points together, with the minimum total distance. Do you know about an existing algorithm I could use for that ?
Each point can be linked to several points, so this is not the usual "salesman itinerary" problem :)
Thanks !
What you want is a Minimum spanning tree.
The two most common algorithms to generate one are:
Prim's algorithm
Kruskal's algorithm
As others have said, the minimum spanning tree (MST) will allow you to form a minimum distance sub-graph that connects all of your points.
You will first need to form a graph for your data set though. To efficiently form an undirected graph you could compute the Delaunay triangulation of your point set. The conversion from the triangulation to the graph is then fairly literal - any edge in the triangulation is also an edge in the graph, weighted by the length of the triangulation edge.
There are efficient algorithms for both the MST (Prim's/Kruskal's O(E*log(V))) and Delaunay triangulation (Divide and Conquer O(V*log(V))) phases, so efficient overall approaches are possible.
Hope this helps.
The algorithm you are looking for is called minimum spanning tree. It's useful to find the minimum cost for a water, telephone or electricity grid. There is Prim's algorithm or Kruskal algorithm. IMO Prim's algorithm is a bit easier to understand.
here http://en.wikipedia.org/wiki/Minimum_spanning_tree you can find more information about the minimum spanning tree, so you can adapt it to solve your problem.
Take a look at the Dijkstra's algorithm:
Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1956 and published in 1959, is a graph search algorithm that solves the single-source shortest path problem for a graph with nonnegative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
I have this technical problem that can be formuated with a Directed Acyclic Graph (DAG).
Nodes represent events (with unknown timing), with directed edges encoding the relation ship: "I'm younger than you/I happened before you".
I need to estimate edge weights (i.e. "dynamic weights") such that the Weighted DAG (WDAG) implies a distance function on the DAG. In other words the sum of weights for paths between nodes A and B should be equal for all paths.
This is an undetermined problem, even if the weights are integers (same underlying reason that topological sorting is not unique, I suppose). In all generality, the edge weights, which represent timeintervals between nodes/events, are real numbers. Therefore I'm introducing some preset objective function C=J(WDAG) on the weighted DAG, here unspecified.
My question is: is there an algorithm that distributes positive definite weights on the WDAG, subject to the constraint 1) that the weights form a distance function of the DAG and 2) minimizing the objective function cost C.
This seems unrelated to the shortest-path or minimum-spanning-tree problems classically associated to WDAG's. Any ideas to a formal or heuristic solution to the above problem?
Regards,
Stephane
I think all you need is topological sorting.
Sort the nodes according to a topological sort.
Distribute them in the [0,1] interval in the order you got.
Now the distance of node-pairs on the [0,1] line will give you the weight of the edge connecting them.
This is one possible solution. If you have more constraints on the weights than what you describe in the question, the problem might be more difficult.