I have a question regarding the Voronoi diagram structure. Why is a Voronoi diagram efficient for nearest neighbor search? How does it work? I know that this structure partitions the space into cells, but I still do not understand how it works when a search query is performed.
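You need a data structure like a hierarchical cluster structure or a spatial index. A Voronoi cell contains exactly the points that are closer to its site than to any other site, so a nearest-neighbor query reduces to point location: find the cell that contains the query point, and that cell's site is the nearest neighbor. Kirkpatrick's point-location algorithm seems very close to what you need, but you can also try a combination of a metric tree and a Voronoi diagram: http://www.cc.gatech.edu/~phlosoft/voronoi/. The Voronoi diagram is also good for pathfinding: http://www.cs.columbia.edu/~pblaer/projects/path_planner/. It is the dual of the Delaunay triangulation, which in turn contains the Euclidean minimum spanning tree. Of course you can check each cell, but that is trivial and a good exercise for the reader.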
Let's say I created a minimum spanning tree (MST) out of a graph with M nodes. Is there an algorithm to create N clusters from it?
I'm looking to cut some of the links so that I end up with N clusters, and to label them, i.e. given a node X, I can query which cluster it belongs to.
What I think is that once I have the MST, I can cut the top (maximum-weight) M-N edges of the MST and get N clusters?
Is my logic correct?
That seems a good way to me. You ask whether it's "correct" -- that I can't say, since I don't know what other unstated criteria you have in mind. All you have actually stated that you want is to create N clusters -- which you could also achieve by throwing away the MST, putting vertex 1 in the first cluster, vertex 2 in the second, ..., vertex N-1 in the (N-1)th, and all remaining vertices in the Nth.
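If you're using Kruskal's algorithm to build the MST, you can achieve what you're suggesting by simply stopping the algorithm early, as soon as only N components remain. On a connected graph that happens after M-N edges have been accepted, so the result is the same as building the full MST (M-1 edges) and removing its N-1 heaviest edges. Note that this means cutting N-1 edges, not M-N.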
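As a minimal sketch of the early-stopping idea, assuming the graph is given as a vertex count plus a list of `(weight, u, v)` tuples (the function name `mst_clusters` and this input format are illustrative choices, not something from the question):

```python
def mst_clusters(num_vertices, edges, num_clusters):
    """Run Kruskal's algorithm, stopping once num_clusters components remain.

    edges is a list of (weight, u, v) tuples with 0-based vertex indices.
    Returns a list giving the cluster label of each vertex.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for _weight, u, v in sorted(edges):
        if components <= num_clusters:
            break  # stop early: the remaining (heavier) edges are never added
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(x), len(labels)) for x in range(num_vertices)]


example_edges = [(1, 0, 1), (1, 1, 2), (5, 2, 3), (1, 3, 4), (1, 4, 5), (6, 0, 5)]
cluster_of = mst_clusters(6, example_edges, 2)
print(cluster_of)
```

The returned list answers the "which cluster does node X belong to" query directly: `cluster_of[X]` is the label of X.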
A tree is a (very sparse) subset of the edges of a graph. If you cut based only on tree edges, you are not taking into consideration a (possibly) vast majority of the edges in your graph.
Based on the fact that you want to use a minimum spanning tree algorithm to create clusters, it would seem you want to minimize the weight of the edges that lie in the N-way cut induced by your clustering. Using an MST as a proxy on a graph whose edges have very similar weights will likely produce terrible results.
Graph clustering is a heavily studied topic. Have you considered using an existing library to accomplish this? If you insist on implementing your own algorithm, I would recommend spectral clustering as a starting point, as it will produce decent results without much effort.
Edit based on feedback in comments:
If your main bottleneck is the similarity matrix, then the following should be considered:
Investigate a sparse matrix/graph representation while implementing something like spectral clustering, which is probably going to give much more robust results than single-linkage clustering.
Investigate pruning edges from the similarity matrix which you think are unimportant. If pruning is combined with a sparse representation of the similarity matrix, this should yield performance comparable to the MST approach while giving a smooth continuum to tune performance vs. quality.
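One possible way to do the pruning, sketched under the assumption that "unimportant" means "not among the k most similar neighbors of either endpoint" (the function name `prune_to_knn`, the dense list-of-lists input, and the choice of k-nearest-neighbor pruning are all illustrative assumptions, not something prescribed above):

```python
import heapq


def prune_to_knn(similarity, k):
    """Keep, for every vertex, only its k most similar neighbors.

    similarity is a dense, symmetric list-of-lists matrix.
    Returns a sparse adjacency dict: graph[i][j] = similarity of edge (i, j).
    """
    n = len(similarity)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        candidates = ((similarity[i][j], j) for j in range(n) if j != i)
        for score, j in heapq.nlargest(k, candidates):
            # Store the edge in both directions so the graph stays symmetric.
            graph[i][j] = score
            graph[j][i] = score
    return graph


dense = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
sparse = prune_to_knn(dense, k=1)
print(sparse)
```

Here `k` is the tuning knob: a small `k` gives a very sparse graph (fast, lower quality), while `k = n - 1` keeps the full similarity matrix.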
I've got a DAG of around 3,300 vertices which can be laid out quite successfully by dot as a more or less simple tree (things get complicated because vertices can have more than one predecessor from a whole different rank, so crossovers are frequent). Each vertex in the graph came into being at a specific time in the original process, and I want one axis in the layout to represent time: an edge relation like a -> v, b -> v means that a and b came into being at some specific time before v.
Is there a layout algorithm for DAGs which would allow me to specify the positions (or at least the distances) on one axis and come up with an optimal layout regarding edge crossovers on the other?
You can do a topological sort of the DAG so that the vertices are ordered in such a way that, for every edge x -> y, vertex x comes before y.
Therefore, if you have a -> v, b -> v, you will get something like a, b, v or b, a, v.
Using this ordering, you can easily represent a DAG with its vertices laid out in sorted order along one axis.
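The ordering step can be sketched with Kahn's algorithm. This is only an illustration of topological sorting on the a -> v, b -> v example; the function name `topological_order` and the edge-list input format are assumptions made for the example:

```python
from collections import deque


def topological_order(vertices, edges):
    """Return the vertices ordered so that x precedes y for every edge (x, y)."""
    successors = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for x, y in edges:
        successors[x].append(y)
        indegree[y] += 1

    # Start with all vertices that have no predecessors.
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        x = ready.popleft()
        order.append(x)
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                ready.append(y)

    if len(order) != len(indegree):
        raise ValueError("the graph contains a cycle, so it is not a DAG")
    return order


order = topological_order(["a", "b", "v"], [("a", "v"), ("b", "v")])
print(order)  # ['a', 'b', 'v']
```

Note that a topological order only guarantees that predecessors come first; it does not by itself use the actual creation times of the vertices.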
Yes, as @Arturo-Menchaca said, a topological sort may help to reduce the number of edge crossings, but the result may not be optimal. No efficient exact algorithm for edge-crossing minimization is known: the crossing minimization problem is NP-complete, so heuristics are used in practice.
This StackOverflow link may help you: Drawing Directed Acyclic Graphs: Minimizing edge crossing?
I suppose your problem is related to finding an aesthetically pleasing graph layout. Some heuristics are described in the articles "Overview of algorithms for graph drawing" and "Force-Directed Drawing Algorithms". Information about planar graphs or almost planar graphs may also help you.
Reviews of the algorithms for checking and drawing planar graphs are given in the Wiki pages "Planar graph" and "Crossing number (graph theory)". Libraries and algorithms for planar graph drawing are described in the StackOverflow question "How to check if a Graph is a Planar Graph or not?" For example, the author of the article "GA for straight-line grid drawings of maximal planar graphs" uses genetic algorithms for straight-line grid drawing.
Good descriptions of almost planar graphs are given in the articles "Straight-Line Drawability of a Planar Graph Plus an Edge" and "On the Crossing Number of Almost Planar Graphs".
Try to modify the original algorithms to add your condition of alignment along one axis.
If I understood you correctly, you want to minimize the number of edge crossings in your graph layout. If so, then the answer is "No", because this problem is proven to be NP-complete in the general case. See "Crossing Number is NP-Complete" by Garey and Johnson.
If you need not an optimal but just a good enough solution, there are multiple articles on this topic, because it is heavily related to circuit layout. Googling "crossing number heuristics" and looking through the abstracts of some papers will probably solve your task better than me trying to guess your requirements blindly.
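The answers above do not name a specific heuristic. One widely used one for layered drawings is the barycenter heuristic: with the positions on one axis fixed (here, two adjacent time layers), each vertex in the lower layer is placed at the average position of its neighbors in the upper layer. The sketch below is an assumption-laden illustration of a single sweep, not a complete layout algorithm; the function names and the two-layer input format are made up for the example:

```python
def count_crossings(upper, lower, edges):
    """Count pairwise crossings between edges drawn from upper to lower."""
    pos_upper = {v: i for i, v in enumerate(upper)}
    pos_lower = {v: i for i, v in enumerate(lower)}
    ends = [(pos_upper[a], pos_lower[b]) for a, b in edges]
    crossings = 0
    for i in range(len(ends)):
        for j in range(i + 1, len(ends)):
            # Two edges cross when their endpoints appear in opposite order.
            if (ends[i][0] - ends[j][0]) * (ends[i][1] - ends[j][1]) < 0:
                crossings += 1
    return crossings


def barycenter_order(upper, lower, edges):
    """Reorder lower by the mean position of each vertex's upper neighbors."""
    pos_upper = {v: i for i, v in enumerate(upper)}
    neighbor_positions = {v: [] for v in lower}
    for a, b in edges:
        neighbor_positions[b].append(pos_upper[a])

    def barycenter(v):
        positions = neighbor_positions[v]
        if not positions:
            return float(lower.index(v))  # no neighbors: keep current slot
        return sum(positions) / len(positions)

    return sorted(lower, key=barycenter)


upper = ["a", "b", "c"]
lower = ["x", "y", "z"]
edges = [("a", "z"), ("b", "y"), ("c", "x")]
before = count_crossings(upper, lower, edges)
new_lower = barycenter_order(upper, lower, edges)
after = count_crossings(upper, new_lower, edges)
print(before, new_lower, after)
```

In a full layered layout this sweep would be repeated downward and upward over all layers until the crossing count stops improving; it gives no optimality guarantee.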
I'm just wondering if, like for strings where we have the Levenshtein distance (or edit distance) between two strings, is there something similar for graphs?
I mean, a scalar measure that identifies the number of atomic operations (node and edges insertion/deletion) to transform a graph G1 to a graph G2.
I think graph edit distance is the measure that you were looking for.
Graph edit distance measures the minimum number of graph edit operations needed to transform one graph into another. The allowed graph edit operations include:
Insert/delete an isolated vertex
Insert/delete an edge
Change the label of a vertex/edge (if labeled graphs)
However, computing the graph edit distance between two graphs is NP-hard. Exact algorithms are typically based on A* search, and there are also faster sub-optimal (approximate) algorithms.
You should look at the paper "A survey of graph edit distance".
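To make the definition concrete, here is a brute-force sketch for tiny unlabeled, undirected graphs without self-loops, with unit cost for every operation. It tries every vertex correspondence, so it takes factorial time and is only meant as an illustration of the definition, not as the A*-based method mentioned above. The function name and input format are assumptions made for the example:

```python
from itertools import permutations


def graph_edit_distance(n1, edges1, n2, edges2):
    """Exact edit distance between two small unlabeled undirected graphs.

    Graph i has vertices 0..n_i-1 and a list of (u, v) edges.
    Allowed unit-cost operations: insert/delete an isolated vertex,
    insert/delete an edge.
    """
    # Pad the smaller graph with isolated dummy vertices. Mapping a real
    # vertex to a dummy corresponds to deleting (or inserting) that vertex.
    n = max(n1, n2)
    target = {frozenset(e) for e in edges2}
    best_edge_cost = None
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges1}
        # Edges present in only one graph must be deleted or inserted.
        cost = len(mapped ^ target)
        if best_edge_cost is None or cost < best_edge_cost:
            best_edge_cost = cost
    return abs(n1 - n2) + best_edge_cost


path = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(graph_edit_distance(3, path, 3, triangle))  # 1: insert one edge
print(graph_edit_distance(3, path, 2, [(0, 1)]))  # 2: delete an edge and a vertex
```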
For general graphs it is an NP-hard problem, as others mentioned in their answers. But for trees there are well-known polynomial-time algorithms. Perhaps the most famous of them is the Zhang-Shasha algorithm, which was published in 1989.
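Note:
The Levenshtein distance (or edit distance) is defined between two strings, where the order of the characters fixes which positions correspond to each other.
In a graph there is no such order: you may have to search up to N! possible vertex correspondences to identify each vertex and edge of one graph with those of the other.
If every vertex has a unique, known identifier, you can compare two graphs easily. But
the main question is how to define the identity of each vertex and edge. Finding a correspondence between the vertices and edges of two graphs under which one transforms into the other is very hard. This is the graph isomorphism problem, which is in NP but is not known to be solvable in polynomial time, nor known to be NP-complete (the related subgraph isomorphism problem is NP-complete).
You can search for "graph isomorphism".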
I'm looking at this PDF as I'm trying to build an MSSP (multiple-source shortest paths) implementation, but I lack the knowledge of how to build interdigitating trees. So far I have created the spanning tree, so the planar graph is built, but I'm stuck because I have no idea how to build its dual. Is there any specific algorithm, approach, or paper which could help me solve this? I searched and could find nothing useful.
If you don't have one already, you need a combinatorial embedding. There are efficient algorithms to obtain one from the incidence structure, but natural sources of planar graphs typically have a natural embedding. There are many ways to represent the embedding. I used a permutation π mapping each half-edge to the next half-edge in counterclockwise order with the same head vertex. With each (non-isolated) vertex is associated a circularly linked list of incoming half-edges.
Let rev be the permutation that maps each half-edge to its other half, with opposite head and tail vertices. The embedding permutation for the dual graph is the composition of π followed by rev. It maps each half-edge to the next half-edge on the face in clockwise order (or counterclockwise on the infinite face, because you're looking at its back side). This will be clearer if you try some examples by hand.
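A small example by hand, written out as code. The half-edge numbering is a hypothetical convention chosen for this sketch: the two halves of edge k are numbered 2k and 2k+1, so rev simply swaps each pair. The graph is a triangle A, B, C with half-edges 0 = A->B, 1 = B->A, 2 = B->C, 3 = C->B, 4 = C->A, 5 = A->C. Every vertex has only two incoming half-edges, so π just swaps them and the rotation direction does not matter here:

```python
def face_orbits(pi, rev):
    """Trace the faces of an embedding given as half-edge permutations.

    pi[h]  = next half-edge around the head vertex of h
    rev[h] = the other half of the same edge
    The faces are the orbits of "pi followed by rev".
    """
    dual_next = [rev[pi[h]] for h in range(len(pi))]
    face_of = [None] * len(pi)
    faces = []
    for start in range(len(pi)):
        if face_of[start] is not None:
            continue
        orbit, h = [], start
        while face_of[h] is None:
            face_of[h] = len(faces)
            orbit.append(h)
            h = dual_next[h]
        faces.append(orbit)
    return faces, face_of


# Triangle: half-edges with head A are 1 and 4, with head B are 0 and 3,
# and with head C are 2 and 5.
rev = [1, 0, 3, 2, 5, 4]
pi = [3, 4, 5, 0, 1, 2]

faces, face_of = face_orbits(pi, rev)
print(faces)  # [[0, 2, 4], [1, 5, 3]]

# Each primal edge k becomes a dual edge joining the faces on its two sides.
dual_edges = [(face_of[2 * k], face_of[2 * k + 1]) for k in range(len(pi) // 2)]
print(dual_edges)  # [(0, 1), (0, 1), (0, 1)]
```

The two orbits are the inner and outer faces of the triangle (A->B->C->A and its reverse), and the dual graph is two vertices joined by three parallel edges, which matches Euler's formula V - E + F = 3 - 3 + 2 = 2.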
After you compute shortest paths from the initial root (I used Dijkstra, and unless your MSSP implementation is much faster than mine, there isn't much relative improvement to be had by using an asymptotically faster algorithm), do a depth-first search where the edges that belong to the shortest-path tree are ignored. (Another alternative is to visit the half-edges of the interdigitating tree in Euler-tour order, but this approach seemed as though it would incur extra logarithmic-time dynamic-tree operations.)
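The search step can be sketched as follows, assuming the faces have already been identified and that each primal edge is stored with the two faces on its sides. The `edge_faces` mapping, the function name, and the example graph (a square 0-1-2-3 with the diagonal 0-2) are all hypothetical and specified by hand for illustration:

```python
def interdigitating_tree(num_faces, edge_faces, tree_edges, root_face):
    """Build the dual spanning tree from the primal non-tree edges.

    edge_faces maps each primal edge to the pair of faces on its two sides.
    tree_edges is the set of primal edges in the shortest-path tree.
    Returns parent[face] = (parent_face, primal_edge_crossed), or None
    for the root face.
    """
    adjacency = {f: [] for f in range(num_faces)}
    for edge, (f, g) in edge_faces.items():
        if edge in tree_edges:
            continue  # edges of the shortest-path tree are ignored
        adjacency[f].append((g, edge))
        adjacency[g].append((f, edge))

    # Stack-based traversal of the dual graph. The non-tree edges already
    # form a tree on the faces, so any traversal order finds the same edges.
    parent = {root_face: None}
    stack = [root_face]
    while stack:
        f = stack.pop()
        for g, edge in adjacency[f]:
            if g not in parent:
                parent[g] = (f, edge)
                stack.append(g)
    return parent


# Faces: 0 = triangle 0-1-2, 1 = triangle 0-2-3, 2 = outer face.
edge_faces = {
    (0, 1): (0, 2),
    (1, 2): (0, 2),
    (2, 3): (1, 2),
    (3, 0): (1, 2),
    (0, 2): (0, 1),
}
tree_edges = {(0, 1), (1, 2), (2, 3)}  # a spanning tree of the primal graph

dual_parent = interdigitating_tree(3, edge_faces, tree_edges, root_face=2)
print(dual_parent)
```

The primal tree uses 3 of the 5 edges, and the remaining 2 edges connect all 3 faces, so the two trees together use every edge exactly once.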
For finding the nearest neighbor, Space Partitioning is one of the algorithms. How does it work?
Suppose I have a 2D set of points (x and y coordinates), and I am given a point (a,b). How would this algorithm find out the nearest neighbor?
Spatial partitioning is actually a family of closely related algorithms that partition space so that applications can process the points or polygons more easily.
I reckon there are many ways to solve your problem. I don't know how complex a solution you are willing to build. A simple way would probably be to build a binary tree that cuts the space in two, with all the points divided on either side of some middle plane. Build your tree by subdividing recursively until you run out of points.
Searching for the nearest neighbor will then be optimized, because each step down the tree narrows the search area.
In some literature, they call this a k-d tree.
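A minimal 2D sketch of the idea described above, assuming points are `(x, y)` tuples and splitting at the median while alternating between the x and y axes (the dict-based node layout and function names are illustrative choices):

```python
import math


def build_kdtree(points, depth=0):
    """Recursively split the points at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    # Search the side of the splitting line that contains the query first.
    best = nearest(near, query, best)
    # The other side can only hold a closer point if the splitting line
    # is nearer to the query than the best point found so far.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

For the query point (a, b) from the question, `nearest(tree, (a, b))` descends toward the region containing (a, b) and only backtracks into a neighboring region when that region could still contain a closer point.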
These two videos should help:
Explaining how a KD tree is built up:
http://www.youtube.com/watch?v=T9h2KKJ_Pl8
Explaining how to perform nearest neighbour search:
http://www.youtube.com/watch?v=2SbVSxWGtpI