I am playing around with implementing a junction tree algorithm for belief propagation on a Bayesian Network. I'm struggling a bit with triangulating the graph so the junction trees can be formed.
I understand that finding the optimal triangulation is NP-complete, but can you point me to a general purpose algorithm that results in a 'good enough' triangulation for relatively simple Bayesian Networks?
This is a learning exercise (hobby, not homework), so I don't care much about space/time complexity as long as the algorithm results in a triangulated graph given any undirected graph. Ultimately, I'm trying to understand how exact inference algorithms work before I even try doing any sort of approximation.
I'm tinkering in Python using NetworkX, but any pseudo-code description of such an algorithm using typical graph traversal terminology would be valuable.
Thanks!
If Xi is a possible variable (node) to be deleted then,
S(i) will be the size of the clique created by deleting this variable
C(i) will be the sum of the size of the cliques of the subgraph given by Xi and its adjacent nodes
Heuristic:
In each case select a variable Xi among the set of possible variables to be deleted with minimal S(i)/C(i)
Reference: Heuristic Algorithms for the Triangulation of Graphs
Related
I'm not sure if I'm using the right term here, but for fully connected components I mean there's an (undirected) edge between every pair of vertices in a component, and no additional vertices can be included without breaking this property.
There're a number algorithms for finding strongly connected components in a graph though (for example Tarjan's algorithm), is there an algorithm for finding such "fully connected components"?
What you are looking for is a list of all the maximal cliques of the graph. It's also called the clique problem. No known polynomial time solution exists for a generic undirected graph.
Most versions of the clique problem are hard. The clique decision problem is NP-complete (one of Karp's 21 NP-complete problems). The problem of finding the maximum clique is both fixed-parameter intractable and hard to approximate. And, listing all maximal cliques may require exponential time as there exist graphs with exponentially many maximal cliques. Therefore, much of the theory about the clique problem is devoted to identifying special types of graph that admit more efficient algorithms, or to establishing the computational difficulty of the general problem in various models of computation.
-https://en.wikipedia.org/wiki/Clique_problem
I was also looking at the same question.
https://en.wikipedia.org/wiki/Bron-Kerbosch_algorithm This turns out to be an algorithm to list it, however, it's not fast. If your graph is sparse, you may want to use the vertex ordering version of the algorithm:
For sparse graphs, tighter bounds are possible. In particular the vertex-ordering version of the Bron–Kerbosch algorithm can be made to run in time O(dn3d/3), where d is the degeneracy of the graph, a measure of its sparseness. There exist d-degenerate graphs for which the total number of maximal cliques is (n − d)3d/3, so this bound is close to tight.[6]
Is there an efficient algorithm to find the subgraph with the largest average degree (which may be the graph itself)?
The paper "Finding a Maximum-Density Subgraph" by Andrew Goldberg gives a polynomial-time algorithm for identifying such a graph. It looks like the algorithm makes logarithmically many calls to a max-flow algorithm on appropriately constructed graphs. From what I've read, it looks like the algorithm is just a binary search over the average degree where each guess is checked using a maximum flow on a standard graph construction. Most of the complexity appears to be in arguing why the construction is correct.
Hope this helps!
I have a set of points and a distance function applicable to each pair of points. I would like to connect ALL the points together, with the minimum total distance. Do you know about an existing algorithm I could use for that ?
Each point can be linked to several points, so this is not the usual "salesman itinerary" problem :)
Thanks !
What you want is a Minimum spanning tree.
The two most common algorithms to generate one are:
Prim's algorithm
Kruskal's algorithm
As others have said, the minimum spanning tree (MST) will allow you to form a minimum distance sub-graph that connects all of your points.
You will first need to form a graph for your data set though. To efficiently form an undirected graph you could compute the Delaunay triangulation of your point set. The conversion from the triangulation to the graph is then fairly literal - any edge in the triangulation is also an edge in the graph, weighted by the length of the triangulation edge.
There are efficient algorithms for both the MST (Prim's/Kruskal's O(E*log(V))) and Delaunay triangulation (Divide and Conquer O(V*log(V))) phases, so efficient overall approaches are possible.
Hope this helps.
The algorithm you are looking for is called minimum spanning tree. It's useful to find the minimum cost for a water, telephone or electricity grid. There is Prim's algorithm or Kruskal algorithm. IMO Prim's algorithm is a bit easier to understand.
here http://en.wikipedia.org/wiki/Minimum_spanning_tree you can find more information about the minimum spanning tree, so you can adapt it to solve your problem.
Take a look at the Dijkstra's algorithm:
Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1956 and published in 1959, is a graph search algorithm that solves the single-source shortest path problem for a graph with nonnegative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
I have a large set of points (n > 10000 in number) in some metric space (e.g. equipped with Jaccard Distance). I want to connect them with a minimal spanning tree, using the metric as the weight on the edges.
Is there an algorithm that runs in less than O(n2) time?
If not, is there an algorithm that runs in less than O(n2) average time (possibly using randomization)?
If not, is there an algorithm that runs in less than O(n2) time and gives a good approximation of the minimum spanning tree?
If not, is there a reason why such algorithm can't exist?
Thank you in advance!
Edit for the posters below:
Classical algorithms for finding minimal spanning tree don't work here. They have an E factor in their running time, but in my case E = n2 since I actually consider the complete graph. I also don't have enough memory to store all the >49995000 possible edges.
Apparently, according to this: Estimating the weight of metric minimum spanning trees in sublinear time there is no deterministic o(n^2) (note: smallOh, which is probably what you meant by less than O(n^2), I suppose) algorithm. That paper also gives a sub-linear randomized algorithm for the metric minimum weight spanning tree.
Also look at this paper: An optimal minimum spanning tree algorithm which gives an optimal algorithm. The paper also claims that the complexity of the optimal algorithm is not yet known!
The references in the first paper should be helpful and that paper is probably the most relevant to your question.
Hope that helps.
When I was looking at a very similar problem 3-4 years ago, I could not find an ideal solution in the literature I looked at.
The trick I think is to find a "small" subset of "likely good" edges, which you can then run plain old Kruskal on. In general, it's likely that many MST edges can be found among the set of edges that join each vertex to its k nearest neighbours, for some small k. These edges might not span the graph, but when they don't, each component can be collapsed to a single vertex (chosen randomly) and the process repeated. (For better accuracy, instead of picking a single representative to become the new "supervertex", pick some small number r of representatives and in the next round examine all r^2 distances between 2 supervertices, choosing the minimum.)
k-nearest-neighbour algorithms are quite well-studied for the case where objects can be represented as vectors in a finite-dimensional Euclidean space, so if you can find a way to map your objects down to that (e.g. with multidimensional scaling) then you may have luck there. In particular, mapping down to 2D allows you to compute a Voronoi diagram, and MST edges will always be between adjacent faces. But from what little I've read, this approach doesn't always produce good-quality results.
Otherwise, you may find clustering approaches useful: Clustering large datasets in arbitrary metric spaces is one of the few papers I found that explicitly deals with objects that are not necessarily finite-dimensional vectors in a Euclidean space, and which gives consideration to the possibility of computationally expensive distance functions.
What are the efficient algorithms for finding a vertex tour in a weighted undirected graph with maximum cost if we need to start from a particular vertex?
It's NPC because if you set weights as 1 for all edges, if HC exists it will be your answer, and so In all you can find HC existence from a single source which is NPC by solving this problem so your problem is NPC, but there are some polynomial approximation algorithms.
Since the problem is NP-hard, you are very unlikely to find an efficient algorithm that solves the problem exactly for all possible weighted input graphs.
However, there might be efficient algorithms that are guaranteed to find an answer that is at most a constant times away from the best possible answer, e.g. there might be an efficient algorithm that is guaranteed to find a path that has weight at least 1/2 of the maximum weight path.
If you are interested in searching for such algorithms, you could try Google searches for "weighted hamiltonian path approximation algorithm", which is close to, but not identical to, your problem. It is not the same because Hamiltonian paths are required to include all vertexes. Here is one research paper that might either contain, or have ideas that lead to, an approximation algorithm for your problem:
http://portal.acm.org/citation.cfm?id=139404.139468
"A general approximation technique for constrained forest problems" by Michel X. Goemans and David P. Williams.
Of course, if your graphs are small enough that you can enumerate all possible paths containing your desired vertex "fast enough for your purposes", then you can solve it exactly.