Random spanning trees of bipartite graphs - algorithm

I'm writing code that uses metaheuristics to find good solutions to the Fixed Charge Transportation Problem (FCTP).
The part I'm stuck on is generating a starting solution, which is based on finding a spanning tree of the underlying bipartite graph.
I want it to be a random spanning tree, so that I can run the procedure on the same problem multiple times, possibly getting different solutions.
I'm having some difficulties doing this. The approach I've gone for so far is to make a random permutation of the arcs, then iterate through this list, putting each arc into the basis if doing so won't create a cycle.
I need a fast method to check whether including an arc will create a cycle. I don't want to brute-force it, since that could take a large amount of time on big problem instances.
Given that A is an array holding a random permutation of the arcs, how would you go about writing the selection procedure?
I've been working on this for a couple of hours now, and nothing I've tried has worked, most of it turning out to be nonsensical in practice...

Kruskal's algorithm is used for finding a minimum spanning tree. Fast cycle detection is not actually part of Kruskal's algorithm: the algorithm works with a data structure that finds cycles fast as well as with a slow naive check (only the complexity differs).
However, Kruskal's algorithm is on the right track here, since it usually uses a so-called union-find or disjoint-set data structure for fast cycle detection. That is the part of the Wikipedia page on Kruskal's algorithm you will need for your algorithm, and it is also linked there: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
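To make the union-find idea concrete, here is a minimal sketch in Python (the function names and the arc representation are my own, not from the question; it assumes nodes are numbered 0..num_nodes-1, with supply and demand nodes mapped into one shared range):

```python
import random

def find(parent, x):
    # Walk up to the root, halving the path as we go (path compression).
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def random_spanning_tree(num_nodes, arcs):
    """Return a random spanning tree as a list of arcs.

    `arcs` is a list of (u, v) pairs over nodes 0..num_nodes-1.
    """
    parent = list(range(num_nodes))   # disjoint-set forest
    rank = [0] * num_nodes
    order = arcs[:]                   # random permutation of the arcs
    random.shuffle(order)

    tree = []
    for u, v in order:
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            continue                  # arc would create a cycle, skip it
        # Union by rank keeps the forest shallow.
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        tree.append((u, v))
        if len(tree) == num_nodes - 1:
            break                     # spanning tree is complete
    return tree
```

With path compression and union by rank, each cycle check is effectively constant time, so the whole selection pass is nearly linear in the number of arcs.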

I found Kruskal's algorithm after long hours of research. I only needed to randomize the order in which I examined the arcs of the graph.

Related

Difference between Boruvka and Kruskal in finding MST

I would like to know the difference between Boruvka's algorithm and Kruskal's algorithm.
What they have in common:
both find a minimum spanning tree (MST) of an undirected graph
both repeatedly add the cheapest edge that doesn't create a cycle until the MST is complete
both look at the graph in its entirety, unlike e.g. Prim's algorithm, which grows the MST one node at a time
both algorithms are greedy
The only difference seems to be that Boruvka's perspective is each individual node (from which it looks for the cheapest edge), instead of looking at the entire graph (like Kruskal does).
It therefore seems that Boruvka should be relatively easy to run in parallel (unlike Kruskal). Is that true?
In the case of Kruskal's algorithm, first of all we sort all edges from the cheapest to the most expensive. Then in each step we take the minimum-weight edge remaining; if it doesn't create a cycle in our forest (which initially consists of |V| separate vertices), we add it to the MST, otherwise we discard it.
Boruvka's algorithm looks for the nearest neighbour of each component (initially each vertex). It keeps selecting the cheapest edge leaving each component and adds it to the MST. When only one connected component is left, it's done.
Finding the cheapest outgoing edge of each node/component can easily be done in parallel. Then we just merge the newly obtained components and repeat the finding phase until we have the MST. That's why this algorithm is a good example of parallelism (in the context of finding an MST).
Regarding parallel processing using Kruskal's algorithm: we need to keep and check the edges in strict order, which is why it's hard to achieve real parallelism. It's inherently sequential and we can't do much about that (even if we may still consider, e.g., parallel sorting). There have been a few attempts to implement the method in a parallel way, and those papers are easy to find if you want to check their results.
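For comparison, a minimal sequential sketch of Boruvka's phases in Python (identifiers are mine; the scan for the cheapest outgoing edge per component is the part that parallelizes naturally):

```python
def boruvka_mst(num_nodes, edges):
    """edges: list of (weight, u, v) over nodes 0..num_nodes-1.
    Returns the list of MST edges (a forest if the graph is disconnected)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, components = [], num_nodes
    while components > 1:
        # Cheapest edge leaving each component; each phase at least
        # halves the number of components, giving O(log V) phases.
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue              # edge lies inside one component
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break                     # graph is disconnected
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:              # re-check: an earlier merge may apply
                parent[rv] = ru
                mst.append((w, u, v))
                components -= 1
    return mst
```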
Your description is accurate, but one detail can be clarified: Boruvka's algorithm's perspective is each connected component rather than each individual node.
Your intuition about parallelization is also right -- this paper has more details. Excerpt from the abstract:
In this paper we design and implement four parallel MST algorithms (three variations of Boruvka plus our new approach) for arbitrary sparse graphs that for the first time give speedup when compared with the best sequential algorithm.
The important difference between Boruvka's algorithm and Kruskal's or Prim's is that with Boruvka's you don't need to presort the edges or maintain a priority queue.
Boruvka's still incurs the extra O(log N) factor in the cost, but it pays it by making O(log N) passes over the edges.
You can parallelize Boruvka's algorithm, but you can also parallelize sorting, so I don't know if Boruvka's has any real advantages over Kruskal's in practice.

Topological sort of cyclic graph with minimum number of violated edges

I am looking for a way to perform a topological sorting on a given directed unweighted graph that contains cycles. The result should contain not only the ordering of vertices but also the set of edges that are violated by that ordering. This set of edges shall be minimal.
As my input graph is potentially large, I cannot use an exponential time algorithm. If it's impossible to compute an optimal solution in polynomial time, what heuristic would be reasonable for the given problem?
Eades, Lin, and Smyth proposed A fast and effective heuristic for the feedback arc set problem. The original article is behind a paywall, but a free copy is available from here.
There's an algorithm for topological sorting that builds the vertex order by selecting a vertex with no incoming arcs, recursing on the graph minus that vertex, and prepending the vertex to the order. (I'm describing the algorithm recursively, but you don't have to implement it that way.) The Eades–Lin–Smyth algorithm also looks for vertices with no outgoing arcs and appends them. Of course, it can happen that all vertices have both incoming and outgoing arcs. In that case, select the vertex with the highest differential between outgoing and incoming arcs. There is undoubtedly room for experimentation here.
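A rough Python sketch of that greedy procedure (my own identifiers; it is a plain implementation of the rule described above, not the paper's optimized bucketed version, and self-loops are ignored):

```python
from collections import defaultdict

def feedback_order(nodes, arcs):
    """nodes: iterable of vertices; arcs: list of (u, v) pairs.
    Returns (ordering, violated_arcs): arcs pointing right-to-left
    in the ordering form the heuristic feedback set."""
    outs, ins = defaultdict(set), defaultdict(set)
    for u, v in arcs:
        if u != v:                    # self-loops are always violated
            outs[u].add(v)
            ins[v].add(u)
    remaining = set(nodes)
    head, tail = [], []               # s1 (front) and s2 (back, reversed)

    def remove(v):
        remaining.remove(v)
        for w in outs[v]: ins[w].discard(v)
        for w in ins[v]:  outs[w].discard(v)

    while remaining:
        changed = True
        while changed:
            changed = False
            for v in [v for v in remaining if not outs[v]]:   # sinks
                remove(v); tail.append(v); changed = True
            for v in [v for v in remaining if not ins[v]]:    # sources
                remove(v); head.append(v); changed = True
        if remaining:
            # Everyone has in- and out-arcs: take the vertex with the
            # largest out-degree minus in-degree, as described above.
            v = max(remaining, key=lambda v: len(outs[v]) - len(ins[v]))
            remove(v); head.append(v)

    order = head + tail[::-1]
    pos = {v: i for i, v in enumerate(order)}
    violated = [(u, v) for u, v in arcs if u != v and pos[u] > pos[v]]
    return order, violated
```

Removing or reversing the returned violated arcs leaves an acyclic graph, which is exactly the pair of outputs the question asks for.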
The algorithms with provable worst-case behavior are based on linear programming and graph cuts. These are neat, but the guarantees are less than ideal (O(log^2 n) or O(log n log log n) times as many arcs as needed), and I suspect that efficient implementations would be quite a project.
Inspired by Arnaud's answer and other interesting topological sorting algorithms, I created the cyclic-toposort project and published it on GitHub. cyclic_toposort does exactly what you desire: it quickly sorts a directed cyclic graph while producing the minimal number of violated edges. If desired, it also provides the maximal groupings of nodes that are on the same topological level (and can therefore be activated at the same time).
If the problem is still relevant to you, I would be happy if you checked out my project and let me know what you think!
The project was useful for my own neural network topology research, so I had to create something like it anyway. I'm answering your question this late in case anyone else stumbles upon this thread in search of an answer to the same question.

Search for a K-shortest-paths algorithm

I have found many algorithms and approaches for finding the shortest path or the best/optimal solution to a problem. However, what I want is an algorithm that finds the first K shortest paths from one point to another. The problem I'm facing is more like searching through a tree: at each step you take, there are multiple options, each with its own weight. What kinds of algorithms are used for this kind of problem?
There is a 2006 paper by Jose Santos comparing three different K-shortest-path algorithms.
Yen's algorithm implementation:
http://code.google.com/p/k-shortest-paths/
Easier algorithm & discussion:
Suggestions for KSPA on undirected graph
EDIT: apparently I clicked on a link because I thought I was answering a new question; ignore this if - as is very likely - this question isn't important to you anymore.
Given the restricted version of the problem you're dealing with, this becomes a lot simpler to implement. The most important thing to notice is that in trees, the shortest path between two nodes is the only path between them. So you solve all-pairs shortest paths, which in trees takes O(n²) total by doing n BFS traversals, and then you take the k minimal values (see the sketch below). This can probably be optimized, but the naive approach is to sort the O(n²) distances in O(n² log n) time and take the k smallest; with some bookkeeping, you can keep track of which distance corresponds to which path without any time-complexity overhead. This gives you better complexity than running a KSPA algorithm over all O(n²) possible s-t pairs.
If what you actually meant is fixing a source and getting the k nodes with the smallest distances from it, one BFS will do. In case you meant fixing both source and target, one BFS is enough as well.
I don't see how you can use the fact that all edges going from a node to the nodes in the level below have the same weight without knowing more about the structure of the tree.
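To illustrate the all-pairs-BFS idea above, a small Python sketch (identifiers are mine; it assumes an adjacency-list dict over hashable, comparable node labels):

```python
from collections import deque

def k_closest_pairs(adj, k):
    """adj: dict node -> list of neighbours for an unweighted tree.
    Returns the k smallest (distance, source, target) triples over
    unordered node pairs, via one BFS per node (O(n^2) total)."""
    dists = []
    for s in adj:
        seen = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    q.append(v)
        for t, d in seen.items():
            if s < t:                 # count each unordered pair once
                dists.append((d, s, t))
    dists.sort()
    return dists[:k]
```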

Finding all shortest paths from every pair of nodes on a graph

I have about 70k nodes, and 250k edges, and the graph isn't necessarily connected. Obviously using an efficient algorithm is crucial. What do you recommend?
As a side note, I would appreciate advice on how to divide the task up between several machines--is that even possible with this kind of problem?
Thanks
You could use the Floyd-Warshall algorithm. It solves exactly this problem.
The complexity is O(V^3).
There is also Johnson's algorithm with a complexity of O(V^2 log V + VE). The latter is also easy to distribute because it runs Dijkstra's algorithm V times, which can be done in parallel.
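For reference, a minimal Floyd-Warshall in Python (my own sketch, treating the graph as undirected); note that at 70k nodes the O(V^3) time and the O(V^2) distance matrix are far out of reach, so this is only practical for a few thousand nodes:

```python
def floyd_warshall(n, edges):
    """n nodes labelled 0..n-1; edges: list of (u, v, w) triples.
    Returns the n x n distance matrix; float('inf') marks
    unreachable pairs (the graph need not be connected)."""
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)   # assuming an undirected graph
    for k in range(n):
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue                  # k unreachable from i, skip row
            di = dist[i]
            for j in range(n):
                nd = dik + dk[j]
                if nd < di[j]:
                    di[j] = nd            # path through k is shorter
    return dist
```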
MapReduce is a great distributed framework for this, though it might be a little too high-powered. If you're interested in that, take a look at this lecture or maybe this blog post for inspiration. (In fact, when I was taught MapReduce, this was one of the first examples.)
For 250k edges and 70k nodes, the graph is relatively sparse. Dijkstra's algorithm runs in O(E + V log V) per source, for a full (all-sources) running time of O(VE + V^2 log V). This should be fast enough, but the usual caveats apply for Dijkstra's (no negative edges).
You can also take a look at Johnson's algorithm if your problem deals with negative weights but no negative cycles. Specifically, it can also be distributed, since it takes the reweighted graph and runs Dijkstra's algorithm from each node.
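A sketch of the all-sources Dijkstra approach in Python (helper names are mine; adjacency is a dict of (neighbour, weight) lists with non-negative weights):

```python
import heapq

def all_pairs_dijkstra(adj):
    """adj: dict node -> list of (neighbour, weight) pairs.
    Runs Dijkstra from every node; returns a dict of distance dicts.
    Unreached nodes are simply absent, which handles disconnection."""
    def dijkstra(src):
        dist = {src: 0}
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float('inf')):
                continue              # stale queue entry, skip
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        return dist
    return {s: dijkstra(s) for s in adj}
```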
There are two naive ways of parallelizing this problem:
1) Identify subcomponents and distribute these over different machines. The path length between two nodes from two different components is undefined.
2) Load the graph onto every machine and give each machine a list of nodes for which to calculate all shortest paths. The results for one node are not dependent on the results for another node, so you can parallelize the problem this way; a sketch follows below.
Upside: not too hard to implement, but I would only do it like this if you have to solve the problem once. If it is a recurring issue, you might want to look at distributed algorithms.
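A rough sketch of option 2 on a single machine with Python's multiprocessing (identifiers are mine; each worker receives the full graph once via the pool initializer and answers for its share of source nodes):

```python
from multiprocessing import Pool
import heapq

ADJ = None   # per-worker copy of the graph, set by init_worker

def init_worker(adj):
    global ADJ
    ADJ = adj

def sssp(src):
    # Same per-source Dijkstra as in the sketch above.
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                  # stale queue entry
        for v, w in ADJ[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return src, dist

def parallel_all_pairs(adj, workers=4):
    with Pool(workers, initializer=init_worker, initargs=(adj,)) as pool:
        return dict(pool.map(sssp, list(adj)))

if __name__ == '__main__':
    g = {0: [(1, 1)], 1: [(0, 1), (2, 5)], 2: [(1, 5)], 3: []}
    print(parallel_all_pairs(g, workers=2))
```

Across several machines the same shape applies: ship the graph to each machine, partition the source-node list, and merge the per-source results.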
Use igraph: it's written in C, it's pretty fast, and you can use Python as a wrapper language.
Look at papers/publications which have the following keywords: distributed graph search algorithms. Here's one that may be of help.
There's also this paper, which requires an ACM account: Distributed computation on graphs: shortest path algorithms

Graph Isomorphism

Is there an algorithm or heuristic for graph isomorphism?
Corollary: a graph can be represented by many different drawings.
What's the best approach to find the different drawings of a graph?
It is a hell of a problem.
In general, the basic idea is to reduce the graph to a canonical form and then compare canonical forms. Spanning trees are generated with this objective, but spanning trees are not unique, so you need a canonical way to create them.
Once you have canonical forms, you can perform the isomorphism comparison (relatively) easily, but that's just the start, since non-isomorphic graphs can have the same spanning tree. (E.g., think of a spanning tree T and a single edge added to it to create T'. These two graphs are not isomorphic, but they have the same spanning tree.)
Other techniques involve comparing descriptors (e.g. number of nodes, number of edges), which can produce false positives in general.
I suggest you start with the wiki page about the graph isomorphism problem. I also have a book to suggest: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary: every possible spatial arrangement of a given graph's vertices is isomorphic to the original, so two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. It's another matter, for example, to find those isomorphic structures that enjoy particular properties (e.g. a drawing with non-crossing edges, if one exists), and that depends on the properties you want.
One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry - where it is used extensively. The post touches on the differences between VF2 and Ullmann. There is also a from-scratch implementation of VF2 written in Java that might be helpful.
A very similar problem - graph automorphism - can be solved by saucy, which is available in source code. This finds all symmetries of a graph. If you have two graphs, join them into one and any isomorphism can be discovered as an automorphism of the join.
Disclaimer: I am one of co-authors of saucy.
There are algorithms to do this -- however, I have not had cause to seriously investigate them as of yet. I believe Donald Knuth either is writing or has written on this subject in The Art of Computer Programming during his second pass at (re)writing the series.
As for a simple way to do something that might work in practice on small graphs: count the degrees, then for each vertex also note the multiset of degrees of its adjacent vertices. This gives you a set of potential vertex correspondences for each vertex. Then just try all of them (via brute force, but choosing the vertices in increasing order of candidate-set size) from this restricted set. Intuitively, most graph isomorphisms can be computed practically this way, though clearly there are degenerate cases that might take a long time.
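A quick sketch of that degree-based pruning in Python (names are mine; it returns, for each vertex of the first graph, the plausible matches in the second):

```python
def candidate_map(g1, g2):
    """g1, g2: dict node -> set of neighbours. For each vertex of g1,
    return the set of g2 vertices sharing its degree and the multiset
    of its neighbours' degrees (the invariant described above)."""
    def signature(g, v):
        # Degree of v plus the sorted degrees of its neighbours.
        return (len(g[v]), tuple(sorted(len(g[w]) for w in g[v])))
    sig2 = {}
    for v in g2:
        sig2.setdefault(signature(g2, v), set()).add(v)
    return {v: sig2.get(signature(g1, v), set()) for v in g1}
```

If any candidate set comes back empty, the graphs cannot be isomorphic; otherwise the candidate sets bound the brute-force search over vertex mappings.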
I recently came across the following paper: http://arxiv.org/abs/0711.2010
It proposes "A Polynomial Time Algorithm for Graph Isomorphism".
My project - Griso - is at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own polynomial-time algorithm (this claim being the salt of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.
nauty and Traces
nauty and Traces are programs for computing the automorphism groups of graphs and digraphs. They can also produce a canonical label. They are written in a portable subset of C and run on a considerable number of different systems.
The AutGroupGraph command in GAP's GRAPE package.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.
As for heuristics: I've been fantasising about a modified Ullmann's algorithm, where you don't only use breadth-first search but mix it with depth-first search, in such a way that first you use breadth-first search intensively, then you set a limit for the breadth analysis and go deeper after checking a few neighbours, lowering the breadth a bit at every step. This is practically how I find my way on a map: first I locate myself with breadth-first search, then I search the route with depth-first search - largely, and this is the best evolution my brain has ever invented. :) In the long term, some intelligence could be added to increase the breadth-first neighbour count at critical vertices - for example where there is a large number of neighbouring vertices with the same edge count. Like occasionally checking your actual route with the car (without a GPS).
I've since found out that the algorithm belongs to the category of k-dimensional Weisfeiler-Lehman algorithms, and that it fails on regular graphs. For more, see:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem of finding isomorphic graphs in a database of graphs (containing chemical compounds).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false-positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N iterations (where N is the radius of the graph). On each iteration and for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood 2 hops away from it. On the Nth step, a node's hash is affected by the neighborhood N hops around it. So you only need to keep running the Powerhash for N = graph_radius steps. In the end, the hash of the graph's center node will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them. After that, you can compare the final hashes to find out whether two graphs are isomorphic. If you have labels, add them (on the first step) to the internal hashes that you calculate for each node.
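A compact sketch of the scheme in Python (names are mine; I fold each node's previous hash in alongside its sorted neighbour hashes, a common variant of the steps above, and use sha256 where the original uses its Powerhash):

```python
import hashlib

def _h(s):
    return hashlib.sha256(s.encode()).hexdigest()

def graph_hash(adj, radius, labels=None):
    """adj: dict node -> iterable of neighbours; radius: number of
    iterations (the graph radius, per the description above).
    Equal hashes are necessary but not sufficient for isomorphism,
    and regular graphs defeat this, as noted above."""
    # Step 0: seed each node's hash from its label, if any.
    cur = {v: _h(str(labels[v]) if labels else "") for v in adj}
    for _ in range(radius):
        # Each round folds in the sorted hashes of the neighbours,
        # extending each node's "view" of the graph by one hop.
        cur = {v: _h(cur[v] + "".join(sorted(cur[w] for w in adj[v])))
               for v in adj}
    # Final graph hash: sorted node hashes, concatenated and hashed.
    return _h("".join(sorted(cur.values())))
```

Two graphs can then be screened for isomorphism by comparing their `graph_hash` values before falling back to an exact check.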
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
