what kind of algorithm is this (stable marriage variation)?

I have a set of objects (between 1 and roughly 500). Each object is compatible with certain (zero or more) other objects from that same set.
Can anyone give me some pointers as to how to determine the best way to create pairs of objects that are compatible with each other so that most of the objects in the set are paired?

You're looking for a maximum matching in a general graph. As opposed to the stable marriage problem with which you are familiar, in the maximum matching problem the input graph is not necessarily bipartite. There is no notion of stability (as vertices do not rank their compatible options); what you're looking for is a subset of the edges of the graph such that no two edges share a common vertex (a.k.a. a matching). You're trying to construct the matching that contains the maximum possible number of edges.
Luckily, the problem of finding a maximum matching in a general graph can be solved in polynomial time using Edmonds's matching algorithm (also known as the blossom algorithm because of how it contracts blossoms (odd cycles) into single vertices). The time complexity of Edmonds's matching algorithm is O(E•V^2). While not very efficient, I believe this is good enough for the relatively small graphs you're dealing with. You don't even have to implement it from scratch yourself, as there's an open source Java implementation of Edmonds's algorithm you can use. However, if you're interested in the state of the art, you can use the most efficient algorithm known for the problem, due to Micali and Vazirani, which runs in O(E•sqrt(V)).
If the vertex compatibility of your input is not dichotomous (that is, if each vertex has a ranking specifying its preferences among its neighbors), you can add corresponding weights to the edges to capture the preference profile and use the variation of Edmonds's algorithm for weighted graphs.
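If you'd rather not implement the blossom algorithm yourself, here is a minimal sketch using networkx (my choice of library, not something the answer above prescribes); its max_weight_matching implements Edmonds's algorithm, and maxcardinality=True pairs as many vertices as possible, breaking ties by total edge weight:

    import networkx as nx

    # Hypothetical compatibility data: object id -> compatible object ids.
    compat = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3], 5: []}

    G = nx.Graph()
    G.add_nodes_from(compat)
    for obj, others in compat.items():
        for other in others:
            G.add_edge(obj, other)  # an edge means "compatible"

    # Edmonds's blossom algorithm; maxcardinality=True maximizes the
    # number of matched vertices first.
    pairs = nx.max_weight_matching(G, maxcardinality=True)
    print(pairs)  # e.g. {(2, 1), (4, 3)}; object 5 stays unpaired

    # For ranked preferences, attach weights instead:
    # G.add_edge(obj, other, weight=preference_score)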


Need Algorithm for Allocating Managers to Stores

Suppose you have "n" managers and "n" stores all located randomly across a geographic area. I need to be able to assign each manager to a store. The managers will travel daily from their homes to their assigned store. In general I'd like to minimize the daily distance travelled. This can be interpreted in two ways:
Minimize the average daily travel distance (which is equivalent to minimizing the total travel distance)
Minimize the maximum travel distance for any single manager
Is this a known problem? Are there any obvious algorithms to solve it? It seems similar to the traveling salesman problem but it's not quite the same.
Both can be solved polynomially
I'll quickly cover both ways of defining an optimal allocation described in the question. Note that I won't make any assumptions about the distances, such as the triangle inequality. Of course such a property is likely to hold in practice, and there may be better algorithms that exploit it.
Minimize total distance
For this instance, we consider the managers and the stores to be a weighted complete bipartite graph. We then want a matching that minimizes the sum of the weights.
This is called the balanced assignment problem, a special case of minimum-weight perfect matching. Because the graph is bipartite, it can be solved in polynomial time. Wikipedia lists a couple of algorithms for solving it, most notably the Hungarian algorithm.
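As a concrete illustration, SciPy ships a solver for exactly this problem; the coordinates below are invented for the example:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical home/store coordinates as (x, y) points.
    managers = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])
    stores   = np.array([[1.0, 1.0], [6.0, 4.0], [8.0, 0.0]])

    # cost[i, j] = Euclidean distance from manager i's home to store j.
    cost = np.linalg.norm(managers[:, None, :] - stores[None, :, :], axis=2)

    rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
    for i, j in zip(rows, cols):
        print(f"manager {i} -> store {j} ({cost[i, j]:.2f})")
    print("total distance:", cost[rows, cols].sum())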
Minimize maximum distance
If we wish to minimize the maximum distance, we can find a solution through a binary search. Specifically, we binary search over the maximum distance and attempt to find a matching that does not violate this maximum distance.
For any given maximum distance x, we create the bipartite graph that has an edge between manager M and store S if and only if d(M, S) ≤ x. We then try to find a perfect matching in this bipartite graph with any bipartite matching algorithm; the successes and failures drive the binary search toward the smallest x that still admits a perfect matching, which minimizes the maximum distance.
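A sketch of that binary search, assuming networkx for the matching step (an assumption; any bipartite matching routine works). Since only the n² pairwise distances can ever be the answer, we can binary search over that sorted list rather than over real numbers:

    import networkx as nx

    def min_max_assignment(dist):
        """dist[i][j] = distance from manager i to store j (n x n).
        Returns the smallest x such that a perfect matching exists
        using only manager-store pairs with dist <= x."""
        n = len(dist)
        values = sorted({dist[i][j] for i in range(n) for j in range(n)})

        def feasible(x):
            G = nx.Graph()
            G.add_nodes_from(("m", i) for i in range(n))
            G.add_nodes_from(("s", j) for j in range(n))
            G.add_edges_from((("m", i), ("s", j))
                             for i in range(n) for j in range(n)
                             if dist[i][j] <= x)
            matching = nx.bipartite.hopcroft_karp_matching(
                G, [("m", i) for i in range(n)])
            return len(matching) == 2 * n  # every manager and store matched

        lo, hi = 0, len(values) - 1
        while lo < hi:                 # binary search over distance values
            mid = (lo + hi) // 2
            if feasible(values[mid]):
                hi = mid
            else:
                lo = mid + 1
        return values[lo]

    # Example: print(min_max_assignment([[4, 1], [2, 3]]))  # -> 2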

Topological sort of cyclic graph with minimum number of violated edges

I am looking for a way to perform a topological sorting on a given directed unweighted graph, that contains cycles. The result should not only contain the ordering of vertices, but also the set of edges, that are violated by the given ordering. This set of edges shall be minimal.
As my input graph is potentially large, I cannot use an exponential time algorithm. If it's impossible to compute an optimal solution in polynomial time, what heuristic would be reasonable for the given problem?
Eades, Lin, and Smyth proposed "A fast and effective heuristic for the feedback arc set problem". The original article is behind a paywall, but a free copy is available from here.
There's an algorithm for topological sorting that builds the vertex order by selecting a vertex with no incoming arcs, recursing on the graph minus that vertex, and prepending the vertex to the order. (I'm describing the algorithm recursively, but you don't have to implement it that way.) The Eades–Lin–Smyth algorithm also looks for vertices with no outgoing arcs and appends them. Of course, it can happen that every vertex has both incoming and outgoing arcs. In that case, select the vertex with the largest difference between outgoing and incoming arcs and treat it like a source. There is undoubtedly room for experimentation here.
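Here's a short Python sketch of that greedy heuristic (my own rendering of the description above, not the authors' reference code):

    from collections import defaultdict

    def eades_lin_smyth(nodes, edges):
        """Greedy feedback-arc-set heuristic sketched above.
        Returns (vertex order, edges violated by that order)."""
        succ, pred = defaultdict(set), defaultdict(set)
        for u, v in edges:
            succ[u].add(v)
            pred[v].add(u)
        remaining = set(nodes)
        s1, s2 = [], []  # the final order is s1 followed by s2

        def remove(v):
            remaining.discard(v)
            for w in succ.pop(v, set()):
                pred[w].discard(v)
            for w in pred.pop(v, set()):
                succ[w].discard(v)

        while remaining:
            sinks = [v for v in remaining if not succ[v]]
            for v in sinks:
                s2.insert(0, v)  # sinks go to the back of the order
                remove(v)
            sources = [v for v in remaining if not pred[v]]
            for v in sources:
                s1.append(v)     # sources go to the front
                remove(v)
            if remaining and not sinks and not sources:
                # no source or sink: pick the most "source-like" vertex
                v = max(remaining, key=lambda u: len(succ[u]) - len(pred[u]))
                s1.append(v)
                remove(v)

        order = s1 + s2
        pos = {v: i for i, v in enumerate(order)}
        violated = [(u, v) for u, v in edges if pos[u] > pos[v]]
        return order, violated

    # Example on a 3-cycle:
    # eades_lin_smyth([1, 2, 3], [(1, 2), (2, 3), (3, 1)])
    # -> ([1, 2, 3], [(3, 1)])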
The algorithms with provable worst-case behavior are based on linear programming and graph cuts. These are neat, but the guarantees are less than ideal (log^2 n or log n log log n times as many arcs as needed), and I suspect that efficient implementations would be quite a project.
Inspired by Arnaud's answer and other interesting topological sorting algorithms, I created the cyclic-toposort project and published it on GitHub. cyclic_toposort does exactly what you desire in that it quickly sorts a directed cyclic graph with a minimal number of violated edges. It can optionally also provide the maximum groupings of nodes that are on the same topological level (and can therefore be activated at the same time).
If the problem is still relevant to you, I would be happy if you checked out my project and let me know what you think!
This project was useful for my own neural network topology research, so I had to create something like it anyway. I'm answering your question this late in case anyone else stumbles upon this thread in search of an answer to the same question.

Weighted bipartite matching

I'm aware there are a lot of similar topics, but most of them left me with some doubts about my case. What I want to do is find a perfect matching (or as close to perfect as possible if there's no perfect matching, of course), and then, among all matchings that match k out of n vertices (where k is the highest possible), choose the one with the highest possible total weight.
So, simply put, my priorities are the following:
Match as many vertices as possible
Because the (unweighted) maximum matching is in most cases not unique, choose among them the one that has the biggest sum of weights on its edges. If there are several with the same weight, it doesn't matter which is chosen.
I've heard about the Ford–Fulkerson algorithm. Does it work the way I describe, or do I need another algorithm?
If you're implementing this yourself, you probably want to use the Hungarian algorithm. Faster algorithms exist but aren't as easy to understand or implement.
Ford–Fulkerson is a maximum flow algorithm; you can easily use it to solve unweighted matching. Turning it into a weighted matching algorithm requires an additional trick; with that trick, you wind up with the Hungarian algorithm.
You can also use a min-cost flow algorithm to do weighted bipartite matching, but it might not work quite as well. There's also the network simplex method, but it seems to be mostly of historical interest.
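For what it's worth, a library implementation of the weighted blossom algorithm already encodes exactly the two-level priority from the question. A sketch with networkx (my choice of library, not mentioned above): maxcardinality=True matches as many vertices as possible, and among those matchings the result maximizes the total weight.

    import networkx as nx

    # Hypothetical weighted bipartite graph: (left, right, weight) triples.
    edges = [("a", 1, 3.0), ("a", 2, 1.0), ("b", 1, 2.5), ("c", 3, 4.0)]

    G = nx.Graph()
    G.add_weighted_edges_from(edges)

    # Priority 1: match as many vertices as possible (maxcardinality=True).
    # Priority 2: among those matchings, maximize the sum of edge weights.
    matching = nx.max_weight_matching(G, maxcardinality=True)
    print(matching)  # e.g. {("a", 2), (1, "b"), (3, "c")}

If you'd rather stay with the Hungarian algorithm, the standard trick is to add a constant larger than the sum of all weights to every real edge (and use weight 0 for incompatible pairs) and then maximize: matching one extra pair then always outweighs any difference in edge weights.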

Significance of various graph types

There are a lot of named graph types. I am wondering what is the criteria behind this categorization. Are different types applicable in different context? Moreover, can a business application (from design and programming perspective) benefit anything out of these categorizations? Is this analogous to design patterns?
We've given names to common families of graphs for several reasons:
Certain families of graphs have nice, simple properties. For example, trees have numerous useful properties (there's exactly one path between any pair of nodes, they're maximally acyclic, they're minimally connected, etc.) that don't hold for arbitrary graphs. Directed acyclic graphs can be topologically sorted, which graphs containing cycles cannot. If you can model a problem in terms of one of these types of graphs, you can use specialized algorithms on it to extract properties that can't necessarily be obtained from an arbitrary graph.
Certain algorithms run faster on certain types of graphs. Many NP-hard problems on graphs, which as of now don't have any polynomial-time algorithms, can be solved very easily on certain types of graphs. For example, the maximum independent set problem (choose the largest collection of nodes where no two nodes are connected by an edge) is NP-hard, but can be solved in polynomial time for trees and bipartite graphs. The 4-coloring problem (determine whether the nodes of a graph can be colored with one of four colors without assigning the same color to adjacent nodes) is NP-hard in general, but the answer is always yes for planar graphs (this is the famous four-color theorem).
Certain algorithms are easier on certain types of graphs. A matching in a graph is a collection of edges in the graph where no two edges share an endpoint. Maximum matchings can be used to represent ways of pairing people up into groups. In a bipartite graph, a maximum matching can be used to represent a way of assigning people to tasks such that no person is assigned two tasks and no task is assigned to two people. There are many algorithms for finding maximum matchings in bipartite graphs that run quickly and are easy to understand. The corresponding algorithms for general graphs are significantly more complicated and slightly less efficient.
Certain graphs are historically significant. Many named graphs are named after someone who used the graph to disprove a conjecture about properties of arbitrary graphs. The Petersen graph, for example, is a counterexample to many theorems that seem true about graphs but are actually not.
Certain graphs are useful in theoretical computer science. An expander graph is a graph where, intuitively, any collection of nodes must be connected to a proportionally larger collection of nodes in the graph. Not all graphs are expander graphs. Expander graphs are used in many results in theoretical computer science, such as one proof of the PCP theorem and in the proof that SL = L.
This is not an exhaustive list of why we care about different graph families, but hopefully it helps motivate their usage and study.
Hope this helps!

Graph Isomorphism

Is there an algorithm or heuristics for graph isomorphism?
Corollary: A graph can be represented by different drawings.
What's the best approach to find the different drawings of a graph?
It is a hell of a problem.
In general, the basic idea is to simplify the graph into a canonical form and then compare canonical forms. Spanning trees are generated with this objective, but spanning trees are not unique, so you need a canonical way to create them.
After you have canonical forms, you can perform the isomorphism comparison (relatively) easily, but that's just the start, since non-isomorphic graphs can have the same spanning tree. (E.g., think of a spanning tree T and a single edge added to it to create T'. These two graphs are not isomorphic, but they have the same spanning tree.)
Other techniques involve comparing descriptors (e.g., number of nodes, number of edges), which can produce false positives in general.
I suggest you start with the wiki page about the graph isomorphism problem. I also have a book to suggest: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary: every possible spatial arrangement of a given graph's vertices is isomorphic to the original, so two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. Another matter is, for example, finding those isomorphic drawings that enjoy particular properties (e.g., non-crossing edges, if such a drawing exists), and that depends on the properties you want.
One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry, where it is used extensively. The post touches on the differences between VF2 and Ullmann's algorithm. There is also a from-scratch implementation of VF2 written in Java that might be helpful.
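If you just want to use VF2 rather than reimplement it, networkx exposes it (an illustration of the algorithm named above, not code from the linked post):

    import networkx as nx
    from networkx.algorithms import isomorphism

    G1 = nx.cycle_graph(5)                           # vertices 0..4
    G2 = nx.relabel_nodes(G1, dict(zip(G1, "abcde")))

    gm = isomorphism.GraphMatcher(G1, G2)            # VF2 under the hood
    print(gm.is_isomorphic())                        # True
    print(gm.mapping)                                # one concrete vertex mapping
    # gm.subgraph_is_isomorphic() covers the substructure-search variant
    # that chemistry applications rely on.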
A very similar problem - graph automorphism - can be solved by saucy, which is available in source code. This finds all symmetries of a graph. If you have two graphs, join them into one and any isomorphism can be discovered as an automorphism of the join.
Disclaimer: I am one of co-authors of saucy.
There are algorithms to do this -- however, I have not had cause to seriously investigate them as of yet. I believe Donald Knuth either is writing or has written on this subject in The Art of Computer Programming during his second pass at (re)writing it.
As for a simple approach that might work in practice on small graphs, I would recommend counting degrees and then, for each vertex, also noting the set of degrees of its adjacent vertices. This gives you a set of potential vertex correspondences for each point. Then just try all of those via brute force, choosing the vertices in increasing order of the size of their candidate sets. Intuitively, most graph isomorphisms can be computed this way in practice, though clearly there are degenerate cases that might take a long time.
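Here's a sketch of that idea, assuming simple undirected graphs given as adjacency dicts; the signature (own degree, sorted neighbour degrees) prunes the candidate sets before backtracking:

    def signatures(adj):
        """adj: dict vertex -> set of neighbours."""
        deg = {v: len(ns) for v, ns in adj.items()}
        return {v: (deg[v], tuple(sorted(deg[n] for n in ns)))
                for v, ns in adj.items()}

    def find_isomorphism(adj1, adj2):
        """Brute force with degree-signature pruning.
        Returns a vertex mapping dict, or None if none exists."""
        if len(adj1) != len(adj2):
            return None
        sig1, sig2 = signatures(adj1), signatures(adj2)
        # candidates[v] = vertices of graph 2 with the same signature as v
        cand = {v: [w for w in adj2 if sig2[w] == sig1[v]] for v in adj1}
        if any(not c for c in cand.values()):
            return None
        # try the most constrained vertices first
        order = sorted(adj1, key=lambda v: len(cand[v]))

        def extend(i, mapping, used):
            if i == len(order):
                return dict(mapping)
            v = order[i]
            for w in cand[v]:
                if w in used:
                    continue
                # adjacency must be preserved w.r.t. already-mapped vertices
                if all((u in adj1[v]) == (mapping[u] in adj2[w])
                       for u in mapping):
                    mapping[v] = w
                    used.add(w)
                    result = extend(i + 1, mapping, used)
                    if result:
                        return result
                    del mapping[v]
                    used.discard(w)
            return None

        return extend(0, {}, set())

    # Example: a path a-b-c vs 1-2-3
    adj1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    adj2 = {1: {2}, 2: {1, 3}, 3: {2}}
    print(find_isomorphism(adj1, adj2))  # e.g. {'b': 2, 'a': 1, 'c': 3}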
I recently came across the following paper: http://arxiv.org/abs/0711.2010
This paper proposes "A Polynomial Time Algorithm for Graph Isomorphism"
My project, Griso, is at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own POLYNOMIAL-TIME algorithm (this claim being the whole point of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.
nauty and Traces
nauty and Traces are programs for computing automorphism groups of graphs and digraphs. They can also produce a canonical label. They are written in a portable subset of C and run on a considerable number of different systems.
the AutGroupGraph command in GAP's GRAPE package.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.
As for heuristics: I've been fantasising about a modified Ullmann's algorithm where you don't only use breadth-first search but mix it with depth-first search: first you use breadth-first search intensively, then you set a limit for the breadth analysis and go deeper after checking a few neighbours, lowering the breadth at each step by some amount. This is practically how I find my way on a map: first locate myself with breadth-first search, then search the route with depth-first search; it's the best strategy my brain has ever evolved. :) In the long term, some intelligence could be added to increase the breadth-first neighbour count at critical vertices, for example where a large number of neighbouring vertices have the same edge count. It's like occasionally double-checking your actual route in the car (without a GPS).
I've found out that the algorithm belongs to the category of k-dimensional Weisfeiler–Lehman algorithms, and it fails with regular graphs. For more, see:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem to find isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N iterations (where N is the radius of the graph). On each iteration, for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace the node's hash with the newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood two hops away from it. On the Nth step, a node's hash is affected by the neighborhood N hops around it, so you only need to keep running the Powerhash for N = graph_radius steps. In the end, the graph's center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them. After that, you can compare the final hashes to decide whether two graphs are isomorphic. If you have labels, add them (on the first step) to the internal hashes that you calculate for each node.
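A compact sketch of the scheme described above (my own reconstruction, not the linked source; I also mix each node's own previous hash into the concatenation so node labels survive the iterations). Per the caveat above, a 1-dimensional scheme like this fails on regular graphs:

    import hashlib

    def h(s):
        return hashlib.sha256(s.encode()).hexdigest()[:16]

    def graph_hash(adj, labels=None, rounds=None):
        """adj: vertex -> iterable of neighbours; labels: optional dict.
        Isomorphic graphs always get equal hashes (everything is sorted,
        hence order-independent); distinct graphs collide only with a
        tiny false-positive probability."""
        if rounds is None:
            rounds = len(adj)  # safe upper bound on the graph radius
        cur = {v: h(str(labels[v]) if labels else "") for v in adj}
        for _ in range(rounds):
            cur = {v: h(cur[v] + "".join(sorted(cur[n] for n in adj[v])))
                   for v in adj}
        return h("".join(sorted(cur.values())))

    # Example: a path a-b-c hashes identically to the path 1-2-3.
    adj1 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    adj2 = {1: [2], 2: [1, 3], 3: [2]}
    print(graph_hash(adj1) == graph_hash(adj2))  # True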
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
