Significance of various graph types - algorithm

There are a lot of named graph types. I am wondering what the criteria behind this categorization are. Are different types applicable in different contexts? Moreover, can a business application (from a design and programming perspective) benefit from these categorizations? Is this analogous to design patterns?

We've given names to common families of graphs for several reasons:
Certain families of graphs have nice, simple properties. For example, trees have numerous useful properties (there's exactly one path between any pair of nodes, they're maximally acyclic, they're minimally connected, etc.) that don't hold for arbitrary graphs. Directed acyclic graphs can be topologically sorted, which graphs with cycles cannot. If you can model a problem in terms of one of these types of graphs, you can use specialized algorithms on them to extract properties that can't necessarily be obtained from an arbitrary graph.
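For instance, here is a minimal sketch of topological sorting using Python's networkx library (the task names are invented for illustration):

```python
import networkx as nx

# Edges point from a prerequisite task to the task that depends on it.
build = nx.DiGraph([("parse", "typecheck"), ("typecheck", "codegen"),
                    ("parse", "lint")])
print(list(nx.topological_sort(build)))  # one valid build order

# A topological order only exists for acyclic graphs; on a cyclic graph,
# topological_sort raises NetworkXUnfeasible.
print(nx.is_directed_acyclic_graph(nx.DiGraph([("a", "b"), ("b", "a")])))  # False
```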
Certain algorithms run faster on certain types of graphs. Many NP-hard graph problems, for which no polynomial-time algorithms are currently known, can be solved easily on certain types of graphs. For example, the maximum independent set problem (choose the largest collection of nodes where no two nodes are connected by an edge) is NP-hard, but can be solved in polynomial time for trees and bipartite graphs. The 4-coloring problem (determine whether the nodes of a graph can be colored with one of four different colors without assigning the same color to adjacent nodes) is NP-hard in general, but the answer is always "yes" for planar graphs (this is the famous four-color theorem).
Certain algorithms are easier on certain types of graphs. A matching in a graph is a collection of edges where no two edges share an endpoint; maximum matchings can be used to represent ways of pairing people up into groups. In a bipartite graph, a maximum matching can represent a way of assigning people to tasks such that no person is assigned two tasks and no task is assigned to two people. There are many fast, easy-to-understand algorithms for finding maximum matchings in bipartite graphs. The corresponding algorithms for general graphs are significantly more complicated and slightly less efficient.
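As a sketch of how little code the bipartite case takes, here is networkx's Hopcroft-Karp implementation on invented people/task data:

```python
import networkx as nx
from networkx.algorithms import bipartite

# People on one side, tasks on the other; an edge means "can do this task".
g = nx.Graph([("alice", "t1"), ("alice", "t2"), ("bob", "t2"), ("carol", "t3")])
matching = bipartite.hopcroft_karp_matching(g, top_nodes={"alice", "bob", "carol"})
print(matching)  # maps each matched person to a task, and each task back
```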
Certain graphs are historically significant. Many named graphs are named after someone who used the graph to disprove a conjecture about the properties of arbitrary graphs. The Petersen graph, for example, is a counterexample to many statements that seem plausible for all graphs but are actually false.
Certain graphs are useful in theoretical computer science. An expander graph is a graph where, intuitively, any collection of nodes must be connected to a proportionally larger collection of nodes in the graph. Not all graphs are expander graphs. Expander graphs are used in many results in theoretical computer science, such as one proof of the PCP theorem and the proof that SL = L.
This is not an exhaustive list of why we care about different graph families, but hopefully it helps motivate their usage and study.
Hope this helps!

Related

what kind of algorithm is this (stable marriage variation)?

I have a set of objects (between 1 and roughly 500). Each object is compatible with certain (zero or more) other objects from that same set.
Can anyone give me some pointers on how to determine the best way to create pairs of compatible objects so that as many of the objects in the set as possible are paired?
You're looking for a maximum matching in a general graph. As opposed to the stable marriage problem with which you are familiar, in the maximum matching problem the input graph is not necessarily bipartite. There is no notion of stability (as vertices do not rank their compatible options) and what you're looking for is a subset of the edges of the graph such that no two edges share a common vertex (a.k.a., a matching). You're trying to construct that matching which contains the maximum possible number of edges.
Luckily, the problem of finding a maximum matching in a general graph can be solved in polynomial time using Edmonds' matching algorithm, also known as the blossom algorithm because of how it contracts blossoms (odd cycles) into single vertices. The time complexity of Edmonds' matching algorithm is O(E·V^2). While not very efficient, I believe this is good enough for the relatively small graphs you're dealing with. You don't even have to implement it from scratch yourself, as there's an open-source Java implementation of Edmonds' algorithm you can use. However, if you're interested in the state of the art, the most efficient algorithm known for the problem runs in O(E·sqrt(V)).
If the vertex compatibility in your input is not dichotomous (that is, if each vertex has a ranking specifying its preferences among its neighbors), you can add corresponding weights to the edges to account for the preference profile and use the variation of Edmonds' algorithm for weighted graphs.
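A short sketch of both cases using networkx, whose max_weight_matching is a blossom-style matcher (the compatibility data is invented):

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])  # compatible pairs

# Unweighted case: with unit weights, maxcardinality=True yields a maximum
# matching in a general (not necessarily bipartite) graph.
print(nx.max_weight_matching(g, maxcardinality=True))

# Weighted case: encode preferences as edge weights and maximize total weight.
g[1][2]["weight"] = 5
g[3][4]["weight"] = 2
print(nx.max_weight_matching(g))
```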

Efficient Algorithm for subgraph enumeration

I have searched related issues about subgraph enumeration. However, they didn't meet my requirement(*). (If I misunderstood something, please tell me.)
Is there an efficient algorithm or tool for enumerating all connected, unlabelled subgraphs of an undirected parent graph?
In my case, the parent graph is an Internet topology, so the number of nodes could be large. I would like to enumerate all of the connected unlabelled patterns (i.e. subgraphs) of the parent graph.
(*) I have searched Efficiently find all connected subgraphs and Subgraph enumeration, but they target vertex-labelled induced subgraphs and complete subgraphs, respectively. All I want is the connected unlabelled subgraphs.
A topic name that might be helpful is "frequent subgraph mining", which seems to be one name for this problem. There are various tools and algorithms in this area, although they may not do exactly what you want, of course.
As others point out in the answers to the two questions you linked, the number of subgraphs of a large graph can be very large. Assuming you actually want to list them, not just count them, it might take a long time.
Edit: OP has pointed out that the input here is ONE large graph, not a set of smaller ones, so standard graph mining will not work directly.
I still think the general approach can work here. The input set of graphs for mining is some subset of the subgraphs of your data graph. But that subgraph-set is what you want in the first place!
So let's say you pick a size of subgraph that you want (say, 6 vertices). Then you randomly pick starting vertices in your parent graph (the Internet topology) and 'grow' these seeds, weeding out at each growth step those that don't match. Then repeat for different sizes of subgraph.
Of course, this is a probabilistic algorithm, but it could give you some idea.
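Here is a rough sketch of that seed-growing idea with networkx; the helper is my own, not a library routine, and a real run would still need to bucket the samples by a canonical form to collect the distinct unlabelled patterns:

```python
import random
import networkx as nx

def grow_connected(g, size, rng=random):
    """Grow one random connected set of `size` vertices; None if stuck."""
    nodes = {rng.choice(list(g))}
    while len(nodes) < size:
        frontier = {u for v in nodes for u in g.neighbors(v)} - nodes
        if not frontier:
            return None  # seed landed in a component smaller than `size`
        nodes.add(rng.choice(sorted(frontier)))
    return g.subgraph(nodes)

g = nx.barabasi_albert_graph(1000, 2)  # stand-in for a topology graph
samples = [s for s in (grow_connected(g, 6) for _ in range(200)) if s is not None]
```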

Real world applications where spanning tree data structure is used

Do any of you know any real-world applications where the spanning tree data structure is used?
In networking, we often use minimum spanning tree algorithms. The problem, as stated here, is: given a graph with weighted edges, find a tree of edges with the minimum total weight that satisfies three properties: connected, acyclic, and consisting of |V| - 1 edges. (In fact, any two of the three conditions imply the third.)
As an example: if you have a large local area network with a lot of switches, it might be useful to find a minimum spanning tree so that only the minimum number of packets need to be relayed across the network and multiple copies of the same packet don't arrive via different paths (remember, any two nodes are connected via only a single path in a spanning tree).
Other real-world problems include laying out electrical grids, reportedly the original motivation for Boruvka's algorithm, one of the first algorithms for finding minimum spanning trees. It shouldn't be surprising that it would be better to find a minimum spanning tree than just any old spanning tree; if one spanning tree on a network would involve taking the most congested, slowest path, it's probably not going to be ideal!
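For a concrete toy version of the switch example, here is a sketch with networkx (the switch names and weights are invented):

```python
import networkx as nx

# Weighted LAN links; weights might represent latency or congestion.
lan = nx.Graph()
lan.add_weighted_edges_from([("sw1", "sw2", 4), ("sw1", "sw3", 1),
                             ("sw2", "sw3", 2), ("sw3", "sw4", 5),
                             ("sw2", "sw4", 3)])
mst = nx.minimum_spanning_tree(lan)  # Kruskal's algorithm by default
print(sorted(mst.edges(data="weight")))
# 3 edges for 4 switches: connected, acyclic, |V| - 1 edges
```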
There are many other applications apart from computer networks; I've listed references below:
Network design:
– telephone, electrical, hydraulic, TV cable, computer, road
The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn’t a tree you can always remove some edges and save money.
Approximation algorithms for NP-hard problems:
– traveling salesperson problem, Steiner tree
A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once.
Note that if you have a path visiting all points exactly once, it's a special kind of tree (in small examples, many spanning trees are in fact paths). If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set.
On the other hand, if you draw a path tracing around the minimum spanning tree, you trace each edge twice and visit all points, so the TSP weight is less than twice the MST weight. Therefore this tour is within a factor of two of optimal.
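Here is a small sketch of that factor-2 tour: build the MST, walk it in DFS preorder, and shortcut repeated vertices. The points are invented, and the bound relies on the triangle inequality:

```python
import itertools
import math
import networkx as nx

pts = {"a": (0, 0), "b": (0, 3), "c": (4, 0), "d": (4, 3)}
g = nx.Graph()
for u, v in itertools.combinations(pts, 2):
    g.add_edge(u, v, weight=math.dist(pts[u], pts[v]))

mst = nx.minimum_spanning_tree(g)
# Preorder DFS of the MST = the doubled walk with repeated vertices cut out.
tour = list(nx.dfs_preorder_nodes(mst, source="a")) + ["a"]
cost = sum(g[u][v]["weight"] for u, v in zip(tour, tour[1:]))
print(tour, cost)  # tour cost <= 2 * MST weight <= 2 * optimal tour
```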
Indirect applications:
– max bottleneck paths
– LDPC codes for error correction
– image registration with Renyi entropy
– learning salient features for real-time face verification
– reducing data storage in sequencing amino acids in a protein
– model locality of particle interactions in turbulent fluid flows
– autoconfig protocol for Ethernet bridging to avoid cycles in a network
Cluster analysis:
The k-clustering problem can be viewed as finding an MST and deleting the k - 1 most expensive edges.
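A minimal sketch of this single-linkage-style clustering with networkx (the weights are invented; k > 1 assumed):

```python
import networkx as nx

g = nx.Graph()
g.add_weighted_edges_from([(0, 1, 1), (1, 2, 2), (0, 2, 3),
                           (3, 4, 1), (4, 5, 2), (2, 3, 9)])
k = 2
mst = nx.minimum_spanning_tree(g)
# Deleting the k - 1 heaviest tree edges leaves exactly k components.
heaviest = sorted(mst.edges(data="weight"), key=lambda e: e[2])[-(k - 1):]
mst.remove_edges_from((u, v) for u, v, _ in heaviest)
print(list(nx.connected_components(mst)))  # [{0, 1, 2}, {3, 4, 5}]
```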

computing vertex connectivity of graph

Is there an algorithm that, when given a graph, computes the vertex connectivity of that graph (the minimum number of vertices whose removal disconnects the graph)? (Note that the graph may already be disconnected.) Thanks!
See:
Determining if a graph is K-vertex-connected
k-vertex connectivity of a graph
When you combine this with binary search you are done.
This book chapter should have everything you need to get started; it is basically a survey of algorithms for determining the edge connectivity and vertex connectivity of graphs, with pseudocode for the algorithms it describes. Page 12 has an overview of the available algorithms alongside complexity analyses and references. Most of the solutions for the general case are flow-based, with the exception of one randomized algorithm. The different general solutions optimize for different properties of the graph, so you can choose the most asymptotically efficient one beforehand. Also, for some classes of graphs, there exist specialized algorithms with better complexity than the general solutions provide.
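If you just need the number rather than the algorithmic details, networkx exposes a flow-based implementation; a minimal sketch:

```python
import networkx as nx

print(nx.node_connectivity(nx.cycle_graph(5)))     # 2: cut any 2 vertices
print(nx.node_connectivity(nx.complete_graph(4)))  # 3: K_n has connectivity n - 1

disconnected = nx.Graph([(0, 1), (2, 3)])
print(nx.node_connectivity(disconnected))          # 0: already disconnected
```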

Graph Isomorphism

Is there an algorithm or heuristics for graph isomorphism?
Corollary: a graph can be represented by many different drawings.
What's the best approach to find the different drawings of a graph?
It is a hell of a problem.
In general, the basic idea is to simplify the graph into a canonical form, and then compare the canonical forms. Spanning trees are generated with this objective, but spanning trees are not unique, so you need a canonical way to create them.
After you have canonical forms, you can perform the isomorphism comparison (relatively) easily, but that's just the start, since non-isomorphic graphs can have the same spanning tree. (For example, take a spanning tree T and add a single edge to it to create T'. These two graphs are not isomorphic, but they have the same spanning tree.)
Other techniques involve comparing descriptors (e.g., number of nodes, number of edges), which can produce false positives in general.
I suggest you start with the wiki page about the graph isomorphism problem. I also have a book to suggest: "Graph Theory and Its Applications". It's a tome, but worth every page.
As for your corollary, every possible spatial arrangement of a given graph's vertices yields an isomorphic graph. So two isomorphic graphs have the same topology and are, in the end, the same graph from the topological point of view. Another matter is, for example, finding those isomorphic drawings enjoying particular properties (e.g., with non-crossing edges, if one exists), and that depends on the properties you want.
One of the best algorithms out there for finding graph isomorphisms is VF2.
I've written a high-level overview of VF2 as applied to chemistry - where it is used extensively. The post touches on the differences between VF2 and Ullmann. There is also a from-scratch implementation of VF2 written in Java that might be helpful.
A very similar problem - graph automorphism - can be solved by saucy, which is available in source code. This finds all symmetries of a graph. If you have two graphs, join them into one and any isomorphism can be discovered as an automorphism of the join.
Disclaimer: I am one of the co-authors of saucy.
There are algorithms to do this; however, I have not had cause to seriously investigate them as of yet. I believe Donald Knuth is either writing or has written on this subject in his The Art of Computer Programming series during his second pass at (re)writing it.
As for a simple way to do something that might work in practice on small graphs, I would recommend counting degrees, and then, for each vertex, also noting the set of degrees of its adjacent vertices. This gives you a set of potential vertex mappings for each vertex. Then just try all of those (via brute force, but choosing the vertices in increasing order of the size of their candidate sets) from this restricted set. Intuitively, most graph isomorphisms can be practically computed this way, though clearly there are degenerate cases that might take a long time.
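A sketch of that degree-signature pruning (the helper names are my own):

```python
import networkx as nx

def signature(g, v):
    # A vertex's degree plus the sorted degrees of its neighbours.
    return (g.degree(v), tuple(sorted(g.degree(u) for u in g.neighbors(v))))

def candidates(g, h):
    # A vertex of g can only map to vertices of h sharing its signature.
    by_sig = {}
    for w in h:
        by_sig.setdefault(signature(h, w), set()).add(w)
    return {v: by_sig.get(signature(g, v), set()) for v in g}

g = nx.path_graph(4)
h = nx.relabel_nodes(g, {0: "a", 1: "b", 2: "c", 3: "d"})
print(candidates(g, h))  # endpoints can only map to endpoints, etc.
```

The brute-force search then only tries assignments drawn from these restricted sets, starting with the vertices whose candidate sets are smallest.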
I recently came across the following paper : http://arxiv.org/abs/0711.2010
This paper proposes "A Polynomial Time Algorithm for Graph Isomorphism"
My project - Griso - at sf.net: http://sourceforge.net/projects/griso/ with this description:
Griso is a graph isomorphism testing utility written in C++. It is based on my own POLYNOMIAL-TIME algorithm (this is the salt of the project). See Griso's sample input/output on the http://funkybee.narod.ru/graphs.htm page.
nauty and Traces
nauty and Traces are programs for computing automorphism groups of graphs and digraphs. They can also produce a canonical label. They are written in a portable subset of C, and run on a considerable number of different systems.
The AutGroupGraph command in the GRAPE package for GAP.
bliss: another symmetry and canonical labeling program.
conauto: a graph isomorphism package.
As for heuristics: I've been fantasising about a modified Ullmann's algorithm, where you don't only use breadth-first search but mix it with depth-first search: first you use breadth-first search intensively, then you set a limit for the breadth analysis and go deeper after checking a few neighbours, lowering the breadth at each step by some amount. This is practically how I find my way on a map: first locate myself with breadth-first search, then search the route with depth-first search; largely, this is the best strategy my brain has ever invented. :) In the long term, some intelligence could be added to increase the breadth-first neighbour count at critical vertices, for example where there are a large number of neighbouring vertices with the same edge count. Like checking your actual route now and then with the car (without a GPS).
I've found out that the algorithm belongs to the category of k-dimensional Weisfeiler-Lehman algorithms, and it fails on regular graphs. For more, see here:
http://dabacon.org/pontiff/?p=4148
Original post follows:
I've worked on the problem of finding isomorphic graphs in a database of graphs (containing chemical compositions).
In brief, the algorithm creates a hash of a graph using the power iteration method. There might be false-positive hash collisions, but the probability of that is exceedingly small (I didn't have any such collisions with tens of thousands of graphs).
The way the algorithm works is this:
Do N (where N is the radius of the graph) iterations. On each iteration and for each node:
Sort the hashes (from the previous step) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by the direct neighbors of it. On the second step, a node's hash is affected by the neighborhood 2-hops away from it. On the Nth step a node's hash will be affected by the neighborhood N-hops around it. So you only need to continue running the Powerhash for N = graph_radius steps. In the end, the graph center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find if two graphs are isomorphic. If you have labels, then add them (on the first step) in the internal hashes that you calculate for each node.
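Here is a compact sketch of that iteration (essentially a 1-dimensional Weisfeiler-Lehman refinement, so, per the edit above, it can fail on regular graphs; the function name is my own):

```python
import hashlib
import networkx as nx

def neighborhood_hash(g, iterations):
    h = {v: b"" for v in g}
    for _ in range(iterations):
        nxt = {}
        for v in g:
            neigh = b"".join(sorted(h[u] for u in g.neighbors(v)))
            nxt[v] = hashlib.sha256(h[v] + neigh).digest()
        h = nxt
    # Sort before the final hash so the result ignores node labels.
    return hashlib.sha256(b"".join(sorted(h.values()))).hexdigest()

g1 = nx.path_graph(5)
g2 = nx.relabel_nodes(g1, {i: i + 10 for i in g1})
n = nx.diameter(g1)  # the post uses the radius; the diameter is a safe bound
print(neighborhood_hash(g1, n) == neighborhood_hash(g2, n))  # True: isomorphic
```

Recent networkx versions also ship nx.weisfeiler_lehman_graph_hash, which works along the same lines.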
There is more background here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
You can find the source code of it here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
