How to identify loosely-connected components of a graph - algorithm

Imagine that a graph has two relatively densely connected components that are only connected to each other by relatively few edges. How can I identify the components? I don't know the proper terminology, but the intuition is that a relatively densely connected subgraph is hanging onto another subgraph by a few threads. I want to identify these clumps that are only loosely connected to the rest of the graph.

If your graph represents a real-world system, this task is called community detection. You'll find many articles about that, starting with Fortunato's review (2010). He describes, amongst others, the min-cut based methods mentioned in the earlier answers.
There're also many posts on SO, such as :
Are there implementations of algorithms for community detection in graphs?
What are the differences between community detection algorithms in igraph?
Community detection in Networkx
People in Cross Validated also talk about community detection :
How to do community detection in a weighted social network/graph?
How to find communities?
Finally, there's a proposal in Area 51 for a new Network Science site, which would be more directly related to this problem.

You probably want sparsest cut instead of min cut -- unless you can identify several nodes in a component, min cuts have a tendency to be very unbalanced when the degrees are small. One of the common approaches to sparsest cut is to compute an eigenvector of the graph's Laplacian and threshold it.

The answer might be somewhat general, but you could could try to model your problem as a flow problem and generate a minimum cut; see here. The edges could be bidirectional with capacity 1, and a resulting cut would maybe yield the desired partition?

Related

What are the applications of Widest path problem?

I want to know some more applications of the widest path problem. CLICK!
It seems like something that can be used in a multitude of places, but I couldn't get anything constructive from searching on the internet.
Can someone please share as to where else this might be used?
thanks in advance.
(what I searched for included uses in p2p networks and CDN, but I couldn't find exactly how it is used / the papers were too long for me to scout.)
The widest path problem has a variety of applications in areas such as network routing problems, digital compositing and voting theory. Some specific applications include:
Finding the route with maximum transmission speed between two nodes.
This comes almost directly from the widest-path problem definition. We want to find the path between two nodes which maximizes the minimum-weight edge in the path.
Computing the strongest path strengths in Schulze’s method.
Schulze's method is a system in voting theory for finding a single winner among multiple candidates. Each voter provides an ordered preference list. We then construct a weighted graph where vertices represents candidates and the weight of an edge (u, v) represents the number of voters who prefer candidate u over candidate v. Next, we want to find the strength of the strongest path between each pair of candidates. This is the part of Schulze's method that can be solved using the widest-path problem. We simply run an algorithm to solve the widest-path problem for each pair of vertices.
Mosaicking of digital photographic maps. This is a technique for merging two maps into a single bigger map. The challenge is the the two original photos might have different light intensity, colors, etc. One way to do mosaicking is to produce seams where each pixel in the resulting picture is represented entirely by one photo or the other. We want the seam to appear invisible in the final product. The problem of finding the optimal seam can be modeled as a widest-path problem. Details for the modeling are found in the original paper
Metabolic path analysis for living organisms. The objective of this type of analysis is identify critical reactions in living organisms. A network is constructed based on the stoichiometry of the reactions. We wish to find the path which is energetically favored in the production of a particular metabolite, ie the path where the bottleneck between two vertices is the smallest. This corresponds to the widest-path problem.

one-pass force-directed graph drawing algorithm

I'm looking for a one-pass algorithm (or ideas of how to write it myself) that can calculate the two or three dimensional coordinates for a directed, unweighted graph.
The only metadata the vertices have are title and category.
I need to implement this algorithm in a way that vertices can be added/removed without recalculating the entire graph structure.
This algorithm has to be applied to a large (5gb) dataset which is constantly changing.
My Google skills have led me to n-pass algorithms which are not what what I am looking for.
I guess your question might still be an open issue. I know a research project called Tulip (http://tulip.labri.fr/TulipDrupal/) which is a (large-scale) graph viewer. A paper on the method is available at http://dept-info.labri.fr/~auber/documents/publi/auberChapterTulipGDSBook.pdf and surely you can find more algorithms browsing the personal web page of D. Auber and his colleagues.
There is a related question here:
https://cstheory.stackexchange.com/questions/11889/an-algorithm-to-efficiently-draw-a-extremely-large-graph-in-real-time
The top answer has a number of papers that might be of interest. I think one of the keys to the problem is to try and recompute the location of a reduced amount of nodes in your graph.

Edge crossing reduction in graph

Are there any algorithms to minimize edge crossings in a graph? For example if I have a transition matrix of the graph.
I found methods like trying to place the nodes around the other node, but I'd like to know some other ideas.
There's a range of well established algorithms/libraries that have been developed for graph drawing applications, you can get a bit of background here.
To draw undirected graphs a popular choice is the force-based layout algorithm, in which graph edges are treated as springs (attractive forces) while the vertices are treated like charged particles (applying repulsive forces). The algorithm works by updating the vertex positions based on these forces until a steady-state is reached. You can read more about force based methods here. Since these algorithms search for an equilibrium solution they often result in pseudo-optimal layouts, without much edge tangling.
You might be interested in making use of one of the many graph drawing libraries that are available. The Graphviz package is generally pretty good and supports a number of different algorithms for different graph drawing applications.

What are strongly connected components used for?

I have found several algorithms that explain how to find strongly connected components in a directed graph, but none explain why you would want to do this. What are some applications of strongly connected components?
You should check out Tim Roughgarden's Introduction to Algorithms course on Coursera. For every algorithm he goes over, he explains some applications of it. Very useful, and makes one see the value of studying algorithms!
The use of strongly connected components that I remember him saying is that one could use it to find groups of people who are more closely related in a huge set of data. Think of facebook and how they recommend people that might be your friends...
This could also be used to see chunks of a population. Say, "Wow, this huge component all has the hobby of walking backwards and likes eating moldy pizza!," it could show correlation. Advertisers for moldy pizza would use this data to target people who like walking backwards. Who knows!
One example is in model checking:
Finding strongly connected component is done in explicit model checking in formal verification.
In model checking - we have a state machine, which represents the models of our software/hardware, and we try to prove temporal logic1 formulas on it.
For example: The formula EG(p) means: there is a path in the graph, where for each state - the logic formula p yields true.
The algorithm for proving if EG(p) is true on a graph (model) is finding the maximal strongly connected components (SCC), and then checking paths leading to it in the graph.
Note that model checking is applied widely in the industry - especially for proving correctness of hardware components.
(1) The importance of temporal logic to computer science is great, and its inventor Amir Pnueli recieved a turing award for it!
Another application can be found in vehicle routing applications. A road network can be modeled as a directed graph, with vertices being intersections, and arcs being directed road segments or individual lanes. If the graph isn't Strongly Connected, then vehicles can get trapped in a certain part of the graph (i.e they can get in, but not get out).
In many of these vehicle routing applications, you want to generate routes for a specific area (for instance a routing problem within a city). Prior to generate routes, you would have to extract street data, for instance from Google Maps, Here maps, or Open Street maps. These maps do not just cover the area you are interested in, but they cover the entire world. Consequently you end up taking a snapshot of the area of interest, for instance by computing the induced subgraph of all intersections who's geographical coordinates lay within the area of interest. The resulting subgraph is not necessarily strongly connected (for instance, you can have a road that enters and leaves the area, but is not connected to any other road inside the area). One would then preprocess the subgraph by enumerating all strongly connected components, and throwing away all but the largest component.

Fastest Algorithm For Graph Planarization

I'm using Processing to develop a navigation system for complex data and processes. As part of that I have gotten into graph layout pretty deeply. It's all fun and my opinions on layout algorithms are : force-directed is for sissies (just look at it scale...haha), eigenvector projection is cool, Sugiyama layers look good but fail fast on graphy graphs, and although I have relied on eigenvectors thus far, I need to minimize edge crossings to really get to the point of the data. I know, I know NP-complete etc.
I should add that I have some good success from applying xy boxing and using Sugiyama-like permutation to reduce edge crossings across rows and columns. Viz: graph (|V|=90,avg degree log|V|) can go from 11000 crossings -> 1500 (by boxed eigenvectors) -> 300 by alternating row and column permutations to reduce crossings.
But the local minima... whatever it is sticks around this mark, and the result is not as clear as it could be. My research into the lit suggests to me that I really want to use the planarization algorithm like what they do use for VLSI:
Use BFS or something to make the maximal planar subgraph
1.a. Layout the planar subgraph nice-like
Cleverly add outstanding edges to recover the original graph
Please reply with your thoughts on the fastest planarization algorithm, you are welcome to go into some depth about any specific optimizations you have had familiarity with.
Thanks so much!
Given that all graphs are not planar (which you know), it might be better to go for a "next-best" approach rather than a "always provides best answer" approach.
I say this because my roommate in grad school had the same problem you did. He was trying to convert a graph to planar form and all the algorithms that guarantee minimum edge crossings were too slow. What he ended up doing what just implementing a randomized algorithm. Basically, lay out the graph, then jigger nodes that have edges with lots of crossings, and eventually you'll handle the worst clumps of edges.
The advantages were: you can cut it out after a specific amount of time, so if you're time limited this always comes in under a certain amount of time, if you get a crappy graph you can just run it again (over the already laid out graph) to get something better, and it's relatively easy to code up.
Disadvantage is you don't always get the global minima in crossings, and if the graph gets stuck in high crossing areas, his algorithm would sometimes shoot a node way off into the distance to try and resolve it, which sometimes causes really weird looking graphs.
You know already lots of graph layout but I believe your question is still, unfortunately, underspecified.
You can planarize any graph really quickly by doing the following: (1) randomly layout the graph on a plane (2) calculate where edges cross (3) create pseudo-vertices at the crossings (this is what you anyway do when you use planarity based layout for non-planar graphs) (4) the graph extended with the new vertices and split edges is automatically planar.
The first difficulty comes from having an algorithm that calculates a combinatorial embedding that minimizes the number of edge crossings. The second difficulty is using that combinatorial embedding to do layout in Euclidean plane that is visually appealing, e.g. for orthogonal graph layout you might want to minimize the number of bends, maximize the size of faces, minimize the area of the graph as a whole, etc., and these goals might conflict each other.

Resources