Is it possible to visualize the Strongly Connected Components of directed graph in Gephi? - graph-visualization

I want to visualize the strongly connected components (SCCs) of a directed graph in Gephi. I can get the number of SCCs in the graph but can't find a way to visualize them. I have used "Force Atlas 2" to lay out the graph of nearly 6000 nodes (~20000 edges), but all I get from the visualized graph is the "out-degree" edges of the nodes. Can someone help me visualize the strongly connected components, either in Gephi or by some other means?
Thanks a lot!

I'm not sure I understand what you want, but maybe this can help you.
In the statistics panel, execute DBSCAN.
Then go back to the Appearance panel and select node size --> ranking --> number of clusters.
This "number of clusters" option will appear only after running "DBSCAN".
In general, many of the metrics for ranking or partitioning a graph appear only after you run the related statistic. This is confusing at first, but in my opinion it is a good approach: you only ever see the options whose meaning you already understand.
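For the "some other means" route, one option is to compute the component of every node outside Gephi and bring the result back as a node attribute. Below is a minimal sketch using Kosaraju's algorithm in plain Python. The column names `Id` and `scc`, and the idea of importing the CSV as a node table and colouring nodes by that column, are assumptions about the workflow rather than something stated in the answer above.

```python
import csv
import io
from collections import defaultdict


def strongly_connected_components(edges):
    """Return {node: component_id} for a directed edge list (Kosaraju's algorithm)."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, in decreasing finish order.
    component, next_id = {}, 0
    for start in reversed(order):
        if start in component:
            continue
        component[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if nxt not in component:
                    component[nxt] = next_id
                    stack.append(nxt)
        next_id += 1
    return component


def node_table_csv(component):
    """Render the labels as CSV text (assumed column names: Id, scc)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "scc"])
    for node in sorted(component):
        writer.writerow([node, component[node]])
    return buffer.getvalue()


# Toy graph: {1, 2, 3} and {4, 5} are cycles, 6 is on its own.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (5, 6)]
labels = strongly_connected_components(edges)
print(node_table_csv(labels))
```

Both passes are iterative, so a graph of a few thousand nodes should not hit Python's recursion limit.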

Related

Removing edges in a graph to create disconnected components

Say I have a graph that is bipartite and connected, i.e., it doesn't contain disconnected components to start with. I need to remove a few edges (no specific number) to make the graph disconnected. Google searches for an algorithm for this only show results for min k-cut algorithms. In my case, "k" is not known in advance, so "k" is not an input to the algorithm. In other words, I don't have a specific number of edges that need to be removed. Any number of edges will do, as long as it's the minimum. What is an efficient algorithm for this problem? Again, Google is not helpful, as there seem to be tons of such algorithms. Can someone make some recommendations?
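Removing the fewest edges that disconnect a connected undirected graph is usually called the global minimum cut problem. One known algorithm for it is Stoer-Wagner. The sketch below is a simple O(n^3) adjacency-matrix version, which assumes vertices are numbered 0..n-1 and gives every edge weight 1 unless a weight is supplied. The function names are made up for illustration, and this plain version may be too slow for very large graphs.

```python
def global_min_cut(n, edges):
    """Stoer-Wagner. Return (cut_weight, vertices_on_one_side) for an undirected
    graph with vertices 0..n-1 and edges given as (u, v) or (u, v, weight)."""
    w = [[0] * n for _ in range(n)]
    for u, v, *rest in edges:
        weight = rest[0] if rest else 1
        w[u][v] += weight
        w[v][u] += weight

    groups = [[v] for v in range(n)]  # original vertices merged into each super-vertex
    active = list(range(n))
    best_weight, best_side = float("inf"), []

    while len(active) > 1:
        # Minimum cut phase: repeatedly add the most tightly connected vertex.
        attach = {v: 0 for v in active}
        remaining = set(active)
        order = []
        while remaining:
            nxt = max(remaining, key=lambda v: attach[v])
            remaining.remove(nxt)
            order.append(nxt)
            for v in remaining:
                attach[v] += w[nxt][v]
        s, t = order[-2], order[-1]

        # The "cut of the phase" separates t (and everything merged into it).
        if attach[t] < best_weight:
            best_weight, best_side = attach[t], list(groups[t])

        # Merge t into s.
        groups[s].extend(groups[t])
        for v in active:
            w[s][v] += w[t][v]
            w[v][s] = w[s][v]
        w[s][s] = 0
        active.remove(t)

    return best_weight, sorted(best_side)


def cut_edges(edges, side):
    """Edges with exactly one endpoint in `side`, i.e. the edges to remove."""
    side = set(side)
    return [e for e in edges if (e[0] in side) != (e[1] in side)]


# Bipartite toy graph: two 4-cycles joined by the single edge (3, 4).
edges = [(0, 1), (1, 2), (2, 3), (3, 0),
         (4, 5), (5, 6), (6, 7), (7, 4),
         (3, 4)]
weight, side = global_min_cut(8, edges)
print(weight, cut_edges(edges, side))
```

Note that "k" never appears as an input: the algorithm returns the size of the smallest cut together with one side of it.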

Efficient Algorithm for subgraph enumeration

I have searched related issues about subgraph enumeration. However, they didn't meet my requirement(*). (If I misunderstood something, please tell me.)
Is there an efficient algorithm or tool for enumerating all "connected and unlabelled" subgraphs of an undirected parent graph?
In my case, the parent graph is an Internet topology, so the number of nodes could be large. I would like to enumerate all of the connected unlabelled patterns (i.e. subgraphs) of the parent graph.
(*) I have searched "Efficiently find all connected subgraphs" and "Subgraph enumeration", but they target vertex-labelled induced subgraphs and complete subgraphs respectively. All I want is the connected unlabelled subgraphs.
A topic name that might be helpful is "frequent subgraph mining", which seems to be one name for this problem. There are various tools and algorithms in this area, although they may not do exactly what you want, of course.
As others point out in the answers to the two questions you link to, the number of subgraphs of a large graph can be very large. Assuming you actually want to list them, not just count them, it might take a long time.
Edit: OP has pointed out that the input here is ONE large graph, not a set of smaller ones, so standard graph mining will not work directly.
I still think the general approach can work here. The input set of graphs for mining is some subset of the subgraphs of your data graph. But that subgraph-set is what you want in the first place!
So let's say you pick a size of subgraph that you want (say, 6 vertices). You then randomly pick starting vertices in your parent graph (the Internet topology) and 'grow' these seeds, weeding out at each growth step those that don't match. Then repeat for different sizes of subgraph.
Of course, this is a probabilistic algorithm, but it could give you some idea.
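The seed-and-grow idea can be sketched roughly as follows. This is one possible reading of it, not a definitive implementation. It samples connected vertex sets of a fixed size, takes the induced subgraph (a simplifying assumption), and reduces it to a label-free signature by brute force over all relabellings, which is only practical for small sizes such as 6. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def grow_connected_subgraph(adj, size, rng):
    """Grow one random connected vertex set of `size` vertices from a random seed.
    Returns None if the seed's component is too small."""
    seed = rng.choice(sorted(adj))
    chosen = {seed}
    frontier = set(adj[seed]) - chosen
    while len(chosen) < size:
        if not frontier:
            return None
        nxt = rng.choice(sorted(frontier))
        chosen.add(nxt)
        frontier |= set(adj[nxt])
        frontier -= chosen
    return chosen


def canonical_pattern(adj, vertices):
    """Label-free signature of the induced subgraph: the smallest sorted edge
    list over all relabellings of the vertices to 0..k-1."""
    vertices = sorted(vertices)
    edges = [(u, v) for u in vertices for v in adj[u] if v in vertices and u < v]
    best = None
    for perm in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, perm))
        key = tuple(sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in edges))
        if best is None or key < best:
            best = key
    return best


def sample_patterns(adj, size, trials, seed=0):
    """Count how often each unlabelled pattern shows up among random samples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        vertices = grow_connected_subgraph(adj, size, rng)
        if vertices is not None:
            counts[canonical_pattern(adj, vertices)] += 1
    return counts


# Toy parent graph: the path 0-1-2-3-4. Every connected 3-vertex sample is a path.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sample_patterns(path, 3, 50, seed=1))
```

Because the samples are random, rare patterns can be missed, which is the probabilistic caveat mentioned above.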

Finding a minimum/maximum weight Steiner tree

I asked this question on reddit, but haven't converged on a solution yet. Since many of my searches bring me to Stack Overflow, I decided I would give this a try. Here is a simple formulation of my problem:
Given a weighted undirected graph G(V,E,w) and a subset of vertices S in G, find the min/max weight tree that spans S. Adding vertices is not allowed. An extension of the basic model is adding edges with 0 weight, and vertices that must be excluded. This seems similar to the question asked here:
Algorithm to find minimum spanning tree of chosen vertices
There is also more insight into what values the edges can take. Each edge is actually a correlation probability, which I can encode in several ways, so the main questions I want to ask the graph are:
Given k vertices that must be connected, what are the top X min/max spanning trees that connect them, and what vertices do they pass through? As I understand it, this is the same question as asking the graph what is the highest probability of connecting all of the k vertices.
Getting more vague, is there a logical way to cluster the nodes?
As for implementation, I have the boost libraries installed, and once I get the framework rolling on this problem, I can deal with how to multi-thread it (if appropriate), what kind of graph to use, and how to store/cache the data, since the number of vertices and edges is going to be quite large.
Update
Looking at the problem I am trying to solve, it makes sense that it would be NP-complete. The real world problem that I am trying to solve involves medical diagnoses; specifically when the medical community is working on a problem with a specific idea in mind, and they need to take a step back and reconsider how they got there. What I want from the program I am trying to design is:
Given several conditions, tests, symptoms, age, gender, season, confirmed diagnosis, timeline, how can you relate them? What cells/tissues/organs/systems are touched? Are they even related?
Along with the defined groups that conditions/symptoms can belong to, is there a way to logically group the conditions/symptoms?
Example
Flu-like symptoms, red eyes, early pneumonia, and some of the signs of diabetes. Is there a way to relate all of the symptoms? Are there some tests that could be done to make it easier to determine? What systems are involved?
It just seemed natural to try and map this to a graph, or several graphs, and use probabilities as the correlation between different symptoms/conditions.
I have seen models for your problem that were mostly based on Bayesian inference and fuzzy logic. Bayesian inference networks express the relation between causes and effects, e.g. smoking and lung cancer. Look here for a quick tutorial. You can apply fuzzy logic to that modelling to try to take into account the variability in real life (as not everyone who smokes gets lung cancer).
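To make the Bayesian idea concrete, here is a minimal sketch of a single cause-and-effect update using Bayes' rule. Every number in it is hypothetical and chosen only for illustration. Applying the update once per finding treats the findings as conditionally independent, which is a strong simplifying assumption that a real Bayesian network would not have to make.

```python
def posterior(prior, p_finding_given_condition, p_finding_given_no_condition):
    """Bayes' rule: P(condition | finding)."""
    evidence = (p_finding_given_condition * prior
                + p_finding_given_no_condition * (1.0 - prior))
    return p_finding_given_condition * prior / evidence


# Hypothetical numbers, not real medical data.
belief = 0.01                         # prior probability of the condition
belief = posterior(belief, 0.9, 0.1)  # first finding observed
after_first = belief
belief = posterior(belief, 0.8, 0.2)  # second finding observed
print(round(after_first, 4), round(belief, 4))
```

Each finding that is more likely with the condition than without it raises the belief, which is the kind of "how are these related" signal the question asks about.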

What is a good measure of strength of a link and influence of a node?

In the context of social networks, what is a good measure of strength of a link between two nodes? I am currently thinking that the following should give me what I want:
For two nodes A and B:
Strength(A,B) = (neighbors(A) intersection neighbors(B))/neighbors(A)
where neighbors(X) gives the total number of nodes directly connected to X and the intersection operation above gives the number of nodes that are connected to both A and B.
Of course, Strength(A,B) != Strength(B,A).
Knowing this, is there a good way to determine the influence of a node? I was initially using the degree centrality of a node to determine its "influence", but I don't think it's a good idea, because a node having a lot of outgoing links does not by itself mean anything. Those links should be strong as well. In that case, maybe using an aggregate of the strengths of each node connected to this node is a good way to estimate its influence? Am I heading in the right direction? Does anyone have any suggestions?
My Philosophy (and understanding of the terms):
Strength indicates how far A is willing to do what B has already done
Influence indicates how far A can make B do something (persuasion perhaps?)
Constraints:
Access to only a subgraph. I mean, I am trying to be realistic here because social networks are huge and having a complete view is not so practical.
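The Strength(A,B) definition from the question can be written down directly. This is a small sketch that assumes the (sub)graph is stored as a dict mapping each node to the set of its neighbours. The function name and the toy data are illustrative only.

```python
def strength(adj, a, b):
    """Strength(A, B) = |neighbors(A) & neighbors(B)| / |neighbors(A)|."""
    neighbours_a = adj[a]
    if not neighbours_a:
        return 0.0  # A has no neighbours, so there is nothing to share
    return len(neighbours_a & adj[b]) / len(neighbours_a)


# Toy undirected graph: A-B, A-C, A-D, B-C.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(strength(adj, "A", "B"))  # 1 shared neighbour out of A's 3
print(strength(adj, "B", "A"))  # 1 shared neighbour out of B's 2
```

The two calls give different values, which illustrates the asymmetry noted in the question.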
You might want to check out some more sophisticated notions of distance.
A really cool one is "resistance distance", which lets you view distance as how likely a random path from one node is to lead you to another.
There are several days of lecture notes, plus references to further reading, at http://www.cs.yale.edu/homes/spielman/462/.
A few thoughts on this:
When you talk about the influence of a node in a graph, one centrality measure that comes to mind is closeness centrality. Closeness centrality looks at the lengths of the shortest paths from the node to every other node in the graph. From an influence point of view, the node with the shortest paths to everyone else is the node that can share information the easiest, i.e. it's nearer to more nodes than any other. (Counting how many shortest paths a node lies on is betweenness centrality, which is a different measure.)
You also mention using the strengths of each node connected to a node. Maybe you should look at eigenvector centrality, which ranks a node highly if it's connected to other highly ranked nodes. This is essentially an undirected version of PageRank.
Some questions that might affect your choice here are:
Is your graph directed?
Do your edges have weights? You mention strength... do you mean weights of some kind?
If you do have weights maybe the next step from a simple degree centrality would be to try a weighted degree centrality approach. Thus, just having a high number of connections doesn't automatically make you the most influential.
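Two of the measures mentioned above are short enough to sketch in plain Python: closeness centrality (via breadth-first search on an unweighted graph) and weighted degree. The graph representation, a dict of dicts holding edge weights, and the function names are assumptions made for this example.

```python
from collections import deque


def closeness_centrality(adj, node):
    """(number of reachable nodes) / (sum of shortest-path distances to them)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def weighted_degree(weights, node):
    """Sum of the weights of the edges incident to `node`."""
    return sum(weights[node].values())


# Toy undirected graph: hub-x, hub-y, hub-z with weight 1, and x-y with weight 5.
weights = {
    "hub": {"x": 1, "y": 1, "z": 1},
    "x": {"hub": 1, "y": 5},
    "y": {"hub": 1, "x": 5},
    "z": {"hub": 1},
}
adj = {node: set(nbrs) for node, nbrs in weights.items()}

print(closeness_centrality(adj, "hub"), closeness_centrality(adj, "z"))
print(len(adj["hub"]), weighted_degree(weights, "hub"))
print(len(adj["x"]), weighted_degree(weights, "x"))
```

In this toy graph "hub" has the most connections, but "x" has the larger weighted degree, so a high number of connections alone does not make a node rank first.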

Algorithm to compute the optimal layout of n-ary tree?

I am looking for an algorithm that will automatically arrange all the nodes in an n-ary tree so that no nodes overlap and not too much space is wasted. The user will be able to add nodes at runtime, and the tree must rearrange itself automatically. Also note that the trees could get fairly large (a few thousand nodes).
The algorithm has to work in real time, meaning the user cannot notice any pausing.
I have tried Google but haven't found any substantial resources. Any help is appreciated!
I took a look at this problem a while back and ultimately decided to change my goal from a directed acyclic graph (DAG) to a general graph, purely because of the complexities I encountered.
That being said, have you looked at the Sugiyama algorithm for graph layout?
If you're not looking to roll your own, I came across yFiles, which did the job quite nicely (a bit on the pricey side though, so I did end up doing exactly that: rolling my own).
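For a plain tree, a much simpler scheme than Sugiyama is often enough as a starting point. The sketch below is not the Sugiyama algorithm and not what yFiles does. It gives each leaf the next free horizontal slot and centres every parent over its children, in a single linear pass. The function name, the `children` mapping, and the gap parameters are all made up for this example.

```python
def layout_tree(children, root, x_gap=1.0, y_gap=1.0):
    """Return {node: (x, y)}. Leaves take consecutive x slots from left to right,
    and each parent is centred between its first and last child."""
    positions = {}
    next_slot = 0
    # Iterative post-order traversal, so deep trees do not hit the recursion limit.
    stack = [(root, 0, False)]
    while stack:
        node, depth, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            positions[node] = (next_slot * x_gap, depth * y_gap)
            next_slot += 1
        elif expanded:
            # All children are placed by now: centre the parent over them.
            first, last = positions[kids[0]][0], positions[kids[-1]][0]
            positions[node] = ((first + last) / 2, depth * y_gap)
        else:
            stack.append((node, depth, True))
            for kid in reversed(kids):
                stack.append((kid, depth + 1, False))
    return positions


# Toy tree: r has children a and b, and a has children c, d, e.
children = {"r": ["a", "b"], "a": ["c", "d", "e"]}
layout = layout_tree(children, "r")
for name in sorted(layout):
    print(name, layout[name])
```

Nodes on the same level never share an x coordinate, because each subtree owns its own run of leaf slots. The layout is wider than a compacting algorithm such as Reingold-Tilford would produce, but it visits each node a constant number of times, so recomputing it after every insertion should be cheap for a few thousand nodes.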
