How to cluster similar item together in a 2d Graph - algorithm

well I am not looking for how to draw items on 2d Graph, Its just a pictorial representation of what the expected output need to be
I have a list like
a=[]
b=['c','d','e']
c=['a','b','d']
d=['a']
e=['b','a']
l=['g','r','p']
g=['r']
r=['g']
p=['l']
now from above it is clear that b is pointing to c , d ,e
a,b,c,d are closely linked , while the l,g,r,p are linked
can any one tell me an algo ( keeping 2d Picture in mind) how these similar items can be reprsented together.
Above is just an example.
The list will be dynamically created

Have you come across Graphviz? It has algorithms for various different forms of graph layout which I imagine would do a nice job of laying out your small example above. It also includes some simple GUIs to allow you to experiment with the different layouts it supports.
Edit: in response to some clarifications:
If you need to find dense subgraphs within your graph, even if it is fully connected, then you are looking for algorithms that find communities in networks. An example of a recently-developed algorithm doing such on large graphs (2 million+ nodes, representing a social network) efficiently can be found in this paper.

Just to extend Alex's answer, here is an example of graphviz use for your graph:
graph.dot:
digraph G
{
b -> c;
b -> d;
b -> e;
c -> a;
c -> b;
c -> d;
d -> a;
e -> b;
e -> a;
l -> g;
l -> r;
l -> p;
g -> r;
r -> g;
p -> l;
}
Output of Graphviz:
If you just want to know what are the clusters in your graph without drawing it, just use this algorithm.

Related

What is a data structure suited for representing railways with turnouts?

I'm trying to represent the paths in a railroad as a data structure but I am having a hard time representing turnouts.
This feels like a graph problem, but there is a difference compared to regular graphs.
A railway turnout is a vertex connected to three other vertices. A, B and C.
But, in a railway system the graph is traversed with a direction.
So, you are able to take the path B -> turnout -> A and C -> turnout -> A, but are not able to take the path B -> turnout -> C.
Is there a (graph) data structure which allows for representing paths with directions?
This data structure would provide the base for a software system to automate a small model railroad.
You can represent the turnout as 2 vertices - one for each state of the turnout. So if you have source A and destinations B and C and turnout which can switch between B and C - you will have 2 vertices for this turnout: TB and TC. Also you will have following edges: A->TB, TB->B, A->TC, TC->C
This allows you to travel from A -> TB -> B and from A -> TC -> C. And since you will have no edge between TB and TC - you will not be able to travel from B -> C directly
Each path can be considered as a vertice and a connection between two paths as an edge.
B ->
A
C ->
This can be represented as a graph in a Go map,
Take a look at the following,
In your example directional connection exists from B -> A and C -> A. This can be represented in a map as follows.
graph := map[string][]string{
"B": []string{"A"},
"C": []string{"A"},
}
Each key in the map represents the starting path of a directional connection. Each value in the array of the corresponding key is the destination path.

Change edge placement from beneath to above nodes in Graphviz

It took me some time to make the graph below look like it does right now, and I'm almost satisfied. The one thing that still bothers me is that the connection between D and B should be above all nodes for the sake of aesthetics.
The funny thing is, that supplying the ports for the edge doesn't impress dot which just makes the edge cross the connected nodes.
Do you have an idea on how to avoid this?
digraph {
graph [splines=ortho, nodesep=0.2, fontname="DejaVu Sans", rankdir=LR]
node [shape=box, fontsize=8]
edge [arrowsize=0.5]
subgraph cluster {
style=invis;
A -> B -> C;
A -> B -> C;
A -> B -> C -> D;
D -> E;
D:nw -> B:ne;
}
{
D -> F -> { C; E };
}
}
PS: You need the latest Graphviz version in order to get orthogonal edges.
It may be a function of the version of the engine you use. I'm not sure what version of dot the GraphViz Workspace http://graphviz-dev.appspot.com/ uses but it does run your problem connector across the top.

How can I give a graph nodes fixed position in graphviz and how can i make the edges not overlapped?

I have seen some similar questions here, but the answers dont solve my problem.
I want to draw a graph. I write some code like this:
digraph {
{rank = same a b c d e f }
a -> b -> c -> d -> e -> f
a -> f
b -> d -> f
b -> f
}
but the result is that some of the edges overlapped each other.
So my question is how can I fix the edge to make it not overlap
and I also wanna know how can I give the node a fixed position? There is no problem this graph. But some times when I wanna a graph with a sequence of
a b c d e f
but when i create some edges and the sequence will change like:
a->e b c d f
You can use the attribute pos of a node or edge to specify coordinates. To see where dot places your nodes and edges you can simply run dot myinputfile.dot without any output parameter. This will produce the dot file with added coordinates (among other additions).
Based on this you can force dot to place some or all nodes at certain coordinates.

How can I reverse the direction of every edge in a Graphviz (dot language) graph?

I have a directed graph specified in Graphviz's dot language, e.g.
digraph G { A -> B [label="foo"]; A -> B [label="bar"]; B -> A; C; }
I want to automatically process this into a graph with its edges reversed, i.e.
digraph G { B -> A [label="foo"]; B -> A [label="bar"]; A -> B; C; }
I would like to use a robust solution (i.e. one that understands the graph and therefore probably doesn't use sed) that preserves any existing edge labels and other attributes. Note that I am not merely talking about getting dot to render my graph with the arrows pointing backward; I really need a graph whose edges are reversed. (In this case, I intend to reverse the edges, apply prune, then reverse the edges again.)
How can I reverse the direction of every edge in a Graphviz (dot-language) graph?
Easiest way is to include a graph-level dir statement where you reverse the direction of the arrows. By default, the direction is forward. If you reverse it at the top of your graph, then without changing a single other line, the graph will show up the way you want.
What you have now is this:
digraph G
{
edge [dir="forward"]; /* implied */
A -> B [label="foo"];
A -> B [label="bar"];
B -> A;
C;
}
What you want is this:
digraph G
{
edge [dir="back"]; /* note the change to this line */
A -> B [label="foo"];
A -> B [label="bar"];
B -> A;
C;
}
The best I've come up with so far is
BEG_G {
graph_t g = graph($.name + " reversed", "D");
int edge_id = 0;
}
N {
clone(g, $);
}
E {
node_t newHead = clone(g, $.head);
node_t newTail = clone(g, $.tail);
edge_t newEdge = edge_sg(g, newHead, newTail, edge_id);
copyA($, newEdge);
edge_id++;
}
END_G {
$O = g;
}
which I then invoke with gvpr.
This does add a "key" attribute to all resultant edges, but I'm not sure how to avoid that and still preserve multiple edges between the same pair of nodes.
When I do echo 'digraph G { A -> B [label="foo"]; A -> B [label="bar"]; B -> A; C; }' | gvpr -f reverseAllEdges.gvpr, I get:
digraph "G reversed" {
A -> B [key=2];
B -> A [key=0, label=foo];
B -> A [key=1, label=bar];
C;
}
I don't know how robust this will prove to be, but it looks promising.
The Python library NetworkX has a directed multigraph type, MultiDiGraph, which has a reverse() method. It also uses pydot for loading and writing DOT files.

Why does Graphviz no longer minimise edge lengths when subgraphs are introduced

I have this Graphviz graph:
digraph
{
rankdir="LR";
overlap = true;
Node[shape=record, height="0.4", width="0.4"];
Edge[dir=none];
A B C D E F G H I
A -> B -> C
D -> E -> F
G -> H -> I
Edge[constraint=false]
A -> D -> G
subgraph clusterX
{
A
B
}
subgraph clusterY
{
E
H
F
I
}
}
which produces this output:
I would have expected the length of the edge between A and D to be minimised so that the nodes would be arranged as:
A B C
D E F
G H I
rather than
D E F
G H I
A B C
This works as expected if I remove the subgraph definitions.
Why does Graphviz place A B C at the bottom when the subgraphs are introduced?
This is not really about minimizing edge lengths, especially since in the example the edges are defined with the attribute constraint=false.
While this is not a complete answer, I think it can be found somewhere within the following two points:
The order of appearance of nodes in the graph is important.
Changing rankdir to LR contains unpredictable (or at least difficult to predict) behaviour, and/or probably still a bug or two (search rankdir).
I'll try to explain as good as I can and understand graphviz, but you may want to go ahead and read right away this reply of Emden R. Gansner on the graphviz mailing list as well as the following answer of Stephen North - they ought to know, so I will cite some of it...
Why is the order of appearance of nodes important? By default, in a top-down graph, first mentioned nodes will appear on the left of the following nodes unless edges and constraints result in a better layout.
Therefore, without clusters and rankdir=LR, the graphs appears like this (no surprises):
A D G
B E H
C F I
So far, so good. But what happens when rankdir=LR is applied?
ERG wrote:
Dot handles rankdir=LR by a normal TB layout and then rotating the
layout counterclockwise by 90 degrees (and then, of course, handling
node rotation, edge direction, etc.). Thus, subgraph one is
positioned to the left of subgraph two in the TB layout as you would
expect, and then ends up lower than it after rotation. If you want
subgraph one to be on top, list it second in the graph.
So if that would be correct, without clusters, the nodes should appear like this:
G H I
D E F
A B C
In reality, they do appear like this:
A B C
D E F
G H I
Why? Stephen North replied:
At some point we decided that top-to-bottom should be the default,
even if the graph is rotated, so there's code that flips the flat
edges internally.
So, the graph is layed out TB, rotated counterclock wise and flat edges flipped:
A D G G H I A B C
B E H --> D E F --> D E F
C F I A B C G H I
While this works quite well for simple graphs, it seems that when clusters are involved, things are a little different. Usually edges are also flipped within clusters (as in clusterY), but there are cases where the flat edge flipping does not work as one would think. Your example is one of those cases.
Why is the error or limitation in the flipping of those edges? Because the same graphs usually display correctly when using rankdir=TB.
Fortunately, workarounds are often easy - for example, you may use the order of appearance of the nodes to influence the layout:
digraph
{
rankdir="LR";
node[shape=record, height="0.4", width="0.4"];
edge[dir=none];
E; // E is first node to appear
A -> B -> C;
D -> E -> F;
G -> H -> I;
edge[constraint=false]
A -> D -> G;
subgraph clusterX { A; B; }
subgraph clusterY { E; F; H; I; }
}

Resources