Venn Diagram Drawing Algorithms

Someone asked about overlapping subclusters in GraphViz and got the following response:
Sorry, no. General subgraphs can share nodes without implying subset
containment but not clusters. The problem is in the drawing.
If clusters can overlap arbitrarily, drawing them becomes the problem
of drawing Venn diagrams, for which there are no good algorithms.
What is a formal definition or example of the "problem of drawing Venn diagrams", and why is it hard (I assume NP-complete/NP-hard)? (Extra points: sketch a reduction to a well-known NP-complete problem.)

You have N points and a binary relation R on them, and you need to represent the relation graphically: every node is drawn as a circle in the Euclidean plane so that two circles overlap if and only if the corresponding nodes n and n' satisfy n R n' (so R has to be symmetric).
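Checking a candidate drawing against R is the easy direction; the hard part is finding such a layout in the first place. A minimal sketch, assuming R is given as a set of unordered pairs and that "overlap" means the open disks intersect (center distance strictly less than the sum of the radii); the function name `realizes` is hypothetical:

```python
import math
from itertools import combinations


def realizes(circles, related):
    """Check whether a circle layout represents a symmetric relation.

    circles: dict mapping node -> (x, y, radius)
    related: set of frozenset({a, b}) pairs with a R b
    Two circles count as overlapping when their open disks intersect.
    """
    for a, b in combinations(circles, 2):
        xa, ya, ra = circles[a]
        xb, yb, rb = circles[b]
        overlap = math.hypot(xa - xb, ya - yb) < ra + rb
        if overlap != (frozenset((a, b)) in related):
            return False  # this pair violates "overlap iff related"
    return True


layout = {"A": (0.0, 0.0, 1.0), "B": (1.5, 0.0, 1.0), "C": (5.0, 0.0, 1.0)}
print(realizes(layout, {frozenset(("A", "B"))}))  # True
print(realizes(layout, {frozenset(("A", "C"))}))  # False
```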

Instead of a Venn diagram as such, we can in many cases use GraphViz for the same purpose using the dual graph, which is the Boolean lattice of the intersections of the sets. Each node represents a unique choice of sets to include and sets to exclude. Nodes that differ only by the inclusion/exclusion of a single set are connected.
For increasing numbers of sets there are, in general, many nodes and edges: with n sets the lattice has 2^n nodes and n·2^(n-1) edges. But in many practical settings many of the intersections are empty, so those intersection nodes and any edges from them to other nodes may be omitted. By this method the number of nodes and edges may be reduced to a practical number.
When laying out the resulting graph, it may be best to select the GraphViz layout engine "neato" and ask it to avoid overlapping nodes. One way to make those settings is to write layout=neato; overlap=prism; just inside the opening curly brace of the graph.
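A minimal sketch of building that dual graph, assuming the input is a mapping from each element to the names of the sets it belongs to (a hypothetical format). Only membership patterns that actually occur become nodes, so empty intersections are omitted automatically, and two regions are joined when they differ by exactly one set:

```python
from itertools import combinations


def region_graph_dot(membership):
    """Build a DOT description of the non-empty intersection regions.

    membership: dict mapping each element to the collection of set names
    (strings) it belongs to. Each distinct membership pattern is one
    region (node); regions that differ by exactly one set share an edge.
    """
    regions = sorted({frozenset(s) for s in membership.values()},
                     key=lambda r: (len(r), sorted(r)))

    def label(region):
        return '"{' + ",".join(sorted(region)) + '}"'

    lines = ["graph regions {", "  layout=neato; overlap=prism;"]
    lines += [f"  {label(r)};" for r in regions]
    for a, b in combinations(regions, 2):
        if len(a ^ b) == 1:  # differ by inclusion/exclusion of one set
            lines.append(f"  {label(a)} -- {label(b)};")
    lines.append("}")
    return "\n".join(lines)


print(region_graph_dot({"x": {"A"}, "y": {"A", "B"}, "z": {"B"}}))
```

The example yields three region nodes, {A}, {B} and {A,B}, and two edges; {A} and {B} are not joined because they differ in two sets. The output can be piped to the neato program.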

Related

Rearranging a graph so that certain nodes are not adjacent?

EDIT: Precisely, I am trying to find two disjoint independent sets of known size in a graph shaped like a triangular grid, which may have holes and has a variable perimeter shape.
I'm not very well versed in graph theory, so I'm not sure if there exists an efficient solution for this problem. Consider the following graphs:
The colors of any two nodes can be swapped. The goal is to ensure that no two red nodes are adjacent, and no two green nodes are adjacent. The edges marked with exclamation points are invalid. Basically, I need to write two algorithms:
Determine whether the nodes in a given graph can be arranged so that no red node is adjacent to another red node and no green node is adjacent to another green node.
Actually rearrange the nodes.
I'm a little lost on how to implement this. It's not too difficult to separate the nodes of one color, but repeating the process for the second color may mess up the first color. Without a way to determine whether the graph can actually be arranged properly, this process could loop forever.
Is there some kind of algorithm that I can use/write for this? I'm mainly interested in the first image's graph (a triangular grid), but a generic algorithm would work as well.
First, let's note that the problem is a variant of graph coloring.
Now, if you are only dealing with 2 colors (red, green), coloring a graph with 2 colors is fairly easy: it amounts to finding out whether the graph is bipartite and coloring each "side" of the graph with one color. Checking whether a graph is bipartite is simple, e.g. with a breadth-first search that alternates colors between neighbors.
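A minimal sketch of that bipartiteness check, assuming the graph is given as an adjacency dict. Note that it only covers the case where every node must be red or green; it does not enforce any particular number of reds or greens:

```python
from collections import deque


def two_color(adj):
    """Return a dict node -> 0/1 (red/green) if the graph is bipartite, else None.

    adj: dict mapping every node to an iterable of its neighbours
    (undirected, so each edge appears in both lists).
    """
    color = {}
    for start in adj:                        # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # neighbours go to the other side
                    queue.append(v)
                elif color[v] == color[u]:
                    return None              # odd cycle: not bipartite
    return color


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(two_color(square))    # {0: 0, 1: 1, 3: 1, 2: 0}
print(two_color(triangle))  # None
```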
However, if you want more than two colors, the problem becomes NP-Complete, and is actually a variant of the Graph Coloring Problem.
Graph Coloring Problem:
Given a graph $G=(V,E)$ and a number $k$, determine whether there is a
function $c: V \to \{1, 2, \dots, k\}$ such that $c(v) = c(u) \Rightarrow (v,u) \notin E$.
Informally, you can color the graph in k colors, and you need to determine if there is some coloring such that you never color 2 nodes that share an edge with the same color.
Note that while your problem seems slightly easier, since you already know the number of nodes of each color, that does not really make a difference.
Assume you have a polynomial time algorithm A that solves your problem.
Now, given an instance G of 3-coloring (which is NP-complete), there are only O(n^3) possibilities for (#color1, #color2, #color3), so by examining each of them and invoking A on it, you get a polynomial-time algorithm for 3-coloring. That would imply P=NP, which most CS researchers believe is not the case.
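The structure of this reduction can be sketched in code. Here `fixed_count_coloring` is a hypothetical stand-in for the assumed algorithm A; it is implemented by brute force (exponential), so only the wrapper's structure matters. Vertices are assumed to be numbered 0..n-1. Since the three counts must sum to n, the wrapper actually makes O(n^2) calls, well within the O(n^3) bound:

```python
from itertools import product


def fixed_count_coloring(n, edges, counts):
    """Stand-in for the hypothetical algorithm A (brute force, exponential):
    is there a proper coloring using exactly counts[c] vertices of color c?"""
    for assign in product(range(len(counts)), repeat=n):
        if any(assign.count(c) != k for c, k in enumerate(counts)):
            continue  # wrong number of vertices for some color
        if all(assign[u] != assign[v] for u, v in edges):
            return True
    return False


def three_colorable(n, edges):
    """Decide 3-colorability by trying every split (a, b, c) with a + b + c = n."""
    for a in range(n + 1):
        for b in range(n + 1 - a):
            if fixed_count_coloring(n, edges, (a, b, n - a - b)):
                return True
    return False


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(three_colorable(3, triangle))  # True
print(three_colorable(4, k4))        # False
```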
tl;dr:
For 2 colors: find out if the graph is bipartite - and give one color to each side of the graph.
For 3 or more colors: there is no known efficient solution, and the general belief is that none exists.
I thought this problem would be easier for planar graphs, but unfortunately that is not the case. The best matches for this problem I was able to find are minimum sum coloring and the largest bipartite subgraph problem.
For largest bipartite subgraph, assume that the number of reds plus the number of greens exactly matches the size of the largest bipartite subgraph. Then this problem is equivalent to yours, and the paper claims that it is still NP-hard even for planar graphs.
For minimum sum coloring, assume that red has weight 1, green has weight 2, and there are infinitely many blue colors, each with a weight greater than the graph size. Then, if the required answer is exactly a minimum sum coloring, no polynomial algorithm is known to find it (although the paper refers to such an algorithm for chordal graphs).
Anyway, it seems that the closer your red + green count is to the 'optimal' subgraph (in some sense), the more difficult the problem is.
If you can afford an inexact or relaxed solution, for example one where you only separate the reds, you have an option. As I said in a comment, use an approximation algorithm for the maximum independent set problem on planar graphs. Then, if the resulting set is large enough, color it red and blue.
If you know that red + green is much smaller than the total number of vertices, another approximation can work. Look at the introduction of this article, which claims that:
For graphs which are promised to have small chromatic number, a better
guarantee is known: given a k-colorable graph, an algorithm due to
Karger et al. [12] uses semidefinite programming (SDP) to find an
independent set of size about $\Omega(n/\Delta^{1-2/k})$.
Your graph is certainly 4-colorable (it is planar), so you can count on a large enough independent set. The same article states that a greedy algorithm can already find a large enough independent set.
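A minimal sketch of one common greedy heuristic (repeatedly take a vertex of minimum degree and delete it together with its neighbours); the article's exact greedy variant may differ, and the adjacency-dict input is an assumption. If the resulting set has at least as many vertices as reds plus greens, any split of it into reds and greens keeps both colors non-adjacent, because no two vertices in the set share an edge:

```python
def greedy_independent_set(adj):
    """Min-degree greedy independent set.

    adj: dict mapping every node to an iterable of its neighbours (undirected).
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    chosen = []
    while remaining:
        # pick a vertex of minimum degree in the remaining graph
        v = min(remaining, key=lambda u: len(remaining[u]))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for ns in remaining.values():
            ns -= removed  # update degrees after the deletion
    return chosen


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
s = greedy_independent_set(path)
print(s)  # [0, 2, 4]
reds, greens = s[:2], s[2:3]  # any split of an independent set stays valid
```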

Introduction to Algorithms: A Creative Approach, Exercise 5.25

Below is Exercise 5.25 from "Introduction to Algorithms: A Creative Approach". After reading it several times, I still can't understand what it means. I can color a tree with 2 colors very easily and directly using the method it describes, not 1 + log N colors.
《Begin》
This exercise is related to the wrong algorithm for determining whether a graph is bipartite, described in Section 5.11. In some sense, this exercise shows that not only is the algorithm wrong, but also the simple approach cannot work. Consider the more general problem of graph coloring: Given an undirected graph G=(V,E), a valid coloring of G is an assignment of colors to the vertices such that no two adjacent vertices have the same color. The problem is to find a valid coloring, using as few colors as possible. (In general, this is a very difficult problem; it is discussed in Chapter 11.)
Thus, a graph is bipartite if it can be colored with two colors.
A. Prove by induction that trees are always bipartite.
B. We assume that the graph is a tree (which means that the graph is bipartite). We want to find a partition of the vertices into the two subsets such that there are no edges connecting vertices within one subset.
Consider again the wrong algorithm for determining whether a graph is bipartite, given in Section 5.11: We take an arbitrary vertex, remove it, color the rest (by induction), and then color the vertex in the best possible way. That is, we color the vertex with the oldest possible color, and add a new color only if the vertex is connected to vertices of all the old colors. Prove that, if we color one vertex at a time regardless of the global connections, we may need up to 1 + log N colors.
You should design a construction that maximizes the number of colors for every order of choosing vertices. The construction can depend on the order in the following way.
The algorithm picks a vertex as the next vertex and starts checking the vertex's edges. At that point, you are allowed to add edges incident to this vertex as you desire, provided that the graph remains a tree, such that, at the end, the maximal number of colors will be required. You cannot remove an edge after it is put in (that would be cheating, since the algorithm has already seen the edge). The best way to achieve this construction is by induction. Assume that you know a construction that requires k colors with few vertices, and build one that requires k+1 colors without adding too many new vertices.
《End》
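One possible construction, sketched under the simplifying assumption that the coloring order is fixed in advance (every subtree is colored before the vertex that joins them): T_1 is a single vertex, and T_k is a new vertex joined to the root (last-colored vertex) of each of T_1, ..., T_{k-1}. By induction the root of T_i gets color i under the "oldest possible color" rule, so the new root needs color k, while T_k has only 2^(k-1) vertices, i.e. k = 1 + log N colors with N = 2^(k-1). The names `first_fit` and `build_tree` are illustrative:

```python
import math


def first_fit(order, adj):
    """Color vertices in the given order with the smallest color (1, 2, ...)
    not used by an already-colored neighbour."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color


def build_tree(k, adj, order):
    """Append tree T_k to adj (vertex ids 0, 1, ...) and return its root.
    Vertices are appended to `order` in the order they should be colored."""
    sub_roots = [build_tree(i, adj, order) for i in range(1, k)]
    root = len(adj)                # next unused vertex id
    adj[root] = set(sub_roots)
    for r in sub_roots:
        adj[r].add(root)           # keep the adjacency symmetric
    order.append(root)             # the joining vertex is colored last
    return root


for k in range(1, 7):
    adj, order = {}, []
    build_tree(k, adj, order)
    colors = first_fit(order, adj)
    n = len(adj)
    print(n, max(colors.values()), 1 + math.log2(n))
```

For each k the number of colors used equals 1 + log2 N exactly, even though every T_k is a tree and could be 2-colored.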

How to find certain sized clusters of points

Given a list of points, I'd like to find all "clusters" of N points. My definition of cluster is loose and can be adjusted to whatever allows an easiest solution: it could be N points within a certain size circle or N points that are all within a distance of each other or something else that makes sense. Heuristics are acceptable.
Where N=2, and we're just looking for all point pairs that are close together, it's pretty easy to do ~efficiently with a k-d tree (e.g. recursively break the space into octants or something, where each area is a different branch on the tree, and then for each point, compare it to other points with the same parent (if near the edge of an area, check up the appropriate number of levels as well)). I recognize that inductively, with a solution for N=N', I can find a solution for N=N'+1 by taking the intersections between different N' solutions, but that's super inefficient.
Anyone know a decent way to go about this?
You start by calculating the Euclidean minimum spanning tree; e.g. CGAL can do this. From there the precise algorithm depends on your specific requirements, but it goes roughly like this: sort the edges of that tree by length, then delete edges, starting with the longest one. The tree is singly connected, so each deleted edge splits a component into two sub-graphs. Check whether each newly created sub-graph forms a cluster according to your conditions; if not, continue deleting edges.
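A minimal pure-Python sketch of this idea. Instead of CGAL it uses Prim's algorithm on the complete Euclidean graph (O(n^2), fine for modest inputs). Each component that fails the cluster test is split at its own longest remaining edge, which keeps the longest-first order within every component. The cluster condition is a caller-supplied predicate, and the example predicate "at most 3 points" is only an illustrative assumption:

```python
import math


def emst(points):
    """Euclidean minimum spanning tree via Prim's algorithm, O(n^2).
    Returns a list of (length, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_clusters(points, is_cluster):
    """Delete longest tree edges until every component satisfies is_cluster."""

    def side_of(start, edges):
        # vertices reachable from start using only the given tree edges
        adj = {}
        for _, a, b in edges:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
        seen, stack = {start}, [start]
        while stack:
            for w in adj.get(stack.pop(), []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    def recurse(vertices, edges):
        if is_cluster(vertices):
            return [vertices]
        if not edges:
            return []          # a single point that is not a cluster
        longest = max(edges)
        rest = [e for e in edges if e is not longest]
        side_a = side_of(longest[1], rest)
        side_b = vertices - side_a
        return (recurse(side_a, [e for e in rest if e[1] in side_a]) +
                recurse(side_b, [e for e in rest if e[1] in side_b]))

    return recurse(set(range(len(points))), emst(points))


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = split_clusters(pts, lambda c: len(c) <= 3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```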

Looking for an algorithm by possibly Dijkstra

I am looking for an algorithm that distributes nodes on a plane such that the edges all have the same length. I think it is by Dijkstra, but I cannot remember.
Anyone heard of this algorithm?
In general this will be impossible. Effectively you want something similar to the finite pictures in tilings of the plane.
There are some simple cases, such as regular polygons and a few graphs made of joined polygons, but even something as simple as the complete graph on 4 points (a tetrahedron) is impossible.
If you want something that tries to balance the impossible constraints, try graphviz and its neato program.
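To see what "balancing" impossible constraints means, here is a minimal sketch that minimizes the edge-length stress, the sum over edges of (d_uv - 1)^2, by plain gradient descent. This is a simplified stand-in, not neato's own method (by default stress majorization over all vertex pairs); the circular start, step size, and iteration count are arbitrary assumptions. For a triangle the residual stress goes to (numerically) zero, while for K4 it stays clearly positive, because no planar drawing of K4 has all six edges of equal length:

```python
import math


def stress_layout(n, edges, steps=3000, lr=0.05):
    """Gradient descent on sum over edges of (|p_u - p_v| - 1)^2.
    Returns (positions, final stress)."""
    # deterministic start: vertices evenly spaced on a small circle
    pos = [[0.5 * math.cos(2 * math.pi * i / n), 0.5 * math.sin(2 * math.pi * i / n)]
           for i in range(n)]

    def stress():
        return sum((math.dist(pos[u], pos[v]) - 1) ** 2 for u, v in edges)

    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in range(n)]
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)  # avoid division by zero
            g = 2 * (d - 1) / d                # derivative of (d - 1)^2, over d
            grad[u][0] += g * dx
            grad[u][1] += g * dy
            grad[v][0] -= g * dx
            grad[v][1] -= g * dy
        for i in range(n):
            pos[i][0] -= lr * grad[i][0]
            pos[i][1] -= lr * grad[i][1]
    return pos, stress()


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(stress_layout(3, triangle)[1])  # essentially 0
print(stress_layout(4, k4)[1])        # clearly positive
```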
If you just want to create some graph with this property, there are a number of graphs that work, for instance a path, a ring, or a tree; but in that case you are the one who decides which edges to include or exclude.
If you have a given graph and want all of its edges to have the same length, this is impossible in some cases, such as a complete graph on more than 3 nodes, or a star topology with one master and more than 6 slaves in which neighboring slaves are also connected to each other. [I believe the cases in the other posts tell you more.]
A special case: given a graph $G(V,E)$, draw $G$ so that every edge $e \in E$ has length at most one unit while every pair of non-adjacent vertices is more than one unit apart. This is an NP-hard problem. [That is, deciding whether an arbitrary graph $G$ is a unit disk graph is NP-hard.]

Finding equal subgraphs

Given:
a directed Graph
Nodes have labels
the same label can appear more than once
edges don't have labels
I want to find the set of largest (connected) subgraphs that are equal, taking the labels of the nodes into account.
The graph could be huge (millions of nodes). Does anyone know an efficient solution for this?
I'm looking for an algorithm and ideally a Java implementation.
Update: Since this problem is most likely NP-complete, I would also be interested in an algorithm that produces an approximate solution.
This seems to be close at least:
Frequent Subgraphs
I strongly suspect that's NP-hard.
Even if all the labels are the same, that's at least as hard as graph isomorphism. (Join the two graphs together into a single disconnected graph; are the largest equal subgraphs the two original graphs?)
If identical labels are relatively rare it might be tractable.
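For small graphs, "equal taking labels into account" can be checked directly: two node-labelled directed graphs are equal in this sense if some label-preserving bijection maps edges exactly onto edges. The brute-force sketch below tries all bijections, so it is exponential and only illustrates the definition; it is not an approach for millions of nodes. The representation (a label dict plus a set of directed edge pairs) is an assumption:

```python
from itertools import permutations


def labeled_isomorphic(labels1, edges1, labels2, edges2):
    """Brute-force test for a label-preserving isomorphism of directed graphs.

    labels: dict node -> label; edges: set of (source, target) pairs.
    """
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return False
    nodes1 = list(labels1)
    for image in permutations(labels2):
        mapping = dict(zip(nodes1, image))
        if any(labels1[v] != labels2[mapping[v]] for v in nodes1):
            continue  # labels must be preserved
        if all((mapping[a], mapping[b]) in edges2 for a, b in edges1):
            return True  # equal edge counts, so edges map exactly onto edges
    return False


g1 = ({1: "A", 2: "B", 3: "A"}, {(1, 2), (3, 2)})
g2 = ({"x": "B", "y": "A", "z": "A"}, {("y", "x"), ("z", "x")})
g3 = ({"x": "B", "y": "A", "z": "A"}, {("x", "y"), ("z", "x")})
print(labeled_isomorphic(*g1, *g2))  # True
print(labeled_isomorphic(*g1, *g3))  # False
```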
