Weighted n-coloring problem algorithms

I have 100 vertices and a function f(x,y) that computes the weight of the edge between vertex x and vertex y. f is not particularly expensive, so I could generate an indexed adjacency list with weights, if necessary.
What are some efficient, tractable methods for optimizing the n-coloring of these vertices by minimizing or maximizing the sum of the weights of all edges connecting vertices of the same color?
I imagine simulated annealing could be useful in this circumstance.
Links to code packages would also be super useful so I don't have to reinvent the wheel!
Thanks!

A very handy Python package for experimenting with graphs is NetworkX. If you prefer C++, there's also Boost (the Boost Graph Library), but using graphs in Boost will seem ridiculously clumsy after NetworkX.
Simulated annealing isn't a bad idea. You can do a regular coloring first to find a lower bound which will help direct your search. You should define your problem more precisely, though. Do you mean to pick some pivot value for the sum of incoming edges and try to partition the colors around the pivot?
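As a rough illustration of the simulated-annealing idea, here is a minimal sketch that minimizes the total weight of same-colored edges. It assumes the vertices are numbered 0..n-1 and that f(x, y) is symmetric; the single-vertex recolor move, the geometric cooling schedule and the default temperatures are illustrative choices, not tuned values.

```python
import math
import random


def same_color_cost(coloring, weights):
    """Sum of the weights of all edges whose two endpoints share a color."""
    return sum(w for (x, y), w in weights.items() if coloring[x] == coloring[y])


def anneal_coloring(num_vertices, num_colors, f, steps=20000,
                    t_start=1.0, t_end=1e-3, seed=0):
    """Minimize the same-color edge weight with simulated annealing.

    f(x, y) is the edge-weight function from the question; it is called once
    per pair and cached. t_start/t_end should be scaled to typical weights.
    """
    rng = random.Random(seed)
    weights = {(x, y): f(x, y)
               for x in range(num_vertices) for y in range(x + 1, num_vertices)}
    # neighbors[v] holds (u, w) pairs so a recolor can be scored in O(degree)
    neighbors = [[] for _ in range(num_vertices)]
    for (x, y), w in weights.items():
        neighbors[x].append((y, w))
        neighbors[y].append((x, w))

    coloring = [rng.randrange(num_colors) for _ in range(num_vertices)]
    cost = same_color_cost(coloring, weights)
    best, best_cost = coloring[:], cost

    for step in range(steps):
        # geometric cooling from t_start down towards t_end
        t = t_start * (t_end / t_start) ** (step / steps)
        v = rng.randrange(num_vertices)
        old, new = coloring[v], rng.randrange(num_colors)
        if new == old:
            continue
        # change in cost if v moves from color `old` to color `new`
        delta = (sum(w for u, w in neighbors[v] if coloring[u] == new)
                 - sum(w for u, w in neighbors[v] if coloring[u] == old))
        # always accept improvements; accept worse moves with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            coloring[v] = new
            cost += delta
            if cost < best_cost:
                best, best_cost = coloring[:], cost
    return best, best_cost
```

To maximize the sum instead, you could pass lambda x, y: -f(x, y); the returned cost is then the negated sum.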

Related

Create N clusters out of a minimum spanning tree?

Let's say I created a minimum spanning tree out of a graph with M nodes. Is there an algorithm to create N clusters?
I'm looking to cut some of the links such that I end up with N clusters and can label them, i.e. given a node X, I can query which cluster it belongs to.
My idea: once I have the MST, I cut its N-1 heaviest edges (leaving M-N of its M-1 edges), and I will get N clusters?
Is my logic correct?
That seems a good way to me. You ask whether it's "correct" -- that I can't say, since I don't know what other unstated criteria you have in mind. All you have actually stated that you want is to create N clusters -- which you could also achieve by throwing away the MST, putting vertex 1 in the first cluster, vertex 2 in the second, ..., vertex N-1 in the (N-1)th, and all remaining vertices in the Nth.
If you're using Kruskal's algorithm to build the MST, you can achieve what you're suggesting by simply stopping the algorithm early, as soon as only N components remain.
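A minimal sketch of that early-stopping idea, assuming nodes are numbered 0..M-1 and edges are given as (weight, u, v) tuples; it uses a small union-find and returns a cluster label per node, so "which cluster is X in?" becomes a list lookup.

```python
def kruskal_clusters(num_nodes, edges, n_clusters):
    """Run Kruskal's algorithm and stop once only n_clusters components remain.

    edges: iterable of (weight, u, v) with nodes numbered 0..num_nodes-1.
    Returns a list mapping each node to a cluster id 0..k-1.
    """
    parent = list(range(num_nodes))

    def find(x):
        # path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes
    for w, u, v in sorted(edges):
        if components <= n_clusters:
            # stop early; for a connected graph this matches removing the
            # N-1 heaviest MST edges
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1

    # relabel the union-find roots as consecutive cluster ids
    ids = {}
    return [ids.setdefault(find(x), len(ids)) for x in range(num_nodes)]
```

If the input graph is disconnected, this can end with more than n_clusters components, since Kruskal never joins separate pieces.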
A tree is a (very sparse) subset of the edges of a graph; if you cut based on them, you are not taking into consideration the (possibly) vast majority of edges in your graph.
Based on the fact that you want to use an M(inimum)ST algorithm to create clusters, it would seem you want to minimize the set of edges that lie in the n-way cut induced by your clustering. Using an MST as a proxy on a graph whose edge weights are very similar is likely to produce terrible results.
Graph clustering is a heavily studied topic; have you considered using an existing library to accomplish this? If you insist on implementing your own algorithm, I would recommend spectral clustering as a starting point, as it will produce decent results without much effort.
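To give a feel for how little code spectral clustering needs, here is a minimal two-cluster version (spectral bisection) using NumPy. It assumes a dense, symmetric, non-negative similarity matrix with no isolated vertices, and splits on the sign of the Fiedler vector of the symmetric normalized Laplacian; general k-way spectral clustering instead runs k-means on several eigenvectors.

```python
import numpy as np


def spectral_bisection(weights):
    """Split a graph into two clusters using the Fiedler vector of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.

    weights: symmetric (n x n) non-negative similarities, zero diagonal.
    Returns an array of 0/1 cluster labels.
    """
    W = np.asarray(weights, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)  # assumes every vertex has an edge
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
    _, vecs = np.linalg.eigh(L)
    # rescale by D^{-1/2}, as in normalized-cut style spectral clustering
    fiedler = d_inv_sqrt * vecs[:, 1]
    return (fiedler > 0).astype(int)
```

For more than two clusters, scikit-learn's SpectralClustering with affinity='precomputed' accepts the same kind of similarity matrix.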
Edit based on feedback in comments:
If your main bottleneck is the similarity matrix then the following should be considered:
Investigate a sparse matrix/graph representation while implementing something like spectral clustering, which is probably going to give much more robust results than single-linkage clustering.
Investigate pruning edges from the similarity matrix which you think are unimportant. If pruning is combined with a sparse representation of the similarity matrix, this should yield comparable performance to the MST approach while giving a smooth continuum to tune performance vs quality.
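One common way to prune is to keep only each vertex's k strongest similarities, which gives a sparse k-nearest-neighbour graph. A minimal sketch, assuming a symmetric similarity function (the name similarity is hypothetical) and storing the result as a dict of dicts; note that it still evaluates every pair, so it saves memory rather than similarity computations.

```python
import heapq


def prune_to_knn(num_nodes, similarity, k):
    """Keep, for every node, only its k most similar neighbours.

    similarity(x, y): hypothetical symmetric similarity function.
    Returns a sparse graph as a dict of dicts: graph[u][v] = weight.
    An edge survives if it is among the top k of either endpoint.
    """
    graph = {u: {} for u in range(num_nodes)}
    for u in range(num_nodes):
        others = ((similarity(u, v), v) for v in range(num_nodes) if v != u)
        for w, v in heapq.nlargest(k, others):
            graph[u][v] = w
            graph[v][u] = w  # keep the graph symmetric
    return graph
```

Raising k trades speed and memory for quality, which is the "smooth continuum" mentioned above.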

Is there a 2D-layout algorithm for DAGs that allows the positions on one axis to be fixed?

I've got a DAG of around 3,300 vertices which can be laid out quite successfully by dot as a more or less simple tree (things get complicated because vertices can have more than one predecessor from a completely different rank, so crossings are frequent). Each vertex in the graph came into being at a specific time in the original process, and I want one axis of the layout to represent time: an edge relation like a -> v, b -> v means that a and b came into being at some specific time before v.
Is there a layout algorithm for DAGs which would allow me to specify the positions (or at least the distances) on one axis and come up with an optimal layout regarding edge crossovers on the other?
You can compute a topological sort of the DAG, so that the vertices are ordered such that for every edge x->y, vertex x comes before y.
Therefore, if you have a -> v, b -> v, you will get something like a, b, v or b, a, v.
Using this order, you can easily draw the DAG so that every edge points in the same direction along one axis.
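Kahn's algorithm is one standard way to compute such an order. A minimal sketch, assuming the DAG is given as a list of nodes and a list of (x, y) edges; Python 3.9+ also ships graphlib.TopologicalSorter for the same job.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm: order nodes so that for every edge x -> y,
    x comes before y. Raises ValueError if the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for x, y in edges:
        successors[x].append(y)
        indegree[y] += 1
    # start with every node that has no incoming edge
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:  # all predecessors of m are placed
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order
```

The index of each node in the returned list can serve as its coordinate on the ordered axis.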
Yes, as @Arturo-Menchaca said, a topological sort may help reduce the number of edge crossings, but it may not be optimal. No efficient exact algorithm for edge-crossing minimization is known: the problem is NP-complete, so heuristics are used to solve it in practice.
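As an example of such a heuristic: if the time axis fixes which layer each vertex is on, crossings can be reduced one pair of adjacent layers at a time. Below is a minimal sketch of the barycenter heuristic (a standard layered-drawing heuristic, swapped in here for illustration; the answer above does not name a specific one). It assumes two ordered layers given as lists and edges going from the upper to the lower layer.

```python
def count_crossings(top, bottom, edges):
    """Count crossings between two layers drawn as ordered rows.
    Each edge (u, v) goes from a node in `top` to a node in `bottom`."""
    tpos = {n: i for i, n in enumerate(top)}
    bpos = {n: i for i, n in enumerate(bottom)}
    segs = [(tpos[u], bpos[v]) for u, v in edges]
    crossings = 0
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            (a1, b1), (a2, b2) = segs[i], segs[j]
            # two segments cross iff their endpoints are in opposite order
            if (a1 - a2) * (b1 - b2) < 0:
                crossings += 1
    return crossings


def barycenter_order(top, bottom, edges):
    """Reorder `bottom` with `top` held fixed: sort each bottom node by the
    mean position of its neighbours in `top` (the barycenter heuristic)."""
    tpos = {n: i for i, n in enumerate(top)}
    neighbours = {n: [] for n in bottom}
    for u, v in edges:
        neighbours[v].append(tpos[u])

    def key(n):
        ps = neighbours[n]
        # a node with no upper neighbours uses its current slot as its key
        return sum(ps) / len(ps) if ps else bottom.index(n)

    return sorted(bottom, key=key)
```

In a full layout you would sweep this down and up through all the layers several times, keeping the order with the fewest crossings.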
This StackOverflow link may help you: Drawing Directed Acyclic Graphs: Minimizing edge crossing?
I suppose your problem is about producing an aesthetically pleasing graph layout. Some heuristics are described in the articles Overview of algorithms for graph drawing and Force-Directed Drawing Algorithms. Information about planar or almost-planar graphs may also help you.
Reviews of algorithms for checking and drawing planar graphs can be found in the Wikipedia pages Planar graph and Crossing number (graph theory). Libraries and algorithms for planar graph drawing are described in the Stack Overflow question How to check if a Graph is a Planar Graph or not? For example, the author of the article GA for straight-line grid drawings of maximal planar graphs uses genetic algorithms for straight-line grid drawing.
Good descriptions of almost-planar graphs are given in the articles Straight-Line Drawability of a Planar Graph Plus an Edge and On the Crossing Number of Almost Planar Graphs.
Try modifying the original algorithms to respect your constraint that one axis is fixed.
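As one illustration of that kind of modification, here is a minimal sketch of a force-directed layout (Fruchterman-Reingold style) in which each vertex's y coordinate is pinned, for example to its creation time, and only x is moved. The parameter values are illustrative assumptions, and this reduces clutter rather than minimizing crossings directly.

```python
import math
import random


def layout_fixed_y(nodes, edges, y, iterations=200, k=1.0, seed=0):
    """Force-directed layout where y[n] is fixed and only x moves.

    nodes: list of node names; edges: list of (a, b) pairs;
    y: dict node -> fixed coordinate. Returns dict node -> (x, y).
    """
    rng = random.Random(seed)
    x = {n: rng.uniform(-1.0, 1.0) for n in nodes}
    for it in range(iterations):
        temp = 0.1 * (1 - it / iterations)  # max step shrinks as the layout cools
        force = {n: 0.0 for n in nodes}
        # repulsion between every pair of nodes (x-component only)
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = x[a] - x[b], y[a] - y[b]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist * (dx / dist)
                force[a] += push
                force[b] -= push
        # attraction along edges (x-component only)
        for a, b in edges:
            dx, dy = x[a] - x[b], y[a] - y[b]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k * (dx / dist)
            force[a] -= pull
            force[b] += pull
        # move along x only, with the step limited by the temperature
        for n in nodes:
            x[n] += max(-temp, min(temp, force[n]))
    return {n: (x[n], y[n]) for n in nodes}
```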
If I understood you correctly, then you want to minimize the number of edge crossings in your graph layout. If so, then the answer is essentially "No": the problem is proven to be NP-complete in the general case, so no efficient exact algorithm is known. See "Crossing Number is NP-Complete" by Garey and Johnson.
If you need a good-enough rather than an optimal solution, there are many articles on this topic, because it is closely related to circuit layout. Googling "crossing number heuristics" and looking through the abstracts of some papers will probably solve your task better than me trying to blindly guess your requirements.

What clustering algorithms can I consider for graph?

Suppose I have an undirected, weighted, connected graph. I want to group together the vertices that have the highest edge weights (weighted vertex degree). Using clustering algorithms is one way. Which clustering algorithms can I consider for this task? I hope this is clear; please ask if anything needs clarification. Thanks.
There are two main approaches: giving your graph as input to an existing tool, or using expert knowledge you have about this graph (and its domain) to create a representation, and then applying machine-learning methods to it.
I'll start with the second approach:
If you have only the nodes and edges (no further data for each node), you first need to think of a representation for each node/edge. I'm going to explain the node case, but the edge case should be similar.
The simplest approach is to represent each node n as a connectivity vector:
Every node will be represented as n = (I_a(n), I_b(n), I_c(n), I_d(n), I_e(n)), where I_i(n) = 1 if node n is a 'friend' (neighbor) of node i, and 0 otherwise (e.g. a = (0,1,1,0,1)).
Note that you can decide if a node is a friend of itself.
A second approach, quite similar to the first one, is to use a vector of edge weights:
n = (W(a,n), W(b,n), W(c,n), W(d,n), W(e,n)), where W(i,n) is the weight of the edge (i,n).
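Both representations can be built in a few lines. A minimal sketch, assuming the graph is given as a hypothetical dict that maps frozenset({u, v}) to the weight of the undirected edge (u, v), plus an ordered list of node names.

```python
def node_vectors(nodes, weight):
    """Return (indicator, weighted) vectors for every node.

    nodes: ordered list of node names, e.g. ['a', 'b', 'c', 'd', 'e'].
    weight: dict frozenset({i, n}) -> weight of the undirected edge (i, n).
    A node counts as its own friend only if `weight` has a self-loop key.
    """
    # connectivity vector: 1 where an edge (i, n) exists, else 0
    indicator = {n: [1 if frozenset((i, n)) in weight else 0 for i in nodes]
                 for n in nodes}
    # weight vector: W(i, n), with 0 for missing edges
    weighted = {n: [weight.get(frozenset((i, n)), 0) for i in nodes]
                for n in nodes}
    return indicator, weighted
```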
There are a few more ways to represent nodes, but this is enough in order to run some calculations on it.
Once you have this representation, you can start applying clustering algorithms to it.
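k-means is a common choice for this task, and sklearn has a good implementation. It has some parameters you can (and should) configure, such as the number of clusters and the initialization; note that sklearn's KMeans always uses Euclidean distance, so a different distance measure requires another algorithm or a transformed representation.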
The output of k-means is k disjoint groups of nodes.
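To show what the clustering step does, here is plain Lloyd's k-means in pure Python, assuming each node has already been turned into a numeric vector. In practice sklearn.cluster.KMeans (e.g. KMeans(n_clusters=k, n_init=10).fit_predict(X)) does the same job with a better k-means++ initialization; the "first k points" start below is a deliberately simple assumption.

```python
def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means with squared Euclidean distance.

    points: list of equal-length numeric lists (one vector per node).
    The first k points are the initial centers (a simplistic choice).
    Returns one cluster label (0..k-1) per point.
    """
    centers = [list(p) for p in points[:k]]

    def nearest(p):
        # assignment step: index of the closest current center
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```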
If you want to pass your graph to an algorithm and get some measures, there are more advanced algorithms you can apply. Community detection is used to find communities in a graph; again, there is a nice Python implementation in the networkx package.
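For a sense of how a community-detection algorithm works on the graph directly, here is a minimal weighted label-propagation sketch. It assumes an adjacency dict of dicts with comparable node names; visiting nodes in a fixed order and breaking ties by the smallest label make it deterministic, which simplifies the usual randomized algorithm. networkx's networkx.algorithms.community module provides tested versions (e.g. greedy_modularity_communities, label_propagation_communities).

```python
def label_propagation(adjacency, max_rounds=100):
    """Weighted label propagation for community detection.

    adjacency: dict node -> dict neighbour -> weight (undirected, symmetric).
    Each node starts in its own community and repeatedly adopts the label
    with the largest total edge weight among its neighbours.
    Returns dict node -> community label.
    """
    nodes = sorted(adjacency)
    label = {n: i for i, n in enumerate(nodes)}
    for _ in range(max_rounds):
        changed = False
        for n in nodes:
            # total edge weight towards each neighbouring label
            score = {}
            for m, w in adjacency[n].items():
                score[label[m]] = score.get(label[m], 0) + w
            if not score:
                continue  # isolated node keeps its own label
            best = max(score.values())
            new = min(lab for lab, s in score.items() if s == best)
            if new != label[n]:
                label[n] = new
                changed = True
        if not changed:
            break  # no node changed its label this round
    return label
```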

Bottom Up DP from Top Down DP

I was wondering whether there is some general way of converting top-down dynamic programming to bottom-up dynamic programming.
Is there some mechanism that gives a formal way to convert top-down DP to bottom-up DP?
NOTE: I am a beginner in dynamic programming, and the few problems I have seen in which a top-down approach is converted to a bottom-up approach look very different from each other. So I am not sure whether a general method is possible.
By general, I mean: how should the arrays be initialized, what size should they be, and how many dimensions should they have?
The execution of a dynamic program can be viewed as a directed acyclic graph, where each vertex is a subproblem, and arcs indicate that the solution to a particular subproblem was required to compute the solution for another subproblem. What a top-down recursive program with memoization is doing is, in essence, topologically sorting the subgraph of this graph reachable from the root problem via depth-first search. To convert it to a bottom-up approach, you need to work out a suitable topological order yourself, which will vary from problem to problem.
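A small example of that conversion, using longest common subsequence (chosen here for illustration). The top-down version memoizes solve(i, j); the bottom-up version uses one array dimension per recursion parameter, one extra row and column for the base cases, and loops in an order where every dependency is already filled, i.e. a topological order of the subproblem DAG.

```python
from functools import lru_cache


def lcs_top_down(a, b):
    """Memoized recursion: solve(i, j) = LCS length of a[i:] and b[j:]."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        return max(solve(i + 1, j), solve(i, j + 1))
    return solve(0, 0)


def lcs_bottom_up(a, b):
    """Same recurrence, filled in a topological order of the subproblem DAG.

    (i, j) depends on (i+1, j), (i, j+1) and (i+1, j+1), so iterating both
    indices downward guarantees every dependency is already computed.
    """
    m, n = len(a), len(b)
    # one dimension per parameter; row m and column n hold the base case 0
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table[0][0]
```

The recursion's parameters determine the array's dimensions and sizes, and its base cases determine the initialization; only the loop order has to be worked out by hand.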
A top-down solution is often better, because it only solves the necessary subproblems. Converting a bottom-up solution to top-down is pretty straightforward: you just calculate and store the subproblems on demand instead of precalculating all of them. The other direction can be tricky, because you need to know in advance which subproblems to solve. Depending on the problem, the difficulty of finding the subproblems without inspecting the upper problems can range from easy to impossible. For example, consider an infinite, weighted graph whose vertices are colored with 10 colors: what is the distance from a given vertex A to the closest blue vertex? Solving it top-down is possible, but going bottom-up is impossible, as you would need to start from all the blue vertices of the graph.
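The infinite-graph example can be made concrete with Dijkstra's algorithm driven by an on-demand neighbour function (Dijkstra is swapped in here to illustrate the point): the search only ever touches vertices near A, which is the "solve only what is needed" property of top-down evaluation. The callbacks neighbours and is_blue are hypothetical.

```python
import heapq


def distance_to_nearest(start, neighbours, is_blue):
    """Distance from `start` to the closest vertex satisfying is_blue.

    neighbours(v) yields (u, weight) pairs on demand, so the (possibly
    infinite) graph is never built in full. Assumes non-negative weights
    and that some blue vertex is reachable; otherwise, on an infinite
    graph, the search never terminates.
    """
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        if is_blue(v):
            return d  # vertices are popped in distance order
        for u, w in neighbours(v):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return None  # finite graph with no reachable blue vertex
```

For instance, on an infinite grid where (x, y) is blue when both coordinates are multiples of 5, the distance from (1, 1) is 2.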

How to assign consecutive numbers to nodes of directed graph?

There's a graph with a lot of nodes and very few edges between them. The problem is assigning numbers to the nodes so that most edges go from i to i+1 or are otherwise short.
My problem is about printing graph data nicely, but an algorithm just like that is part of pretty much every compiler (intermediate code is just a graph, produced object code gets memory locations).
I thought it was just a straightforward depth-first search, but the results of that aren't that great: it seems to minimize the number of back links well enough, but the ones it leaves tend to be horrible (like 1 -> 500 -> 1).
Any better ideas?
This paper discusses this problem, if you use Eyal Schneider's formulation of minimizing the sum of the edge deltas (absolute value of the difference between the endpoints' labels). It's under #2, Optimal Linear Arrangements.
Sadly, there's no algorithm given for achieving an optimal ordering (or labeling), and the general problem is NP-complete. There are references to some polynomial-time algorithms for trees, though.
If you want to get into the academic stuff, google gives lots of hits for "Optimal Linear Arrangements".
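Since the exact problem is NP-complete, a practical option is a heuristic. Below is a minimal sketch that measures the linear-arrangement cost (sum of |label(u) - label(v)| over edges) and labels nodes with reverse Cuthill-McKee, a bandwidth-reduction heuristic swapped in here; it is not from the paper above and gives no optimality guarantee. It assumes comparable node names and ignores edge direction. SciPy offers a sparse-matrix version as scipy.sparse.csgraph.reverse_cuthill_mckee.

```python
from collections import deque


def arrangement_cost(labels, edges):
    """Sum over edges of |label(u) - label(v)|."""
    return sum(abs(labels[u] - labels[v]) for u, v in edges)


def reverse_cuthill_mckee(nodes, edges):
    """Label nodes 0..n-1 with the reverse Cuthill-McKee ordering: a BFS
    from a minimum-degree node that visits neighbours in increasing degree
    order, reversed at the end. Edge direction is ignored."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order, seen = [], set()
    # cover every connected component, lowest-degree start first
    for start in sorted(nodes, key=lambda n: (len(adj[n]), n)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            n = queue.popleft()
            order.append(n)
            for m in sorted(adj[n] - seen, key=lambda m: (len(adj[m]), m)):
                seen.add(m)
                queue.append(m)
    order.reverse()
    return {n: i for i, n in enumerate(order)}
```

Comparing arrangement_cost for this labeling and for your DFS labeling is a quick way to see whether it helps on your data.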

Resources