Partitioning tree nodes with a clustering algorithm

I have an undirected tree where each edge has an associated length. Given a random selection of nodes, I need to partition them into subsets S1, S2, ..., Sk (all equal or close in size) such that the path length between any two nodes within a subset is less than or equal to the path length between any two nodes in different subsets.
My understanding is that this is not a standard graph partitioning problem.
To me it seemed closest to the k-means algorithm, but k-means does not support arbitrary distance measures.
Googling around, I found that the k-medoids algorithm allows an arbitrary distance metric, which looks promising. I am thinking of creating a pairwise distance matrix (D[i,j] = path length from node i to node j) and feeding that into a k-medoids algorithm. Is that a reasonable approach, or are there standard algorithms that I am missing?
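For concreteness, here's a rough sketch of what I have in mind (using networkx, and assuming the tree stores each length in a "length" edge attribute; tree and selected are placeholders for my actual data):

    import random
    import networkx as nx

    def tree_distance_matrix(tree, selected):
        # D[a][b] = tree path length between selected nodes a and b.
        # One single-source shortest-path run per selected node.
        lengths = {s: nx.single_source_dijkstra_path_length(tree, s, weight="length")
                   for s in selected}
        return [[lengths[u][v] for v in selected] for u in selected]

    def k_medoids(D, k, iters=100, seed=0):
        # Plain alternating k-medoids (simpler than full PAM).
        rng = random.Random(seed)
        n = len(D)
        medoids = rng.sample(range(n), k)
        for _ in range(iters):
            # Assignment step: each point joins its nearest medoid.
            clusters = [[] for _ in range(k)]
            for i in range(n):
                nearest = min(range(k), key=lambda c: D[i][medoids[c]])
                clusters[nearest].append(i)
            # Update step: each medoid becomes the member of its cluster
            # that minimises the total distance to the other members.
            new = [min(c, key=lambda m: sum(D[i][m] for i in c)) if c else medoids[j]
                   for j, c in enumerate(clusters)]
            if new == medoids:
                break
            medoids = new
        return clusters

One caveat I already see: plain k-medoids won't enforce the equal-size requirement, so an extra balancing step would be needed if that constraint is strict.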

Related

Finding size of largest connected component of a graph

Suppose we have a random undirected graph G = (V, E) with n vertices, where for any two vertices u, v ∈ V the probability that the edge (u, v) is in E is 1/n. We need to figure out the size C(n) of the largest connected component of the graph.
C(n) should be Θ(n^a), and we need to run some experiments to estimate a.
I am a bit confused about how to link the probability 1/n to the size of the largest connected component. Is there any way I can do so?
The process you're simulating here is called the Erdős–Rényi model. You have a collection of n nodes, and each pair of nodes has probability p of being linked. The (expected) shape of the resulting graph depends heavily on the choice of p, and there are a lot of famous results about this.
As for how to do this: one option would be to create a collection of n nodes, iterate over all pairs of nodes, and link them with probability 1/n. You can then run an algorithm like BFS or DFS over the graph to find and size the connected components.
Another would be to use the above approach, except that instead of doing a BFS or DFS you use a disjoint-set forest to perform the links and track the largest connected component.
Alternatively, because each edge is absent or present with equal probability and independently of every other edge, the number of edges you have is binomially distributed and pretty tightly packed around n/2 total edges (there are n(n-1)/2 pairs, each present with probability 1/n). You could therefore generate roughly n/2 random edges, add them into the graph, then use the above techniques. (This will be much faster, as this does O(n) work rather than O(n^2) work to process the edges.)
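For instance, here's a quick union-find sketch of that faster variant (the pair sampling ignores the small chance of duplicate or self-pairs, which doesn't matter for large n):

    import random

    def largest_component(n, seed=None):
        rng = random.Random(seed)
        parent = list(range(n))
        size = [1] * n

        def find(x):
            # Find the root of x, with path compression.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # Expected number of edges is about n/2, so sample that many pairs.
        for _ in range(n // 2):
            a, b = find(rng.randrange(n)), find(rng.randrange(n))
            if a != b:
                # Union by size.
                if size[a] < size[b]:
                    a, b = b, a
                parent[b] = a
                size[a] += size[b]
        return max(size[find(i)] for i in range(n))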
Once you've gotten this worked out, you can vary n over a large range and run some sort of regression on the results to find the best-fit curve. You could either code that up yourself, or import your data into Excel and use its regression tools.
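For the regression, fitting a straight line to the log-log data is enough to read off the exponent. A sketch using numpy (the values of ns and sizes here are illustrative placeholders; substitute your own measurements):

    import numpy as np

    # sizes[i] = observed largest-component size for n = ns[i].
    ns = np.array([10**3, 10**4, 10**5, 10**6])
    sizes = np.array([95.0, 440.0, 2100.0, 9800.0])  # illustrative numbers only

    # If C(n) = Theta(n^a), then log C(n) is roughly a*log(n) + const,
    # so the slope of a line fit in log-log space estimates a.
    a, intercept = np.polyfit(np.log(ns), np.log(sizes), 1)
    print("estimated exponent a:", a)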
As a spoiler, when you're done you'll find that the number of nodes in the largest connected component is Θ(n^(2/3)). If you search for "Erdős–Rényi critical case," you can find online proofs of this result. It's not a trivial result to prove (and definitely isn't obvious!), but it'll drop out of your empirical analysis.

Divide a graph into same-size disjoint sets with minimum cut

Is there any algorithm or code that divides graph nodes into two or more disjoint sets satisfying the following conditions:
First, only edges are allowed to be removed.
Second, edges are weighted, and the edges that are removed must have minimum total weight (a minimum-cut algorithm).
Third, the desired disjoint sets should be of equal size, as far as possible.
It looks like you're trying to solve the Min-bisection problem, in which, given a graph G, you would like to partition V[G] into two disjoint subsets A and B of equal size such that the sum of the weights of the edges between A and B is minimized. Unfortunately, the Min-bisection problem is known to be NP-hard. However, the Kernighan–Lin algorithm is a very simple O(n^2 log n) heuristic for the problem. While very little is known about the algorithm theoretically (there is no proven bound on its performance relative to an optimal solution), it has been shown to be quite effective in experiments.
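If you're in Python, networkx ships an implementation you could try. A minimal sketch (the toy graph here is just for illustration):

    import networkx as nx
    from networkx.algorithms.community import kernighan_lin_bisection

    G = nx.Graph()
    # Hypothetical weighted edges (u, v, weight); substitute your own.
    G.add_weighted_edges_from([(0, 1, 3.0), (1, 2, 1.0), (2, 3, 4.0), (3, 0, 2.0)])

    # Returns two node sets of (nearly) equal size with a small total
    # weight of crossing edges; a heuristic, not guaranteed optimal.
    A, B = kernighan_lin_bisection(G, weight="weight")

For more than two sets, you can apply the bisection recursively, at some cost in solution quality.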

Algorithm to Partition Graph into groups

I'm looking for an algorithm to partition a graph into groups of vertices (each of which induces a connected subgraph) of maximum size n while keeping the number of groups minimized. I need this algorithm to partition a Delaunay triangulation into regions with an equal number of vertices in each region. If anyone has a better idea for tackling this problem, let me know!
It seems you're looking for a solution to the uniform k-way graph partitioning problem, where, given a graph G(V,E), the goal is to partition the vertex-set V into a series of k disjoint subsets V1, V2, ..., Vk such that the size of each subset Vi is approximately |V|/k. Additionally, it's typical to require "nice" partitions, where the sum of the edge weights between any two subsets Vi and Vj is minimised.
Firstly, it's well known that this problem is NP-complete, which precludes efficient exact algorithms. On the upside, a number of effective heuristics have been developed that perform well on many practical problems.
Specifically, schemes based on an iterative multi-level approach have been very successful in practice. In such methods, a hierarchy of graphs is created by incrementally merging adjacent vertices to form a smaller "coarse" graph at each level of the hierarchy. An initial partition is formed when the coarse graph becomes small enough, with this partition then "mapped" back down the hierarchy onto the original graph. An iterative refinement of the partition is typically performed as part of this mapping process, generally leading to pseudo-optimal partitions.
The implementation of such algorithms is non-trivial, but a number of existing packages support such routines. Specifically, the METIS package is used extensively.
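For instance, the pymetis bindings expose METIS from Python. A sketch (the toy adjacency list stands in for your triangulation):

    import pymetis

    # Adjacency list: adjacency[v] lists the neighbours of vertex v.
    # This toy graph is a 6-cycle; substitute your own graph.
    adjacency = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]

    # membership[v] gives the part (0..k-1) assigned to vertex v;
    # METIS balances the part sizes while keeping the edge cut small.
    n_cuts, membership = pymetis.part_graph(3, adjacency=adjacency)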

Algorithm for finding optimal node pairs in hexagonal graph

I'm searching for an algorithm that finds pairs of adjacent nodes on a hexagonal (honeycomb) graph so as to minimize a cost function.
Each node is connected to three adjacent nodes.
Each node "i" should be paired with exactly one neighboring node "j".
Each pair of nodes has an associated cost
c = pairCost( i, j )
The total cost is then computed as
totalCost = (1/2) * sum_{i=1..N} pairCost(i, pair(i))
where pair(i) returns the index of the node that "i" is paired with. (The sum is divided by two because it counts each pair twice.) My question is: how do I find node pairs that minimize the totalCost?
The linked image should make it clearer what a solution would look like (the thick red line indicates a pairing).
Some further notes:
I don't really care about the outermost nodes.
My cost function is something like || v(i) - v(j) || (the distance between vectors associated with the nodes).
I'm guessing the problem might be NP-hard, but I don't really need the truly optimal solution; a good one would suffice.
Naive algorithms tend to produce nodes that are "locked in", i.e. nodes all of whose neighbors are already taken.
Note: I'm not familiar with the usual nomenclature in this field (is it graph theory?). If you could help with that, then maybe that could enable me to search for a solution in the literature.
This is an instance of the maximum weight matching problem in a general graph; of course you'll have to negate your pair costs, since the standard algorithms maximize total weight. Edmonds's paths, trees and flowers algorithm (see Wikipedia) solves this for you (there is also a public Python implementation). The naive implementation is O(n^4) for n vertices, but it can be pushed down to O(n^(1/2) m) for n vertices and m edges using the algorithm of Micali and Vazirani (sorry, couldn't find a PDF for that).
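For example, with networkx (a sketch; the toy edges and pair_cost are stand-ins for your honeycomb data):

    import networkx as nx

    # Toy stand-ins for the honeycomb adjacency and the cost function.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 4), (4, 5), (5, 2)]

    def pair_cost(i, j):
        return abs(i - j)  # placeholder for || v(i) - v(j) ||

    G = nx.Graph()
    for i, j in edges:
        # networkx maximises total weight, so negate the costs
        # to obtain a minimum-cost matching instead.
        G.add_edge(i, j, weight=-pair_cost(i, j))

    # maxcardinality=True pairs as many nodes as possible, which
    # avoids nodes getting "locked in" as with greedy approaches.
    matching = nx.max_weight_matching(G, maxcardinality=True)
    total_cost = sum(pair_cost(i, j) for i, j in matching)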
This seems related to the minimum edge cover problem, with the additional constraint that there can only be one edge per node, and that you're trying to minimize the cost rather than the number of edges. Maybe you can find some answers by searching for that phrase.
Failing that, your problem can be phrased as an integer linear program. Integer linear programming is NP-complete, which means that you might get dreadful performance even for medium-sized problems. (This does not necessarily mean that your problem itself is NP-complete, though.)

Is there a good algorithm for this graph problem?

There's an undirected graph with weights on its edges (the weights are non-negative integers, their sum isn't big, and most are 0). I need to divide it into some number of subgraphs (say, a graph of 20 nodes into 4 subgraphs of 5 nodes each) in a way that minimizes the sum of the weights of the edges between different subgraphs.
This sounds vaguely like the minimum cut problem, but not quite close enough.
In an alternative formulation: there's a bunch of buckets, every item belongs to exactly two buckets, and I need to partition the buckets into bucket groups in a way that minimizes the number of items that end up in more than one bucket group. (Nodes map to buckets, edge weights map to duplicate item counts.)
This is the minimum k-cut problem, and it is NP-hard. Here's a greedy heuristic that guarantees a (2 - 2/k)-approximation:
While the graph has fewer than k components:
1) Find a min-cut in each component
2) Split the component with the smallest weight min-cut.
The problem is studied in this paper: http://www.cc.gatech.edu/~vazirani/k-cut.ps
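For instance, a sketch of that heuristic on top of networkx's Stoer–Wagner global min-cut (assuming the weights live in a "weight" edge attribute; note that it makes no attempt to balance the subgraph sizes):

    import networkx as nx

    def greedy_k_cut(G, k):
        # Greedy heuristic from above: repeatedly split the component
        # with the cheapest global min-cut until there are k components.
        G = G.copy()
        while nx.number_connected_components(G) < k:
            best = None
            for nodes in nx.connected_components(G):
                comp = G.subgraph(nodes)
                if comp.number_of_nodes() < 2:
                    continue
                cut_value, (side_a, side_b) = nx.stoer_wagner(comp, weight="weight")
                if best is None or cut_value < best[0]:
                    best = (cut_value, comp, set(side_a))
            if best is None:  # nothing left to split
                break
            _, comp, side_a = best
            # Remove the crossing edges to realise the chosen cut.
            G.remove_edges_from([(u, v) for u, v in comp.edges()
                                 if (u in side_a) != (v in side_a)])
        return list(nx.connected_components(G))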
