There's an undirected graph with weights on edges (weights are non-negative integers and their sum isn't big, most are 0). I need to divide it into some number of subgraphs (let's say graph of 20 nodes to 4 subgraphs of 5 nodes each) in a way that minimizes sum of weights of edges between different subgraphs.
This sounds vaguely like the minimum cut problem, but not quite close enough.
In alternative formulation - there's a bunch of buckets, all items belong to exactly two buckets, and I need to partition buckets into bucket groups in a way that minimizes number of items in more than one bucket group. (nodes map to buckets, edge weights map to duplicate item counts)
This is the minimum k-cut problem, and is NP hard. Here's a greedy heuristic that will guarantee you a 2-1/k approximation:
While the graph has fewer than k components:
1) Find a min-cut in each component
2) Split the component with the smallest weight min-cut.
The problem is studied in this paper: http://www.cc.gatech.edu/~vazirani/k-cut.ps
Related
Given an unoriented tree with weightless edges with N vertices and N-1 edges and a number K find K nodes so that every node from a tree is within S distance of at least one of the K nodes. Also, S has to be the smallest possible S, so that if there were S' < S at least one node would be unreachable in S' steps.
I tried solving this problem, however, I feel that my supposed solution is not very fast.
My solution:
set x=1
find nodes which are x distance from every node
let the node which has the most nodes in its distance be one of the K nodes.
recompute for every node whilst not counting already covered nodes.
do this till I find K number of K nodes. Then if every node is covered we are done else increase x.
This problem is called p-center, and you can find several papers online about it such as this. It is indeed NP for general graphs, but polynomial on trees, both weighted and unweighted.
For me it looks like a clustering problem. Try it with the k-Means (wikipedia) algorithm where k equals to your K. Since you have a tree and all vertices are connected, you can use as distance measurement the distance/number of edges between your vertices.
When the algorithm converts you get the K nodes which should be found. Then you can determine S by iterating through all k clusters. There you calculate the maximum distance for every node in the cluster to the center node. And the overall max should be S.
Update: But actually I see that the k-means algorithm does not produce a global optimum, so this algorithm wouldn't also produce the best result ...
You say N nodes and N-1 vertices so your graph is a tree. You are actually looking for a connected K-subset of nodes minimizing the longest edge.
A polynomial algorithm may be:
Sort all your edges increasing distance.
Then loop on edges:
if none of the 2 nodes are in a group, create a new group.
else if one node is in 1 existing goup, add the other to the group
else both nodes are in 2 different groups, then fuse the groups
When a group reach K, break the loop and you have your connected K-subset.
Nevertheless, you have to note that your group can contain more than K nodes. You can imagine the problem of having 4 nodes, closed two by two. There would be no exact 3-subset solution of your problem.
I have an undirected tree structure where each edge has length associated. Given a random selection of nodes, I need to partition them into subsets S1, S2, ..., Sk (all may be equal or close in size) such that the path length between any two nodes within a subset is <= those in two different subsets.
My understanding is that this is not a standard graph partitioning problem.
To me it seemed closest to k-means algorithm, but that does not take arbitrary distance measures.
Googling around, I found that k-medoid algorithm allows arbitrary distance metric, which looks promising. I am thinking of creating pairwise distance matrix (D[i,j] = path length from node i to node j) and putting that into a k-medoid algorithm. Is that a reasonable approach or are there standard algorithms that I am missing?
I have a set of N objects and N * N distances between them. I want to cluster this set on subsets, such that in each cluster there are all objects has the same distance and mean(cluster_size) on all clusters is maximized.
I tried to solve this task by such algorithm:
Lets enumerate all unique distances between objects.
For each unique distance X lets build graph based with objects as nodes and adjacency matrix, where edge between A and B is present if distance between objects A and B is exactly X
Lets find maximum clique in this graph. If size of this clique is bigger than current maximum - update maximum and store clique as Result
Delete objects stored in Result from set of objects
Repeat until set of objects is not empty
Is there are any more efficient [approximate] solution?
Mean(cluster size) = total number of points / number of clusters
The only way to maximize this is to minimize the number of clusters. That seems a fairly bad choice as optimization target. You may want to reconsider this objective.
Apart from that I think your algorithm is fairly sensible. As the problem likely is NP hard, you do want to use a greedy approximation.
I suggest to be more lazy in recomputing, and add some bounds.
Build subgraphs for every unique distance.
Sort subgraphs by size, descending.
Unless you have a cached value from a previous iteration, find the largest clique in each subgraph. Remember the overall largest clique. Stop if the current largest is larger than the remaining subgraphs.
Output best subgraph found.
Remove the contained nodes from all graphs, and forget those best cliques that contain any of the nodes just found. Go back to 2.
Here's the problem.
A weighted undirected connected graph G is given. The weights are constant. The task is to come up with an algorithm that would find the total weight of a spanning tree for G that fulfills these two conditions (ordered by priority):
The spanning tree has to have maximum number of edges with the same weight (the actual repeated weight value is irrelevant);
The total spanning tree weight should be minimized. That means, for example, that the spanning tree T1 with weight 120 that has at most 4 edges with the same weight (and the weight of each of those four is 15) should be preferred over the spanning tree T2 with weight 140 that has at most 4 edges with the same weight (and the weight of each of those four is 8).
I've been stuck on that for quite a while now. I've already implemented Boruvka's MST search algorithm for the graph, and now I'm not sure whether I should perform any operations after MST is found, or it's better to modify MST-search algorithm itself.
Any suggestions welcome!
This can be done naively in O(m^2), and without much effort in O(mn). It seems like it is possible to do it even faster, maybe something like O(m log^2(n)) or even O(m log(n)) with a little work.
The basic idea is this. For any weight k, let MST(k) be a spanning tree which contains the maximum possible number of edges of weight k, and has minimum overall weight otherwise. There may be multiple such trees, but they will all have the same total weight, so it doesn't matter which one you pick. MST(k) can be found by using any MST algorithm and treating all edges of weight k as having weight -Infinity.
The solution to this problem will be MST(k) for some k. So the naive solution is to generate all the MST(k) and picking the one with the max number of identical edge weights and then minimum overall weight. Using Kruskal's algorithm, you can do this in O(m^2) since you only have to sort the edges once.
This can be improved to O(mn) by first finding the MST using the original weights, and then for each k, modifying the tree to MST(k) by reducing the weight of each edge of weight k to -Infinity. Updating an MST for a reduced edge weight is an O(n) operation, since you just need to find the maximum weight edge in the corresponding fundamental cycle.
To do better than O(mn) using this approach, you would have to preprocess the original MST in such a way that these edge weight reductions can be performed more quickly. It seems that something like a heavy path decomposition should work here, but there are some details to work out.
We are given N pairs. Each pair contains two numbers. We have to find maximum number K such that if we take any combination of J (1<=J<=K) pairs from the given N pairs, we have at least J different numbers in all those selected J pairs. We can have more than one pair same.
For example, consider the pairs
(1,2)
(1,2)
(1,2)
(7,8)
(9,10)
For this case K = 2, because for K > 2, if we select three pairs of (1,2), we have only two different numbers i.e 1 and 2.
Checking for each possible combination starting from one will take a very large amount of time. What would be an efficient algorithm for solving the problem?
Create a graph with one vertex for each number and one edge for each pair.
If this graph is a chain or a tree, we have the number of "numbers", equal to number of "pairs" plus one, After removing any number of edges from this graph, we never get less vertexes than edges.
Now add a single cycle to this chain/tree. There is equal number of vertexes and edges. After removing any number of edges from this graph, again we never get less vertexes than edges.
Now add any number of disconnected components, each should not contain more than one cycle. Once again, we never get less vertexes than edges after removing any number of edges.
Now add a second cycle to any of disconnected components. After removing all other components. at last we have more edges than vertexes (more pairs than numbers).
All this leads to the conclusion that K+1 is exactly the number of edges in the smallest possible subgraph, consisting of two cycles and, possibly, a chain, connecting these cycles.
Algorithm:
For each connected component, find the shortest cycle going through every node with Floyd-Warshall algorithm.
Then for each non-overlapping pair of cycles (in single component), use Dijkstra’s algorithm, starting from any node with at least 3 edges in one cycle, to find shortest path to other cycle; and compute a sum of lengths of both cycles and a shortest path, connecting them. For each overlapping pair of cycles, just compute the number of their edges.
Now find the minimum length of all these subgraphs. And subtract 1.
The above algorithm computes K if there is at least one double-cycle component in the graph. If there are no such components, K = N.
Seems related to MinCut/MaxFlow. Here is a try to reduce it to MinCut/MaxFlow:
- Produce one vertex for each number
- Produce one vertex for each pair
- Produce an edge from number i to a pair if the number is present in the pair, weight 1
- Produce a source node and connect it to all numbers, weight 1 for each connection
- Produce a sink node and connect it to all numbers, weight 1 for each connection
Running MaxFlow on this should give you the number K, since any set of three pairs which only contains two numbers in total, will be "blocked" by the constrains on the outgoing edges from the number.
I am not sure whether this is the fastest solution. There might also be a matroid hidden in there somewhere, I think. In that case there is a greedy approach. But I cannot find a proof for the matroid properties of the sets you are constructing.
I made some progress on it, but not yet an efficient solution. However it may point the way.
Make a graph whose points are pairs, and connect any pair of points if they share a number. Then for any subgraph, the number of numbers in it is the number of vertices minus the number of edges. Therefore your problem is the same as locating the smallest subgraph (if any) that has more edges than vertices.
A minimal subgraph that has the same number of edges and vertices is a cycle. Therefore the graphs we're looking for are either 2 cycles that share one or more vertices, or else 2 cycles which are connected by a path. There are no other minimal types possible.
You can locate and enumerate cycles fairly easily with a breadth-first search. There may be a lot of them, but this is doable. Armed with that you can look for subgraphs of these subtypes. (Enumerate minimal cycles, look for either pairs that share points, or which are connected.) But that isn't guaranteed to be polynomial. I suspect it will be something where on average it is pretty good, but the worst case is very bad. However that may be more efficient than what you're doing now.
I keep on thinking that some kind of breadth-first search can find these in polynomial time, but I keep failing to see exactly how to do it.
This is equivalent to finding the chord that chords the smallest cycle in the graph. A very naive algorithm would be:
Check if removal of an edge results in a cycle containing the vertices corresponding to the edge. If yes, then note down the length of the smallest cycle.