I have an undirected planar graph where each node has a weight. I want to split the graph into as many connected disjoint subgraphs as possible (EDIT: or, put differently, to make the mean weight of the subgraphs as small as possible), given the condition that each subgraph has to reach a fixed minimum weight (the sum of the weights of its nodes). A subgraph containing only a single node is OK as well (if the node's weight is larger than the fixed minimum).
What I have found out so far is a heuristic:
create a subgraph out of every node
while there is an underweight subgraph:
    select the subgraph S with the lowest weight
    find a subgraph N that has the lowest weight among the neighbouring subgraphs of S
    merge S into N
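For reference, the heuristic above can be sketched in Python as follows. This is a minimal sketch, assuming the graph is given as a dict mapping each node to its neighbours and the weights as a dict; the function name greedy_merge is hypothetical. It raises an error if some connected component is lighter than the minimum, because such a component can never be fixed by merging.

```python
def greedy_merge(adj, weight, min_weight):
    """Greedy merging heuristic (a sketch, not an optimal algorithm).

    adj: dict node -> iterable of neighbouring nodes (undirected graph)
    weight: dict node -> weight
    Returns a list of node sets, each with total weight >= min_weight.
    """
    groups = {v: {v} for v in adj}       # group id -> nodes in the group
    group_of = {v: v for v in adj}       # node -> id of its group
    total = {v: weight[v] for v in adj}  # group id -> total weight

    while True:
        under = [g for g in groups if total[g] < min_weight]
        if not under:
            return list(groups.values())
        # S: the lightest underweight group
        s = min(under, key=lambda g: total[g])
        neighbours = {group_of[u] for v in groups[s] for u in adj[v]} - {s}
        if not neighbours:
            raise ValueError("a connected component is lighter than min_weight")
        # N: the lightest neighbouring group; merge S into it
        n = min(neighbours, key=lambda g: total[g])
        for v in groups[s]:
            group_of[v] = n
        groups[n] |= groups.pop(s)
        total[n] += total.pop(s)
```

On a path a-b-c with weights 30, 30, 60 and a minimum of 50, it merges a and b and leaves c alone.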
Clearly this is not optimal. Has anyone got a better solution? (Maybe I'm just ignorant and this is not a complex issue, but I have never studied graph theory...)
EDIT (more background details): The nodes in this graph are low-scale administrative units for which statistical data are to be provided. However, the units need to have a certain minimum population size to avoid conflicts with personal data legislation. My objective is to create aggregates so that as little information as possible is lost on the way. The neighbourhood relationships serve as graph edges, as the resulting units must be contiguous.
Most of the units (nodes) in the set are well above the minimum threshold. About 5-10 % of them are below the threshold, with varying sizes, as seen in the example (minimum size 50):
This is an NP-hard optimization problem. For example, the Partition problem can easily be reduced to this one (the planarity property is not an obstacle). So an algorithm that calculates an optimal solution (and you seem to ask for an optimal solution in your comment) is unlikely to be practical for "tens of thousands of nodes".
If you don't actually need an optimal solution but a good one, I would use local optimization methods, such as tabu search or simulated annealing.
Because the mean weight of your subgraphs is just the weight of the total graph divided by the number of subgraphs, the only thing that matters is to find the maximum number of subgraphs you can attain. Guess this number, N, form an initial partitioning into N subgraphs, and then, for example, use local moves of (1) moving a node from one subgraph to another and (2) exchanging two nodes between two adjacent subgraphs, in search of an acceptable solution where every subgraph has the required minimum weight. If you can't find an acceptable solution at all, decrease N (e.g. by one) and restart until you find a solution.
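The local-search idea can be sketched as follows. This is a minimal sketch, assuming an initial partition into N connected node sets is already available; it uses only move type (1) and plain hill climbing that accepts improving and sideways moves, which is simpler than the tabu search or simulated annealing suggested above. The cost is the total shortfall below the minimum weight, and all function names are hypothetical.

```python
import random

def is_connected(nodes, adj):
    """True if `nodes` induces a non-empty connected subgraph."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(nodes)

def deficit(parts, weight, min_weight):
    """Total amount by which the parts fall short of min_weight (0 = feasible)."""
    return sum(max(0, min_weight - sum(weight[v] for v in p)) for p in parts)

def local_search(adj, weight, min_weight, parts, steps=10_000, seed=0):
    """Hill climbing over a fixed number of parts using single-node moves.

    Returns a feasible partition, or None if none was found within `steps`.
    """
    rng = random.Random(seed)
    parts = [set(p) for p in parts]
    part_of = {v: i for i, p in enumerate(parts) for v in p}
    nodes = list(adj)
    cost = deficit(parts, weight, min_weight)
    for _ in range(steps):
        if cost == 0:
            return parts
        v = rng.choice(nodes)
        src = part_of[v]
        targets = sorted({part_of[u] for u in adj[v]} - {src})
        if not targets or len(parts[src]) == 1:
            continue
        dst = rng.choice(targets)
        # The old part must stay connected; the new part stays connected
        # automatically because v is adjacent to it.
        parts[src].remove(v)
        if not is_connected(parts[src], adj):
            parts[src].add(v)
            continue
        parts[dst].add(v)
        new_cost = deficit(parts, weight, min_weight)
        if new_cost <= cost:   # accept improving and sideways moves
            part_of[v] = dst
            cost = new_cost
        else:                  # undo worsening moves
            parts[dst].remove(v)
            parts[src].add(v)
    return parts if cost == 0 else None
```

The outer loop that decreases N and builds a fresh initial partition is left out; the deficit is recomputed from scratch on each move for clarity, not speed.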
The details of the instance change everything.
Call a vertex heavy if it is over the threshold by itself. It never makes sense to have a subgraph with two heavy vertices, because we could split it in two. Thus, the number of subgraphs in the optimal solution is the number of heavy vertices plus the number of subgraphs containing no heavy vertex.
As a result, we can delete the heavy vertices and focus on making as many valid subgraphs as possible out of what's left. Judging by your map, what's left will consist of many connected components, each with only a handful of vertices. Valid subgraphs must be connected, so these components can be solved independently. On small instances, one can (e.g.) enumerate all valid subgraphs and then run Algorithm X to find a maximum-cardinality packing.
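A minimal sketch of this decomposition, assuming the components left after deleting the heavy vertices are small enough for brute force and that node labels are sortable. Plain backtracking stands in for Algorithm X here, and the function names are hypothetical. Like the argument above, it only counts subgraphs; leftover light vertices are assumed to be attached to some neighbouring subgraph afterwards.

```python
from itertools import combinations

def light_components(adj, weight, min_weight):
    """Connected components of the graph after deleting heavy vertices."""
    light = {v for v in adj if weight[v] < min_weight}
    seen, comps = set(), []
    for s in sorted(light):
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u in light and u not in seen:
                    seen.add(u)
                    comp.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps

def is_connected(nodes, adj):
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(nodes)

def max_packing(candidates):
    """Largest collection of pairwise disjoint sets (plain backtracking)."""
    best = []
    def go(i, used, chosen):
        nonlocal best
        if len(chosen) + (len(candidates) - i) <= len(best):
            return                      # cannot beat the best found so far
        if i == len(candidates):
            best = list(chosen)
            return
        if not (candidates[i] & used):  # take candidate i
            chosen.append(candidates[i])
            go(i + 1, used | candidates[i], chosen)
            chosen.pop()
        go(i + 1, used, chosen)         # skip candidate i
    go(0, frozenset(), [])
    return best

def max_subgraph_count(adj, weight, min_weight):
    heavy = sum(1 for v in adj if weight[v] >= min_weight)
    extra = 0
    for comp in light_components(adj, weight, min_weight):
        # Enumerate every valid (connected, heavy enough) light subgraph
        valid = [frozenset(c)
                 for r in range(1, len(comp) + 1)
                 for c in combinations(sorted(comp), r)
                 if sum(weight[v] for v in c) >= min_weight
                 and is_connected(set(c), adj)]
        extra += len(max_packing(valid))
    return heavy + extra
```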
Since the problem is NP-hard, I will outline an exponential-time solution. This solution has many points that could be improved; I will highlight some of them.
The whole idea is that every part of a partition is connected by some subset of the edges, so if you try all possible subsets of edges, you are sure to generate every valid partition of the graph. You can then find the best one by counting the number of sets in each partition (the optimality criterion).
Your previous approach does not explore a large enough domain. This solution uses the following:
- Disjoint Sets: Partition representation
- Power Sets: to find all the possible edge sets
using System.Collections.Generic;
using System.Linq;

public Partition Solve(Graph g, int min)
{
    int max = 0;
    Partition best = null;
    // Enumerate every possible subset of the edges
    foreach (var S in PowerSet(g.Edges))
    {
        // Build the vertex partition induced by this edge subset
        var partition = BuildPartition(g, S);
        // Test the min condition for each component
        if (IsInvalid(partition, min))
            continue;
        // Skip it if we already have something at least as good
        if (max >= partition.Length)
            continue;
        // Update the best partition found so far
        max = partition.Length;
        best = partition;
    }
    return best;
}

public Partition BuildPartition(Graph g, IEnumerable<Edge> edges)
{
    // Initially every vertex is alone in its own set
    var sets = new DisjointSet(g.Vertexes);
    foreach (var edge in edges)
    {
        // If both endpoints of this edge are already in the same set, do nothing
        if (sets.Find(edge.V1) == sets.Find(edge.V2))
            continue;
        // Join both subsets
        sets.Union(edge.V1, edge.V2);
    }
    // Partition is assumed to group the vertices by their DisjointSet representative
    return new Partition(sets);
}

public bool IsInvalid(Partition p, int min)
{
    // A partition is invalid if any of its parts is lighter than the minimum
    return p.Sets.Any(t => t.Sum(v => v.Weight) < min);
}
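A runnable Python transcription of the same brute force, useful for checking the idea on tiny graphs. This is a sketch, assuming nodes are hashable labels, edges are pairs, and weights are a dict; a small union-find replaces the DisjointSet class, and the function names are hypothetical.

```python
from itertools import chain, combinations

def all_edge_subsets(edges):
    """The power set of the edge list."""
    return chain.from_iterable(combinations(edges, r) for r in range(len(edges) + 1))

def build_partition(nodes, edges):
    """Connected components induced by `edges`, via union-find."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

def solve(nodes, edges, weight, min_weight):
    """Valid partition with the most parts, or None if none exists."""
    best = None
    for subset in all_edge_subsets(edges):
        partition = build_partition(nodes, subset)
        if any(sum(weight[v] for v in part) < min_weight for part in partition):
            continue
        if best is None or len(partition) > len(best):
            best = partition
    return best
```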
You can improve the solution in the following aspects:
- Add parallelism to the PowerSet enumeration and the IsInvalid check
- Find a better way of generating valid edge sets
- Treat vertices with more weight than the minimum as a special starting case (each can always be in a separate subgraph)
The order of the algorithm is given by the power set.
- Power Set: a planar graph with N vertices has at most 3N-6 edges, so in the worst case there are 2^(3N-6) edge subsets, i.e. O(8^N).
- Build Partition: V + E*LogV, which is O(NLogN) here
- IsInvalid: O(V)
Finally, Solve is O(8^N * N * LogN).
Use this last formula to estimate the number of operations.
Hope this helps!
The problem: finding the path to the closest of multiple goals on a rectangular grid with obstacles. Only moving up/down/left/right is allowed (no diagonals). I did see this question and answers, and this, and that, among others. I didn't see anyone use or suggest my particular approach. Do I have a major mistake in my approach?
My most important constraint here is that it is very cheap for me to represent the path (or any list, for that matter) as a "stack", or a "singly-linked-list", if you want. That is, constant time access to the top element, O(n) for reversing.
The obvious (to me) solution is to search for the path starting from all of the goals towards the starting point, using a Manhattan distance heuristic. The first path found from a goal to the starting point would be a shortest path to the closest goal (one of possibly many), and I don't need to reverse the path before following it (it would already be in the "correct" order, with the starting point on top and the goal at the end).
In pseudo-code:
A*(start, goals) :
    p_queue = init_priority_queue(start, goals)
    return path(start, p_queue)

init_priority_queue(start, goals) :
    p_queue = empty priority queue
    for (goal in goals) :
        f = manhattan_distance(start, goal)          # cost so far is 0 at a goal
        insert(f, push(goal, empty_path), p_queue)
    return p_queue

path(start, p_queue) :
    f, path = extract_min(p_queue)
    if (top(path) == start) :
        return path
    else :
        expand(start, path, p_queue)
        return path(start, p_queue)

expand(start, path, p_queue) :
    this = top(path)
    for (n in next(this)) :
        new_path = push(n, path)
        f = (length(new_path) - 1) + manhattan_distance(start, n)   # cost so far + heuristic
        insert(f, new_path, p_queue)
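A runnable Python sketch of this reversed search, assuming a grid given as a list of strings with '#' for obstacles. A path is a singly linked list of (cell, rest) pairs, so pushing a cell is O(1) and the returned list starts at the start cell. Two additions are assumptions beyond the pseudocode: a closed set, so each cell is expanded only once, and a tie-breaker that makes the later-inserted of two equal-priority entries come out first, matching the queue behaviour described in the question. The names reverse_astar and to_list are hypothetical.

```python
import heapq
from itertools import count

def reverse_astar(grid, start, goals):
    """Search from all goals towards start; return a linked-list path or None."""
    rows, cols = len(grid), len(grid[0])
    def h(cell):  # Manhattan distance to the start
        return abs(cell[0] - start[0]) + abs(cell[1] - start[1])
    tie = count()
    heap = []
    for goal in goals:
        # Entry: (f = g + h, tie-breaker, g, path as linked list)
        heapq.heappush(heap, (h(goal), -next(tie), 0, (goal, None)))
    closed = set()
    while heap:
        f, _, g, path = heapq.heappop(heap)
        cell = path[0]
        if cell == start:
            return path
        if cell in closed:
            continue
        closed.add(cell)
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' \
                    and nxt not in closed:
                heapq.heappush(heap, (g + 1 + h(nxt), -next(tie), g + 1, (nxt, path)))
    return None

def to_list(path):
    """Walk the linked list from its head (the start) to its tail (a goal)."""
    out = []
    while path is not None:
        out.append(path[0])
        path = path[1]
    return out
```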
To me it seems only natural to reverse the search in this way. Is there a think-o in here?
Another question: assume that my priority queue is stable on elements with the same priority (if two elements have the same priority, the one inserted later will come out earlier). I have deliberately left my next() above undefined: randomizing the order in which the possible next tiles on a rectangular grid are returned seems a very cheap way of finding an unpredictable, rather zig-zaggy path through a rectangular area free of obstacles, instead of going along two of its edges (a zig-zag path is simply statistically more probable). Is that correct?
It's correct and efficient in the big O as far as I can see (N log N as long as the heuristic is admissible and consistent, where N = number of cells of the grid, assuming you use a priority queue whose operations work in log N). The zig-zag will also work.
p.s. For this sort of problem there is a more efficient "priority queue" that works in O(1) per operation. By this sort of problem I mean the case where the effective distance between every pair of nodes is a very small constant (3 in this problem).
Edit: as requested in the comment, here are the details for a constant time "priority queue" for this problem.
First, transform the graph as follows. Let the potential of a node in the graph (i.e., a cell in the grid) be the Manhattan distance from the node to the goal (i.e., the heuristic), and denote the potential of node i by P(i). In the original graph, there is an edge between adjacent cells and its weight is 1. In the modified graph, the weight w(i, j) is changed into w(i, j) - P(i) + P(j). This is exactly the same graph as in the proof of why A* is optimal and terminates in polynomial time when the heuristic is admissible and consistent. Note that the Manhattan distance heuristic for this problem is both admissible and consistent.
The first key observation is that A* in the original graph is exactly the same as Dijkstra in the modified graph. This is because the "value" of node i in the modified graph is exactly its distance from the origin node plus P(i), minus the constant P(origin). The second key observation is that the weight of every edge in the transformed graph is either 0 or 2: moving to an adjacent cell changes the Manhattan distance to the goal by exactly 1, so 1 - P(i) + P(j) is either 0 or 2. Thus, we can simulate A* by using a "deque" (or a doubly linked list) instead of an ordinary priority queue: whenever we relax an edge with weight 0, push its endpoint to the front of the deque, and whenever we relax an edge with weight 2, push it to the back.
Thus, this algorithm simulates A* and works in linear time in the worst case.
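A minimal Python sketch of the deque-based simulation, assuming a single source and a single target on a grid given as a list of strings ('.' free, '#' blocked); the name astar_01 is hypothetical. The key stored for each cell is cost-so-far plus heuristic, which is the distance in the modified graph up to a constant, and every relaxation adds 0 or 2 to it.

```python
from collections import deque

def astar_01(grid, source, target):
    """Length of a shortest source->target path, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    def h(cell):  # potential P: Manhattan distance to the target
        return abs(cell[0] - target[0]) + abs(cell[1] - target[1])
    key = {source: h(source)}  # key = g + h, i.e. the A* f-value
    done = set()
    dq = deque([source])
    while dq:
        cell = dq.popleft()
        if cell in done:       # stale duplicate entry
            continue
        done.add(cell)
        if cell == target:
            return key[cell]   # h(target) == 0, so the key is the path length
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '.':
                nxt = (nr, nc)
                w = 1 - h(cell) + h(nxt)  # reduced weight: always 0 or 2
                if key[cell] + w < key.get(nxt, float('inf')):
                    key[nxt] = key[cell] + w
                    if w == 0:
                        dq.appendleft(nxt)
                    else:
                        dq.append(nxt)
    return None
```

Because the deque only ever holds keys k and k + 2, popping from the front always yields a cell with the smallest key, which is exactly the order in which Dijkstra (and hence A*) would settle it.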
We have a directed weighted graph where an edge between two nodes can have more than one possible cost value (more precisely, at most 2 costs). I need to use a time-dependent variant of Dijkstra's algorithm that can handle two possible ways of getting from one node to another, the cost between the nodes (edge cost) being dependent on the time at which we arrive at the source node and the type of edge we are about to use. When traversing from one node to the other, only one of these edges is picked and its cost is added to the same total cost.
I currently model the two possible costs for an edge as two separate edges between the same nodes.
There is a similar problem I found here and it was suggested to augment the graph by duplicating the nodes. However, this does not allow returning to the original graph and implies the overhead of, well, duplicating all the nodes and possibly edges between them and original nodes.
Do you have any suggestions as to how to tackle this problem with as little overhead as possible? (The original graph is expected to be huge)
Thanks
Edit:
I provided more details about the problem in the first paragraph
You can safely ignore the largest of the two costs for algorithm purposes.
Assume there is a shortest path that uses the larger of the two costs between two vertices: you can change it to use the smaller cost and the path will cost less, which contradicts the assumption.
I think you can hack step 3 of Dijkstra's algorithm:
For the current node, consider all of its unvisited neighbors and calculate their tentative distances. Compare the newly calculated tentative distance to the current assigned value and assign the smaller one. For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.
In your setup, you have two distances from A to B, depending on how late it is. You use the second one if your current distance to A is above your time threshold.
This step becomes:
if current distance to A above threshold:
    current distance to B = min(current distance to B, current distance to A + d2(A, B))
else:
    current distance to B = min(current distance to B, current distance to A + d1(A, B))
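A minimal Python sketch of Dijkstra with this modified relaxation step, assuming each directed edge is stored once as (neighbour, d1, d2) and that a single global threshold decides which cost applies; the function name is hypothetical. Note the assumption this relies on: settling nodes in order of arrival time is only guaranteed to be correct when arriving later at A never lets you arrive earlier at B (the usual FIFO condition for time-dependent shortest paths).

```python
import heapq

def time_dependent_dijkstra(edges, source, threshold):
    """edges: dict node -> list of (neighbour, d1, d2).

    d1 is used when the arrival time at the current node is <= threshold,
    d2 when it is above. Returns a dict of earliest arrival times.
    """
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, a = heapq.heappop(heap)
        if a in done:
            continue
        done.add(a)
        for b, d1, d2 in edges.get(a, []):
            step = d2 if d > threshold else d1  # the hacked step 3
            if d + step < dist.get(b, float('inf')):
                dist[b] = d + step
                heapq.heappush(heap, (d + step, b))
    return dist
```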
I have a problem, which I have reduced to the following question:
In a connected undirected graph, each edge weight is the time needed to go from one end to the other. Some people stand on some of the vertices. Now they want to meet: find a place (vertex) such that all the people can reach this assembly point within a certain time T, and minimize this T.
More information for edge cases, if needed: no negative edges; cycles may exist; more than one person can be on the same vertex; a vertex may have no person; edges are undirected, so a weight measures both u->v and v->u; people start from their initial locations.
How can I find it efficiently? Should I, for every node v, calculate max(SPD(ui, v)), where ui are the people's locations, and then choose the minimum among these maximum times? Is there a better way?
I believe it can be done within a polynomial runtime bound as follows. In a first pass, solve the All-Pairs Shortest Path problem to obtain a matrix with the lengths of the shortest paths between all vertices; afterwards, iterate over the columns and select the column whose maximum entry, taken over the rows of the vertices where people are located, is smallest.
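Only the rows of occupied vertices are actually needed, so one Dijkstra run per distinct occupied vertex is enough; this is a swapped-in shortcut rather than full all-pairs shortest paths. A minimal sketch, assuming an adjacency dict mapping each node to a list of (neighbour, weight) pairs with both directions listed; the function names are hypothetical.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def best_meeting_point(adj, people):
    """(vertex, T) minimizing the maximum travel time over all people."""
    rows = [dijkstra(adj, p) for p in set(people)]
    best = None
    for v in adj:
        worst = max(row.get(v, float('inf')) for row in rows)
        if best is None or worst < best[1]:
            best = (v, worst)
    return best
```

With k distinct occupied vertices this costs k Dijkstra runs instead of one per vertex of the graph.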
It can be done by running Dijkstra's algorithm in parallel from every person's vertex and stopping as soon as the sets of visited nodes intersect in one node. Intersection can be checked by counting. Algorithm sketch:
node_count = [0] * number_of_nodes  # number of searches that have visited each node
dijkstras = set of objects D_n performing Dijkstra's algorithm, one per person, starting from that person's node n
queue = priority queue that stores tuples (dist_n, first_in_queue_n, D_n), ordered by dist_n
    first_in_queue_n is the next node that will be visited by D_n, at distance dist_n
    initialized by D_n.first_in_queue()
while queue is not empty:
    dist_n, first_in_queue_n, D_n = queue.pop_min()
    node_count[first_in_queue_n] += 1
    if node_count[first_in_queue_n] == number_of_people:
        return first_in_queue_n  # everyone can reach it within T = dist_n
    D_n.visit_node(first_in_queue_n)
    if D_n still has nodes to visit:
        queue.add( D_n.first_in_queue() )
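The parallel searches can be merged into a single heap keyed by distance, as in this minimal Python sketch. It assumes an adjacency dict of (neighbour, weight) pairs, comparable node labels (e.g. strings), and at least one person; the name meeting_point_parallel is hypothetical. Because entries are popped globally in order of distance, the first node reached by every search is reached at the smallest possible maximum time.

```python
import heapq

def meeting_point_parallel(adj, people):
    """Return (vertex, T) where all searches first meet, or None."""
    sources = sorted(set(people))
    k = len(sources)
    visit_count = {v: 0 for v in adj}
    settled = [set() for _ in sources]   # nodes settled by each search
    tentative = [{s: 0} for s in sources]
    # Heap entries: (distance, index of the search, node)
    heap = [(0, i, s) for i, s in enumerate(sources)]
    heapq.heapify(heap)
    while heap:
        d, i, v = heapq.heappop(heap)
        if v in settled[i]:
            continue
        settled[i].add(v)
        visit_count[v] += 1
        if visit_count[v] == k:           # every search has reached v
            return v, d
        for u, w in adj[v]:
            if d + w < tentative[i].get(u, float('inf')):
                tentative[i][u] = d + w
                heapq.heappush(heap, (d + w, i, u))
    return None
```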
Given an undirected graph, I want to generate all subgraphs which are trees of size N, where size refers to the number of edges in the tree.
I am aware that there are a lot of them (exponentially many at least for graphs with constant connectivity) - but that's fine, as I believe the number of nodes and edges makes this tractable for at least smallish values of N (say 10 or less).
The algorithm should be memory-efficient - that is, it shouldn't need to have all graphs or some large subset of them in memory at once, since this is likely to exceed available memory even for relatively small graphs. So something like DFS is desirable.
Here's what I'm thinking, in pseudo-code, given the starting graph graph and desired length N:
Pick any arbitrary node, root as a starting point and call alltrees(graph, N, root)
alltrees(graph, N, root)
    given that node root has degree M, find all M-tuples with integer, non-negative values whose values sum to N (for example, for 3 children and N=2, you have (0,0,2), (0,2,0), (2,0,0), (0,1,1), (1,0,1), (1,1,0), I think)
    for each tuple (X1, X2, ... XM) above
        create a subgraph "current", initially empty
        for each integer Xi in X1...XM (the current tuple)
            if Xi is nonzero
                add edge i incident on root to the current tree
                add alltrees(graph with root removed, Xi-1, node adjacent to root along edge i)
        add the current tree to the set of all trees
    return the set of all trees
This finds only trees containing the chosen initial root, so now remove this node and call alltrees(graph with root removed, N, new arbitrarily chosen root), and repeat until the size of the remaining graph < N (since no trees of the required size will exist).
I also forgot that each visited node (each root for some call of alltrees) needs to be marked, and the set of children considered above should only include the adjacent unmarked children. I guess we need to account for the case where no unmarked children exist yet depth > 0: this means that this "branch" failed to reach the required depth and cannot form part of the solution set (so the whole inner loop associated with that tuple can be aborted).
So will this work? Any major flaws? Any simpler/known/canonical way to do this?
One issue with the algorithm outlined above is that it doesn't satisfy the memory-efficient requirement, as the recursion will hold large sets of trees in memory.
This needs an amount of memory that is proportional to what is required to store the graph. It will return every subgraph that is a tree of the desired size exactly once.
Keep in mind that I just typed it into here. There could be bugs. But the idea is that you walk the nodes one at a time, for each node searching for all trees that include that node, but none of the nodes that were searched previously. (Because those have already been exhausted.) That inner search is done recursively by listing edges to nodes in the tree, and for each edge deciding whether or not to include it in your tree. (If it would make a cycle, or add an exhausted node, then you can't include that edge.) If you include it your tree then the used nodes grow, and you have new possible edges to add to your search.
To reduce memory use, the list of edges left to look at is manipulated in place by all levels of the recursive calls, rather than using the more obvious approach of duplicating that data at each level. If that list were copied, your total memory usage would grow to the size of the tree times the number of edges in the graph.
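# Sketch: `graph` is assumed to be a dict mapping each node to an iterable of
# its neighbours. Each tree is yielded as (set of nodes, set of edges).
def find_all_trees(graph, tree_length):
    exhausted_node = set()      # roots whose trees have all been listed
    used_node = set()           # nodes in the current partial tree
    used_edge = set()           # edges in the current partial tree
    current_edge_groups = []    # incident edges of each tree node, in order

    def incident_edges(node):
        return [(node, other) for other in graph[node]]

    def finish_all_trees(remaining_length, edge_group, edge_position):
        while edge_group < len(current_edge_groups):
            edges = current_edge_groups[edge_group]
            while edge_position < len(edges):
                edge = edges[edge_position]
                edge_position += 1
                (node1, node2) = edge
                if node1 in exhausted_node or node2 in exhausted_node:
                    continue
                # The new node is the endpoint not yet in the tree; if both
                # endpoints are already used, the edge would close a cycle.
                node = node1
                if node1 in used_node:
                    if node2 in used_node:
                        continue
                    else:
                        node = node2
                used_node.add(node)
                used_edge.add(frozenset(edge))
                current_edge_groups.append(incident_edges(node))
                if 1 == remaining_length:
                    yield (set(used_node), set(used_edge))
                else:
                    # Only edges after the current position are considered,
                    # so every tree is produced exactly once.
                    for tree in finish_all_trees(remaining_length - 1,
                                                 edge_group, edge_position):
                        yield tree
                # Undo this choice before trying the next edge.
                current_edge_groups.pop()
                used_edge.remove(frozenset(edge))
                used_node.remove(node)
            edge_position = 0
            edge_group += 1

    for node in list(graph):
        used_node.add(node)
        current_edge_groups.append(incident_edges(node))
        for tree in finish_all_trees(tree_length, 0, 0):
            yield tree
        current_edge_groups.pop()
        used_node.remove(node)
        exhausted_node.add(node)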
Assuming you can destroy the original graph or make a destroyable copy, I came up with something that could work, but it could be utterly painful because I have not calculated its complexity. It would probably work for small subtrees.
do it in steps, at each step:
sort the graph nodes so you get a list of nodes sorted by number of adjacent edges ASC
process all nodes with the same number of edges of the first one
remove those nodes
For an example for a graph of 6 nodes finding all size 2 subgraphs (sorry for my total lack of artistic expression):
Well the same would go for a bigger graph, but it should be done in more steps.
Assuming:
Z: number of edges of the most connected node
M: desired subtree size
S: number of steps
Ns: number of nodes in a step
assuming quicksort for sorting the nodes
Worst case:
S*(Ns^2 + MNsZ)
Average case:
S*(NslogNs + MNs(Z/2))
The problem is that I cannot calculate the real big-O, because the number of nodes in each step will decrease depending on the shape of the graph...
Solving the whole thing with this approach could be very time-consuming on a graph with highly connected nodes. However, it could be parallelized, and you could do one or two steps to remove dislocated nodes, extract all their subgraphs, and then choose another approach for the remainder; you would have removed a lot of nodes from the graph, which could decrease the remaining run time...
Unfortunately this approach would benefit the GPU not the CPU, since a LOT of nodes with the same number of edges would go in each step.... and if parallelization is not used this approach is probably bad...
Maybe an inverse would go better with the CPU, sort and proceed with nodes with the maximum number of edges... those will be probably less at start, but you will have more subgraphs to extract from each node...
Another possibility is to find the least frequently occurring edge count in the graph and start with the nodes that have it; that would reduce the memory usage and the iteration count for extracting subgraphs...
Unless I'm reading the question wrong people seem to be overcomplicating it.
This is just "all possible paths within N edges" and you're allowing cycles.
Thus, for two nodes A and B and one edge, your result would be:
AA, AB, BA, BB
For two nodes, two edges your result would be:
AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
I would recurse inside a For Each loop and pass in a "template" tuple:
N=edge count
TempTuple = Tuple_of_N_Items ' (01,02,03,...0n) (Could also be an ordered list!)
ListOfTuple_of_N_Items ' Paths (could also be an ordered list!)
edgeDepth = N + 1 ' one more vertex than edges
Method (Nodes, edgeDepth, TupleTemplate, ListOfTuples, EdgeTotal)
    edgeDepth -= 1
    For Each Node In Nodes
        If edgeDepth = 0 Then ' Last vertex
            ListOfTuples.Add(New Tuple from TupleTemplate + Node) ' (x,y,z,...,Node)
        Else
            NewTupleTemplate = TupleTemplate + Node ' (x,y,z,Node,...,0n)
            Method(Nodes, edgeDepth, NewTupleTemplate, ListOfTuples, EdgeTotal)
        End If
    Next
This will create every possible combination of vertices for a given edge count
What's missing is the factory to generate tuples given an edge count.
You end up with a list of possible paths and the operation is Nodes^(N+1)
If you use ordered lists instead of tuples then you don't need to worry about a factory to create the objects.
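In Python, the standard library already provides this enumeration, so no tuple factory is needed. A minimal sketch, assuming, as in the answer, that every sequence of vertices counts regardless of which edges actually exist; the function name is hypothetical.

```python
from itertools import product

def all_walk_sequences(nodes, edge_count):
    """Every sequence of edge_count + 1 vertices: len(nodes) ** (edge_count + 1) tuples."""
    return list(product(nodes, repeat=edge_count + 1))
```

For nodes A and B and one edge this yields AA, AB, BA, BB, matching the example above.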
If memory is the biggest problem, you can use an NP-style solution with tools from formal verification, i.e., guess a subset of nodes of size N and check whether it forms a tree. To save space you can use a BDD (http://en.wikipedia.org/wiki/Binary_decision_diagram) to represent the original graph's nodes and edges. You can also use a symbolic algorithm to check whether the graph you guessed really is a tree, so you never need to construct the original graph (nor the N-sized graphs) explicitly. Your memory consumption should be (in big-O) log(n) (where n is the size of the original graph) to store the original graph, and another log(N) to store every "small graph" you want.
Another tool (which is supposed to be even better) is a SAT solver, i.e., construct a SAT formula that is true iff the subgraph is a tree and supply it to a SAT solver.
For the complete graph K_n there are about e*(n-2)! simple paths between any pair of vertices, i.e. factorially many. I haven't gone through your code, but here is what I would do.
Select a pair of vertices.
Start from a vertex and try to reach the destination vertex recursively (something like dfs but not exactly). I think this would output all the paths between the chosen vertices.
You could do the above for all possible pairs of vertices to get all simple paths.
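A minimal Python sketch of the recursive search for one pair of vertices, assuming an adjacency dict; the name all_simple_paths is hypothetical. It is a plain backtracking DFS that keeps the current path and its vertex set, so each simple path is yielded exactly once.

```python
def all_simple_paths(adj, source, target):
    """Yield every simple path from source to target as a list of vertices."""
    path = [source]
    on_path = {source}
    def dfs(u):
        if u == target:
            yield list(path)
            return
        for v in adj[u]:
            if v not in on_path:
                path.append(v)
                on_path.add(v)
                yield from dfs(v)
                path.pop()           # backtrack
                on_path.remove(v)
    yield from dfs(source)
```

For K_4, there are 5 simple paths between two given vertices: the direct edge, two through one other vertex, and two through both others.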
It seems that the following solution will work.
Go over all partitions of the set of all vertices into two parts. Then count the number k of edges whose endpoints lie in different parts; each of these edges can be the tree edge that connects the subtree of the first part to the subtree of the second part. Calculate the answer for both parts recursively (p1, p2). Then the answer for the entire graph is the sum of k*p1*p2 over all such partitions. But every tree is counted N times, once for each of its edges, so the sum must be divided by N to get the answer.
Your solution as is doesn't work I think, although it can be made to work. The main problem is that the subproblems may produce overlapping trees so when you take the union of them you don't end up with a tree of size n. You can reject all solutions where there is an overlap, but you may end up doing a lot more work than needed.
Since you are OK with exponential runtime, and with potentially writing 2^n trees out, an O(V*2^V) algorithm is not bad at all. So the simplest way of doing it would be to generate all possible subsets of n nodes and then test whether each one forms a tree. Since testing whether a subset of nodes forms a tree can take O(E*V) time, we are potentially talking about O(V^2*V^n) time, unless your graph has O(1) degree. This can be improved slightly by enumerating the subsets so that two successive subsets differ by exactly one swapped node. In that case, you just have to check whether the new node is connected to any of the existing nodes, which can be done in time proportional to the number of outgoing edges of the new node by keeping a hash table of all existing nodes.
The next question is how to enumerate all the subsets of a given size such that no more than one element is swapped between successive subsets. I'll leave that as an exercise for you to figure out :)
I think there is a good algorithm (with Perl implementation) at this site (look for TGE), but if you want to use it commercially you'll need to contact the author. The algorithm is similar to yours in the question but avoids the recursion explosion by making the procedure include a current working subtree as a parameter (rather than a single node). That way each edge emanating from the subtree can be selectively included/excluded, and recurse on the expanded tree (with the new edge) and/or reduced graph (without the edge).
This sort of approach is typical of graph enumeration algorithms -- you usually need to keep track of a handful of building blocks that are themselves graphs; if you try to only deal with nodes and edges it becomes intractable.
This algorithm is large and not easy to post here, but here is a link to the reservation search algorithm, which you can use to do what you want. This PDF file contains both algorithms. Also, if you understand Russian, you can take a look at this.
So you have a graph with edges e_1, e_2, ..., e_E.
If I understand correctly, you are looking to enumerate all subgraphs which are trees and contain N edges.
A simple solution is to generate each of the E choose N subgraphs and check if they are trees.
Have you considered this approach? Of course if E is too large then this is not viable.
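A minimal Python sketch of the E-choose-N check, assuming edges are given as pairs of hashable vertices; the names are hypothetical. A set of n edges is a tree exactly when it contains no cycle and touches n + 1 distinct vertices, which a small union-find can check.

```python
from itertools import combinations

def is_tree(edge_subset):
    """True if the edges form a single tree (acyclic and connected)."""
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in edge_subset:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False              # this edge would close a cycle
        parent[ra] = rb
    # An acyclic set of n edges is connected iff it spans n + 1 vertices
    nodes = {v for e in edge_subset for v in e}
    return len(nodes) == len(edge_subset) + 1

def trees_by_edge_subsets(edges, n):
    return [s for s in combinations(edges, n) if is_tree(s)]
```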
EDIT:
We can also use the fact that a tree is a combination of trees, i.e. that each tree of size N can be "grown" by adding an edge to a tree of size N-1. Let E be the set of edges in the graph. An algorithm could then go something like this.
T = E
n = 1
while n < N
    newT = empty set
    for each tree t in T
        for each edge e in E
            if t+e is a tree of size n+1 which is not yet in newT
                add t+e to newT
    T = newT
    n = n+1
At the end of this algorithm, T is the set of all subtrees of size N. If space is an issue, don't keep a full list of the trees, but use a compact representation, for instance implement T as a decision tree using ID3.
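A minimal Python version of this growing procedure, assuming edges are pairs and representing each tree as a frozenset of edges, so a plain set takes care of the "not yet in newT" check; the name grow_trees is hypothetical. An edge extends a tree into a larger tree exactly when one of its endpoints is already in the tree and the other is not. Unlike the compact representation suggested above, this keeps every tree of the current size in memory.

```python
def grow_trees(edges, n_target):
    """All subtrees with exactly n_target edges, as frozensets of edges."""
    edges = [frozenset(e) for e in edges]
    level = {frozenset([e]) for e in edges}  # all trees with one edge
    n = 1
    while n < n_target:
        next_level = set()
        for t in level:
            nodes = set().union(*t)
            for e in edges:
                # Exactly one endpoint inside t: adding e keeps it a tree
                if len(e & nodes) == 1:
                    next_level.add(t | {e})
        level = next_level
        n += 1
    return level
```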
I think the problem is under-specified. You mentioned that the graph is undirected and that the subgraph you are trying to find is of size N. What is missing is the number of edges, and whether the trees you are looking for are binary or multi-way trees are allowed. Also, are you interested in mirrored reflections of the same tree; in other words, does the order in which siblings are listed matter at all?
Presumably a single node in the tree you are trying to find is allowed to have more than two children, since you don't specify any restriction on the initial graph, and you mentioned that the resulting subgraph should contain all nodes.
You can enumerate all subgraphs that have the form of a tree by performing a depth-first traversal. You need to repeat the traversal of the graph for every sibling during the traversal. Then you need to repeat the operation with every node as the root.
Discarding symmetric trees, you will end up with
N^(N-2)
trees if your graph is a fully connected mesh; otherwise you need to apply Kirchhoff's matrix-tree theorem.
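A minimal Python sketch of Kirchhoff's matrix-tree theorem, assuming nodes are numbered 0..n-1 and edges are pairs; the name count_spanning_trees is hypothetical. It counts spanning trees only (trees containing all nodes) as the determinant of the Laplacian with one row and column removed, using exact fractions to avoid rounding errors.

```python
from fractions import Fraction

def count_spanning_trees(n, edges):
    """Number of spanning trees of a graph on nodes 0..n-1."""
    # Laplacian L = degree matrix - adjacency matrix
    L = [[Fraction(0)] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1
        L[b][b] += 1
        L[a][b] -= 1
        L[b][a] -= 1
    # Any cofactor works: drop the last row and column
    size = n - 1
    m = [row[:size] for row in L[:size]]
    det = Fraction(1)
    for col in range(size):  # Gaussian elimination
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0             # singular: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)
```

For a complete graph this reproduces Cayley's formula, e.g. 4^2 = 16 for K_4 and 5^3 = 125 for K_5.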