optimal way to calculate all nodes at distance less than k from m given nodes - algorithm

A graph of size n is given, along with a subset of m of its nodes. Find all nodes that are at a distance <= k from ALL nodes of the subset.
E.g. A->B->C->D->E is the graph, subset = {A,C}, k = 2.
Now, E is at distance <= 2 from C, but not from A, so it should not be counted.
I thought of running a breadth-first search from each node in the subset and taking the intersection of the respective answers.
Can it be further optimized?
I went through many posts on SO, but they all point to k-d trees, which I don't understand, so is there any other way?
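For reference, the baseline described in the question (one depth-limited BFS per subset node, then an intersection) can be sketched as below. This is only a sketch: the adjacency-dict representation and the names within_k and within_k_of_all are assumptions for illustration, and every node is assumed to appear as a key of adj.

```python
from collections import deque

def within_k(adj, source, k):
    """Return the set of nodes at distance <= k from source (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand past depth k
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def within_k_of_all(adj, subset, k):
    """Intersect the depth-limited BFS results of every subset node."""
    result = set(adj)  # assumes every node is a key of adj
    for s in subset:
        result &= within_k(adj, s, k)
        if not result:
            break  # the intersection can only shrink
    return result

# The chain from the question, read as directed edges A->B->C->D->E.
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": []}
print(within_k_of_all(chain, ["A", "C"], 2))  # {'C'}: E is dropped
```

The cost is O(m(V+E)) in the worst case, which is the figure the answers below try to improve on.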

I can think of two non-asymptotic (I believe) optimizations:
If you're done with BFS from one of the subset nodes, delete all nodes that have distance > k from it
Start with the two nodes in the subset whose distance is largest to get the smallest possible leftover graph
Of course this doesn't help if k is large (close to n); I have no idea in that case. I am positive, however, that k-d trees are not applicable to general graphs :)

Niklas B.'s optimizations can be applied to both of the following optimizations.
Optimization #1: Modify BFS to do the intersection as it runs rather than afterwards.
The BFS and intersection seems to be the way to go. However, there is redundant work being done by the BFS. Specifically, it is expanding nodes that it doesn't need to expand (after the first BFS). This can be resolved by merging the intersection aspect into the BFS.
The solution seems to be to keep two sets of nodes, call them "ToVisit" and "Visited", rather than label nodes visited or not.
The new rules of the BFS are as follows:
Only nodes in ToVisit are expanded upon by the BFS. They are then moved from ToVisit to Visited to prevent being expanded twice.
The algorithm returns the Visited set as its result, and any nodes left in ToVisit are discarded. The result is then used as the ToVisit set for the next node.
The first node either uses a standard BFS algorithm or starts with ToVisit as the list of all nodes. Either way, its result becomes the ToVisit set for the second node.
It works better if the ToVisit set is small on average, which tends to be the case if m and k are much less than N.
Optimization #2: Pre-compute the distances if there are enough queries so queries just do intersections.
However, this is incompatible with the first optimization. If there are a sufficient number of queries on differing subsets and k values, then it is better to find the distances between every pair of nodes ahead of time at a cost of O(VE).
This way you only need to do the intersections, which is O(V*M*Q), where Q is the number of queries, M is the average size of the subset over the queries and V is the number of nodes. If it is expected to be the case that O(M*Q) > O(E), then this approach should be less work. Noting the distance between the two most distant nodes is useful, as any k equal to or higher than it will always return the set of all vertices, resulting in just O(V) for the query cost in that case.
The distance data should then be stored in four forms.
The first is "kCount[A][k] = number of nodes with distance k or less from A". This provides an alternative to Niklas B.'s suggestion of "Start with the two nodes in the subset whose distance is largest to get the smallest possible leftover graph" in the case that O(m) > O(sqrt(V)) since finding the smallest is O(m^2) and it may be better to avoid trying to find the best choice for the starting pair and just pick a good choice. You can start with the two nodes in the subset with the smallest value for the given k in this data structure. You could also just sort the nodes in the subset by this metric and do the intersections in that order.
The second is "kMax[A] = max k for A", which can be done using a hashmap/dictionary. If k >= this value, then this node can be skipped unless kCount[A][kMax[A]] < (number of vertices), meaning not all nodes are reachable from A.
The third is "kFrom[A][k] = set of nodes k distance from A". Since k is valid from 0 to the max distance, a hashmap/dictionary to an array/list could be used here rather than a nested hashmap/dictionary. This allows for space- and time-efficient* creation of the set of nodes with distance <= k from A.
The fourth is "dist[A][B] = distance from A to B", which can be done using a nested hashmap/dictionary. This allows for handling the intersection checks fairly quickly.
* If space isn't an issue, then this structure can store all the nodes k or less distance from A, but that requires O(V^3) space and thus time. The main benefit, however, is that it also allows for storing a separate list of nodes that are greater than k distance. This allows the algorithm to use the smaller of the two sets, dist > k or dist <= k: an intersection in the case of dist <= k, set subtraction in the case of dist > k, or intersection then set subtraction if the main set has the minimum size.
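A rough sketch of the precompute-then-query idea follows, assuming an adjacency dict in which every node is a key. The function names (all_pairs_bfs, build_k_from, query) are made up for illustration; only dist and kFrom from the list above are built, and the kCount ordering is approximated by summing layer sizes at query time.

```python
from collections import deque

def all_pairs_bfs(adj):
    """dist[a][b] = hop distance from a to b; unreachable pairs are absent."""
    dist = {}
    for a in adj:
        d = {a: 0}
        queue = deque([a])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[a] = d
    return dist

def build_k_from(dist):
    """k_from[a][k] = list of nodes exactly k steps from a."""
    k_from = {}
    for a, d in dist.items():
        layers = [[] for _ in range(max(d.values()) + 1)]
        for b, steps in d.items():
            layers[steps].append(b)
        k_from[a] = layers
    return k_from

def query(dist, k_from, subset, k):
    """Nodes within k of every node in subset (subset must be non-empty)."""
    def ball_size(a):
        return sum(len(layer) for layer in k_from[a][:k + 1])
    # Start from the subset node with the fewest nodes within k.
    first = min(subset, key=ball_size)
    candidates = [b for layer in k_from[first][:k + 1] for b in layer]
    others = [a for a in subset if a != first]
    # A missing entry means "unreachable", which is treated as distance k + 1.
    return {b for b in candidates
            if all(dist[a].get(b, k + 1) <= k for a in others)}
```

Each query then costs roughly the size of the smallest ball times the subset size, with no graph traversal at all.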

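Add a new node (let's say s) and connect it to all the m given nodes.
Then, find all the nodes which are at a distance less than or equal to k+1 from s and remove the m given nodes from the result. T(n) = O(V+E)
Note that this finds the nodes within distance k of at least one of the given nodes, not of all of them.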

Related

Find the shortest path in a graph which visits all node types

I can't figure out how to proceed with the following problem.
Say I have an undirected graph with a start node and an end node. I need to find the shortest path between these two nodes, but the path must include all must-pass node types.
There can be up to 10 of these types. This means that I should visit at least one node of each type (marked with a letter in the image) and then go to the end. Once I visit one of the nodes of type B, I may, but need not, visit other nodes of type B. The nodes that are marked with a number simply form a path and do not need to be visited.
This question is very similar to this. There it was suggested to find the shortest path between all the crucial nodes and then use the DFS algorithm. Can I apply the same algorithm to this problem?
Let N be the number of vertices and M be the number of edges.
Break down the solution into two phases.
Phase 1:
Compute the distance between each pair of nodes. If the edges are not weighted, this can easily be done by starting a BFS from each node, spending a total of O(N(N+M)) time. If the edges are weighted, you can use Dijkstra's algorithm from each node, spending a total of O(N(NlogN+M)) time.
After this phase, we have computed dist(x,y) for any pair of nodes x,y.
Phase 2:
Now that we can query the distance between any pair of nodes in O(1) using the precomputed values in phase 1, it is time to put everything together. I can propose two possibilities here
Possibility 1:
Use a similar approach as in the thread you linked. There are 10! orders in which you could visit the types. Let's say we have fixed one possible order [s,t1,t2,...,t10,e] (where s and e are the start / end nodes, and ti represents a type) and we are trying to find the optimal way to visit the nodes from start to finish in that order. Since each type can have multiple nodes belonging to it, it is not as simple as querying distances for every pair of consecutive types t_i and t_{i+1}.
What we should do instead is, for every node x, compute the fastest way to reach the end node from node x while respecting the order of types [s,t1,...,t10,e]. So if x is of type t_i, then we are looking for a way to reach e from x while visiting nodes of types t_{i+1}, t_{i+2}, ... t_{10} in that order. Call this value dp[x] (where dp stands for dynamic programming).
We are looking to find dp[s]. How do we compute dp[x] for a given node x? Simple: iterate through all nodes y of type t_{i+1} and consider dist(x,y) + dp[y]. Then we have dp[x] = min{dist(x,y) + dp[y] for all y of type t_{i+1}}. Note that we need to compute dp[x] starting from the nodes of type t_10 all the way back to the nodes of type t_1.
The complexity here is O(10! * N^2)
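A minimal sketch of Possibility 1, assuming the phase 1 distances are already available as a nested dict dist[x][y] defined for every pair of nodes, and that nodes_by_type is a list holding one non-empty list of nodes per type. Both names, and the function name itself, are assumptions for illustration.

```python
from itertools import permutations

def shortest_via_types(dist, nodes_by_type, start, end):
    """Try every order of the types; for each order run the backward DP."""
    best = float("inf")
    for order in permutations(range(len(nodes_by_type))):
        # dp[x] = cheapest way to reach `end` from x while visiting the
        # remaining types of `order` in sequence.
        dp = {end: 0}
        for t in reversed(order):
            dp = {x: min(dist[x][y] + cost for y, cost in dp.items())
                  for x in nodes_by_type[t]}
        # Finally connect the start node to the first type in the order.
        best = min(best, min(dist[start][y] + cost for y, cost in dp.items()))
    return best

# Nodes 0..3 on a line with unit edges, so dist is just |x - y|.
line = {x: {y: abs(x - y) for y in range(4)} for x in range(4)}
print(shortest_via_types(line, [[3], [2]], 0, 1))  # 5: walk to 3, back to 1
```

The DP for one order only ever needs the table for the next type in the sequence, which is why a single dict is overwritten at each step.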
Possibility 2:
There is actually a much faster way to find the answer and reduce the complexity to O(2^10 * N^3) (which can give massive gains for large N, and especially for larger number of node types (like 20 instead of 10)).
To accomplish this we do the following. For each subset S of the set of types {1,2,...,10}, and for each pair of nodes x, y with types in S, define
dp[S][x][y], which represents the fastest way to traverse the graph starting from node x, ending in node y and visiting at least one node of every type in S. Note that we don't care about the actual order. To compute dp[S][x][y] for a given (S,x,y), all we need to do is go over all the possibilities for the second node to visit (node z with type t2). Then we update dp[S][x][y] according to dist(x,z) + dp[S-t1][z][y] (where t1 is the type of the node x). The number of all the possible subsets along with start and end nodes is 2^10 * N^2. To compute each dp, we consider N possibilities for the second node to visit. So overall we get O(2^10 * N^3).
Note: in all of my analysis above, you can replace the value 10, with a more general K, representing the number of different types possible.
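The subset idea can be sketched with a bitmask. Note that this is a variant, not the exact dp[S][x][y] table described above: the start node is fixed, so the table is indexed only by (subset, last node). It assumes dist holds true shortest-path distances for every pair of nodes (so the triangle inequality holds) and that type_of maps each must-pass node to a type index; both names are hypothetical.

```python
def shortest_via_types_bitmask(dist, type_of, start, end, num_types):
    """type_of maps each must-pass node to its type index in range(num_types)."""
    INF = float("inf")
    full = (1 << num_types) - 1
    if full == 0:
        return dist[start][end]
    # dp[mask][x]: cheapest walk from start that ends at typed node x and has
    # collected exactly the types in mask (x's own type included).
    dp = [dict() for _ in range(full + 1)]
    for x, t in type_of.items():
        dp[1 << t][x] = dist[start][x]
    for mask in range(1, full + 1):
        for x, cost in dp[mask].items():
            for y, t in type_of.items():
                if mask & (1 << t):
                    continue  # with shortest-path distances, revisiting a type never helps
                new_mask = mask | (1 << t)
                cand = cost + dist[x][y]
                if cand < dp[new_mask].get(y, INF):
                    dp[new_mask][y] = cand
    return min((cost + dist[x][end] for x, cost in dp[full].items()),
               default=INF)
```

With P typed nodes and K types this does about 2^K * P^2 updates, so it scales with the number of types rather than with K!.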

Center of a graph

Given an undirected tree with unweighted edges, N vertices and N-1 edges, and a number K, find K nodes so that every node of the tree is within distance S of at least one of the K nodes. Also, S has to be the smallest possible S, so that for any S' < S at least one node would be unreachable in S' steps.
I tried solving this problem, however, I feel that my supposed solution is not very fast.
My solution:
set x=1
find nodes which are x distance from every node
let the node which has the most nodes in its distance be one of the K nodes.
recompute for every node whilst not counting already covered nodes.
do this till I find K number of K nodes. Then if every node is covered we are done else increase x.
This problem is called p-center, and you can find several papers online about it such as this. It is indeed NP-hard for general graphs, but polynomial on trees, both weighted and unweighted.
For me it looks like a clustering problem. Try it with the k-Means (wikipedia) algorithm where k equals to your K. Since you have a tree and all vertices are connected, you can use as distance measurement the distance/number of edges between your vertices.
When the algorithm converges, you get the K nodes which should be found. Then you can determine S by iterating through all k clusters. There you calculate the maximum distance from every node in the cluster to the center node. The overall max should be S.
Update: But actually I see that the k-means algorithm does not produce a global optimum, so this algorithm wouldn't also produce the best result ...
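Whichever method proposes the K nodes, the resulting S can be checked with one multi-source BFS instead of a per-cluster loop. A small sketch, assuming an adjacency dict with every node as a key; covering_radius is a made-up name.

```python
from collections import deque

def covering_radius(adj, centers):
    """Largest hop distance from any node to its nearest center."""
    dist = {c: 0 for c in centers}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(adj):
        return float("inf")  # some node cannot be reached from any center
    return max(dist.values())
```

This only evaluates a candidate set of centers; it does not find the optimal one.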
You say N nodes and N-1 edges, so your graph is a tree. You are actually looking for a connected K-subset of nodes minimizing the longest edge.
A polynomial algorithm may be:
Sort all your edges by increasing distance.
Then loop on edges:
if neither of the 2 nodes is in a group, create a new group.
else if one node is in an existing group, add the other to the group
else both nodes are in 2 different groups, so fuse the groups
When a group reaches K, break the loop and you have your connected K-subset.
Nevertheless, you have to note that your group can contain more than K nodes. You can imagine the problem of having 4 nodes, close two by two. There would be no exact 3-subset solution of your problem.

Updating a tree and keeping track of the change in the nodes of some subtree

Problem:
You are given a rooted tree where each node is numbered from 1 to N. Initially each node contains some positive value, say X. Now we are to perform two type of operations on the tree. Total 100000 operation.
First Type:
Given a node nd and a positive integer V, you need to decrease the value of all the nodes by some amount. If a node is at a distance of d from the given node, then decrease its value by floor[V/(2^d)]. Do this for all the nodes.
That means value of node nd will be decreased by V (i.e, floor[V/2^0]). Values of its nearest neighbours will be decreased by floor[V/2] . And so on.
Second Type:
You are given a node nd. You have to tell the number of nodes in the subtree rooted at nd whose value is positive.
Note: The number of nodes in the tree may be up to 100000 and the initial values, X, in the nodes may be up to 1000000000. But the value of V by which the decrement operation is to be performed will be at most 100000.
How can this be done efficiently? I am stuck with this problem for many days. Any help is appreciated.
My idea: I am thinking of solving this problem offline. I will store all the queries first. Then, if I can somehow find the time (after which operation) when some node nd's value becomes less than or equal to zero (call it the death time) for each and every node, we can do some kind of binary search (probably using Binary Indexed Trees / Segment Trees) to answer all the queries of the second type. But the problem is that I am unable to find the death time for each node.
Also I have tried to solve it online using Heavy Light Decomposition but I am unable to solve it using it either.
Thanks!
Given a tree with vertex weights, there exists a vertex that, when chosen as the root, has subtrees whose weights are at most half of the total. This vertex is a "balanced separator".
Here's an O((n + k) polylog(n, k, D))-time algorithm, where n is the number of vertices and k is the number of operations and D is the maximum decrease. In the first phase, we compute the "death time" of each vertex. In the second, we count the live vertices.
To compute the death times, first split each decrease operation into O(log(D)) decrease operations whose arguments are powers of two between 1 and 2^floor(lg(D)) inclusive. Do the following recursively. Let v be a balanced separator, where the weight of a vertex is one plus the number of decrease operations on it. Compute distances from v, then determine, for each time and each power of two, the cumulative number of operations on v with that effective argument (i.e., if a vertex at distance 2 from v is decreased by 2^i, then record a -1 change in the 2^(i - 2) coefficient for v). Partition the operations and vertices by subtree. For each subtree, repeat this cumulative summary for operations originating in the subtree, but make the coefficients positive instead of negative. By putting the summary for a subtree together with v's summary, we determine the influence of decrease operations originating outside of the subtree. Finally, we recurse on each subtree.
Now, for each vertex w, we compute the death time using binary search. The decrease operations affecting w are given in a logarithmic number of summaries computed in the manner previously described, so the total cost for one vertex is log^2.
It sounds as though you, the question asker, know how the next part goes, but for the sake of completeness, I'll describe it. Do a preorder traversal to assign new labels to vertices and also compute for each vertex the interval of labels that comprises its subtree. Initialize a Fenwick tree mapping each vertex to one (live) or zero (dead), initially one. Put the death times and queries in a priority queue. To process a death, decrease the value of that vertex by one. To process a query, sum the values of vertices in the subtree interval.
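The counting phase in the last paragraph might look like the sketch below. It assumes the death times have already been computed by the first phase and are passed in as a dict (death_time, a hypothetical name, with vertices that never die left out), that times are operation indices, and it uses sorting in place of a priority queue.

```python
class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        """Add delta at 0-based position i."""
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of positions 0..i-1."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def preorder_intervals(children, root):
    """Subtree of v occupies the label range [label[v], end[v])."""
    label, end = {}, {}
    stack = [(root, False)]
    while stack:
        v, done = stack.pop()
        if done:
            end[v] = len(label)
        else:
            label[v] = len(label)
            stack.append((v, True))
            for c in children.get(v, ()):
                stack.append((c, False))
    return label, end

def answer_queries(children, root, death_time, queries):
    """queries is a list of (time, node); returns (time, node, live count)."""
    label, end = preorder_intervals(children, root)
    alive = Fenwick(len(label))
    for v in label:
        alive.add(label[v], 1)
    deaths = sorted((t, v) for v, t in death_time.items())
    answers, i = [], 0
    for time, node in sorted(queries):
        # Apply every death that happened before this query.
        while i < len(deaths) and deaths[i][0] < time:
            alive.add(label[deaths[i][1]], -1)
            i += 1
        live = alive.prefix(end[node]) - alive.prefix(label[node])
        answers.append((time, node, live))
    return answers
```

Each death and each query costs O(log n), so this phase is O((n + k) log n) once the death times are known.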

Finding number of nodes within a certain distance in a rooted tree

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, e.g. nodes going down from the root. Keep in mind each edge has a weight.
I can do this in O(N^2) time using a DFS from each node and keeping track of the distance traveled, but with N >= 100000 it's a bit slow. I'm pretty sure you could easily solve it with unweighted edges with DP, but anyone know how to solve this one quickly? (Less than N^2)
It's possible to improve my previous answer to O(nlog d) time and O(n) space by making use of the following observation:
The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.
Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).
Every node has a single ancestor that is the first ancestor to miss out
For each node v, we will maintain a count, c(v), that is initially 0.
For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.
The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...
Finding i quickly
The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t (or equivalently, binary search for m - t instead of m). The insertion of -t at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k will have their path length array begin at offset d-k inside the buffer to allow room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.
In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.
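Here is a sketch of the same idea in a slightly different form: instead of the right-justified shared buffer, it keeps an explicit stack of the root distances t(a) of the current node's ancestors and binary-searches that stack for t(v) - m. It assumes non-negative edge weights; the argument names (children, weight) are made up for the example.

```python
from bisect import bisect_left

def count_close_descendants(children, weight, root, m):
    """For every node, count its proper descendants within distance m.

    children[v] lists v's children; weight[(v, u)] is the non-negative
    weight of the edge from v down to its child u.
    """
    count = {}        # close nodes in v's subtree, v itself included
    parent = {root: None}
    order = []        # preorder, so reversed(order) visits children first
    path_nodes = []   # ancestors of the current node, root first
    path_dist = []    # their distances from the root (non-decreasing)
    stack = [(root, 0, 0)]  # (node, depth, distance from root)
    while stack:
        v, depth, t = stack.pop()
        del path_nodes[depth:]
        del path_dist[depth:]
        count[v] = 1
        # Ancestors at index >= j satisfy t - path_dist[index] <= m.
        j = bisect_left(path_dist, t - m)
        if j > 0:
            count[path_nodes[j - 1]] -= 1  # first ancestor to miss out
        path_nodes.append(v)
        path_dist.append(t)
        order.append(v)
        for u in children.get(v, ()):
            parent[u] = v
            stack.append((u, depth + 1, t + weight[(v, u)]))
    # Sum child counts into parent counts, children before parents.
    for v in reversed(order):
        if parent[v] is not None:
            count[parent[v]] += count[v]
    return {v: c - 1 for v, c in count.items()}
```

The explicit stack plays the role of the shared path length array: truncating it to the current depth is what lets siblings overwrite each other's entries.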
[EDIT: Please see my other answer for an even more efficient O(nlog d) solution :) ]
Here's a simple O(nd)-time, O(n)-space algorithm, where d is the maximum depth of any node in the tree. A complete tree (a tree in which every node has the same number of children) with n nodes has depth d = O(log n), so this should be much faster than your O(n^2) DFS-based approach in most cases, though if the number of sufficiently-close descendants per node is small (i.e. if DFS only traverses a small number of levels) then your algorithm should not be too bad either.
For any node v, consider the chain of ancestors from v's parent up to the root. Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. So all we need to do is for each node, climb upwards towards the root until such time as the total path length exceeds the threshold distance m, incrementing the count at each ancestor as we go. There are n nodes, and for each node there are at most d ancestors, so this algorithm is trivially O(nd).
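The climbing loop is short enough to write out. A sketch, assuming the tree is given as a parent map (parent[v] is None for the root) plus the weight of the edge above each non-root node; both argument names are assumptions.

```python
def count_close_descendants_climb(parent, weight_to_parent, m):
    """O(n*d): every node walks up until the path length exceeds m."""
    count = {v: 0 for v in parent}
    for v in parent:
        travelled = 0
        a = v
        while parent[a] is not None:
            travelled += weight_to_parent[a]
            if travelled > m:
                break  # all higher ancestors are even farther away
            a = parent[a]
            count[a] += 1
    return count
```

The early break relies on edge weights being non-negative, since only then does the distance keep growing on the way up.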

Find all subtrees of size N in an undirected graph

Given an undirected graph, I want to generate all subgraphs which are trees of size N, where size refers to the number of edges in the tree.
I am aware that there are a lot of them (exponentially many at least for graphs with constant connectivity) - but that's fine, as I believe the number of nodes and edges makes this tractable for at least smallish values of N (say 10 or less).
The algorithm should be memory-efficient - that is, it shouldn't need to have all graphs or some large subset of them in memory at once, since this is likely to exceed available memory even for relatively small graphs. So something like DFS is desirable.
Here's what I'm thinking, in pseudo-code, given the starting graph graph and desired length N:
Pick any arbitrary node, root as a starting point and call alltrees(graph, N, root)
alltrees(graph, N, root)
    given that node root has degree M, find all M-tuples with integer, non-negative values whose values sum to N (for example, for 3 children and N=2, you have (0,0,2), (0,2,0), (2,0,0), (0,1,1), (1,0,1), (1,1,0), I think)
    for each tuple (X1, X2, ... XM) above
        create a subgraph "current" initially empty
        for each integer Xi in X1...XM (the current tuple)
            if Xi is nonzero
                add edge i incident on root to the current tree
                add alltrees(graph with root removed, N-1, node adjacent to root along edge i)
        add the current tree to the set of all trees
    return the set of all trees
This finds only trees containing the chosen initial root, so now remove this node and call alltrees(graph with root removed, N, new arbitrarily chosen root), and repeat until the size of the remaining graph < N (since no trees of the required size will exist).
I forgot also that each visited node (each root for some call of alltrees) needs to be marked, and the set of children considered above should only be the adjacent unmarked children. I guess we need to account for the case where no unmarked children exist, yet depth > 0, this means that this "branch" failed to reach the required depth, and cannot form part of the solution set (so the whole inner loop associated with that tuple can be aborted).
So will this work? Any major flaws? Any simpler/known/canonical way to do this?
One issue with the algorithm outlined above is that it doesn't satisfy the memory-efficient requirement, as the recursion will hold large sets of trees in memory.
This needs an amount of memory that is proportional to what is required to store the graph. It will return every subgraph that is a tree of the desired size exactly once.
Keep in mind that I just typed it into here. There could be bugs. But the idea is that you walk the nodes one at a time, for each node searching for all trees that include that node, but none of the nodes that were searched previously. (Because those have already been exhausted.) That inner search is done recursively by listing edges to nodes in the tree, and for each edge deciding whether or not to include it in your tree. (If it would make a cycle, or add an exhausted node, then you can't include that edge.) If you include it your tree then the used nodes grow, and you have new possible edges to add to your search.
To reduce memory use, the edges that are left to look at is manipulated in place by all of the levels of the recursive call rather than the more obvious approach of duplicating that data at each level. If that list was copied, your total memory usage would get up to the size of the tree times the number of edges in the graph.
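# The original left these helpers undefined. The versions below are an
# assumption: the graph is a dict mapping each node to a list of adjacent
# nodes, and an edge is a frozenset of its two endpoints.
def all_nodes(graph):
    return list(graph)

def neighbors(graph, node):
    # Edges incident to node (self-loops can never be part of a tree).
    return [frozenset((node, other)) for other in graph[node] if other != node]

def nodes(edge):
    return tuple(edge)

def build_tree(graph, used_node, used_edge):
    # Snapshot the edge set, because the search keeps mutating it.
    return frozenset(used_edge)

def find_all_trees(graph, tree_length):
    exhausted_node = set()
    used_node = set()
    used_edge = set()
    edge_groups = []

    def finish_all_trees(remaining_length, edge_group, edge_position):
        while edge_group < len(edge_groups):
            edges = edge_groups[edge_group]
            while edge_position < len(edges):
                edge = edges[edge_position]
                edge_position += 1
                (node1, node2) = nodes(edge)
                if node1 in exhausted_node or node2 in exhausted_node:
                    continue
                node = node1
                if node1 in used_node:
                    if node2 in used_node:
                        continue  # both ends already in the tree: a cycle
                    node = node2
                used_node.add(node)
                used_edge.add(edge)
                edge_groups.append(neighbors(graph, node))
                if 1 == remaining_length:
                    yield build_tree(graph, used_node, used_edge)
                else:
                    yield from finish_all_trees(remaining_length - 1,
                                                edge_group, edge_position)
                # Undo the choice before trying the next edge.
                edge_groups.pop()
                used_edge.remove(edge)
                used_node.remove(node)
            edge_position = 0
            edge_group += 1

    for node in all_nodes(graph):
        used_node.add(node)
        edge_groups.append(neighbors(graph, node))
        yield from finish_all_trees(tree_length, 0, 0)
        edge_groups.pop()
        used_node.remove(node)
        exhausted_node.add(node)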
Assuming you can destroy the original graph or make a destroyable copy, I came up with something that could work but could be utter sadomaso because I did not calculate its O-Ntiness. It probably would work for small subtrees.
do it in steps, at each step:
sort the graph nodes so you get a list of nodes sorted by number of adjacent edges ASC
process all nodes with the same number of edges of the first one
remove those nodes
For an example for a graph of 6 nodes finding all size 2 subgraphs (sorry for my total lack of artistic expression):
Well the same would go for a bigger graph, but it should be done in more steps.
Assuming:
Z number of edges of most ramificated node
M desired subtree size
S number of steps
Ns number of nodes in step
assuming quicksort for sorting nodes
Worst case:
S*(Ns^2 + MNsZ)
Average case:
S*(NslogNs + MNs(Z/2))
Problem is: cannot calculate the real omicron because the nodes in each step will decrease depending how is the graph...
Solving the whole thing with this approach could be very time consuming on a graph with very connected nodes. However, it could be parallelized, and you could do one or two steps to remove dislocated nodes, extract all subgraphs, and then choose another approach on the remainder. You would have removed a lot of nodes from the graph, so it could decrease the remaining run time...
Unfortunately this approach would benefit the GPU not the CPU, since a LOT of nodes with the same number of edges would go in each step.... and if parallelization is not used this approach is probably bad...
Maybe an inverse would go better with the CPU, sort and proceed with nodes with the maximum number of edges... those will be probably less at start, but you will have more subgraphs to extract from each node...
Another possibility is to calculate the least occurring edge count in the graph and start with nodes that have it; that would alleviate the memory usage and iteration count for extracting subgraphs...
Unless I'm reading the question wrong people seem to be overcomplicating it.
This is just "all possible paths within N edges" and you're allowing cycles.
Thus, for two nodes A, B and one edge, your result would be:
AA, AB, BA, BB
For two nodes, two edges your result would be:
AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
I would recurse into a for each and pass in a "template" tuple
N=edge count
TempTuple = Tuple_of_N_Items ' (01,02,03,...0n) (Could also be an ordered list!)
ListOfTuple_of_N_Items ' Paths (could also be an ordered list!)
edgeDepth = N
Method (Nodes, edgeDepth, TupleTemplate, ListOfTuples, EdgeTotal)
    edgeDepth -=1
    For Each Node In Nodes
        if edgeDepth = 0 'Last Edge
            ListOfTuples.Add New Tuple from TupleTemplate + Node ' (x,y,z,...,Node)
        else
            NewTupleTemplate = TupleTemplate + Node ' (x,y,z,Node,...,0n)
            Method(Nodes, edgeDepth, NewTupleTemplate, ListOfTuples, EdgeTotal)
    next
This will create every possible combination of vertices for a given edge count
What's missing is the factory to generate tuples given an edge count.
You end up with a list of possible paths and the operation is Nodes^(N+1)
If you use ordered lists instead of tuples then you don't need to worry about a factory to create the objects.
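In Python the same enumeration needs no tuple factory, since itertools.product already yields every combination lazily. A sketch that reproduces the AA, AB, BA, BB listing above; like the pseudocode, it ignores which edges actually exist in the graph, and the function name is made up.

```python
from itertools import product

def all_node_sequences(nodes, edge_count):
    """Lazily yield every sequence of edge_count + 1 nodes (repeats allowed)."""
    return product(nodes, repeat=edge_count + 1)

print(["".join(p) for p in all_node_sequences("AB", 1)])
# ['AA', 'AB', 'BA', 'BB']
```

Because product returns an iterator, only one sequence is held in memory at a time.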
If memory is the biggest problem you can use a NP-ish solution using tools from formal verification. I.e., guess a subset of nodes of size N and check whether it's a graph or not. To save space you can use a BDD (http://en.wikipedia.org/wiki/Binary_decision_diagram) to represent the original graph's nodes and edges. Plus you can use a symbolic algorithm to check if the graph you guessed is really a graph - so you don't need to construct the original graph (nor the N-sized graphs) at any point. Your memory consumption should be (in big-O) log(n) (where n is the size of the original graph) to store the original graph, and another log(N) to store every "small graph" you want.
Another tool (which is supposed to be even better) is to use a SAT solver. I.e., construct a SAT formula that is true iff the sub-graph is a graph and supply it to a SAT solver.
For a graph of Kn there are approximately n! paths between any two pairs of vertices. I haven't gone through your code but here is what I would do.
Select a pair of vertices.
Start from a vertex and try to reach the destination vertex recursively (something like dfs but not exactly). I think this would output all the paths between the chosen vertices.
You could do the above for all possible pairs of vertices to get all simple paths.
It seems that the following solution will work.
Go over all partitions of the set of all vertices into two parts. Then count the number of edges whose endpoints lie in different parts (k); these edges correspond to the edge of the tree that connects the subtrees for the first and the second parts. Calculate the answer for both parts recursively (p1, p2). Then the answer for the entire graph can be calculated as the sum over all such partitions of k*p1*p2. But all trees will be considered N times: once for each edge. So, the sum must be divided by N to get the answer.
Your solution as is doesn't work I think, although it can be made to work. The main problem is that the subproblems may produce overlapping trees so when you take the union of them you don't end up with a tree of size n. You can reject all solutions where there is an overlap, but you may end up doing a lot more work than needed.
Since you are OK with exponential runtime, and potentially writing 2^n trees out, having V.2^V algorithms is not bad at all. So the simplest way of doing it would be to generate all possible subsets of n nodes, and then test whether each one forms a tree. Since testing whether a subset of nodes forms a tree can take O(E.V) time, we are potentially talking about V^2.V^n time, unless you have a graph with O(1) degree. This can be improved slightly by enumerating subsets in such a way that two successive subsets differ in exactly one node being swapped. In that case, you just have to check if the new node is connected to any of the existing nodes, which can be done in time proportional to the number of outgoing edges of the new node by keeping a hash table of all existing nodes.
The next question is how to enumerate all the subsets of a given size such that no more than one element is swapped between successive subsets. I'll leave that as an exercise for you to figure out :)
I think there is a good algorithm (with Perl implementation) at this site (look for TGE), but if you want to use it commercially you'll need to contact the author. The algorithm is similar to yours in the question but avoids the recursion explosion by making the procedure include a current working subtree as a parameter (rather than a single node). That way each edge emanating from the subtree can be selectively included/excluded, and recurse on the expanded tree (with the new edge) and/or reduced graph (without the edge).
This sort of approach is typical of graph enumeration algorithms -- you usually need to keep track of a handful of building blocks that are themselves graphs; if you try to only deal with nodes and edges it becomes intractable.
This algorithm is big and not an easy one to post here. But here is a link to the reservation search algorithm, using which you can do what you want. This pdf file contains both algorithms. Also, if you understand Russian, you can take a look at this.
So you have a graph with edges e_1, e_2, ..., e_E.
If I understand correctly, you are looking to enumerate all subgraphs which are trees and contain N edges.
A simple solution is to generate each of the E choose N subgraphs and check if they are trees.
Have you considered this approach? Of course if E is too large then this is not viable.
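A sketch of that brute force, assuming the graph is given as a list of (u, v) tuples with each undirected edge listed once. The tree test uses a small union-find; is_tree and trees_by_brute_force are made-up names.

```python
from itertools import combinations

def is_tree(edges):
    """True if the edge set is acyclic and connected (union-find check)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    # An acyclic set of k edges is connected exactly when it touches k + 1 nodes.
    return len(parent) == len(edges) + 1

def trees_by_brute_force(edges, n):
    """Yield every n-edge subset of edges that forms a tree."""
    for subset in combinations(edges, n):
        if is_tree(subset):
            yield subset
```

Since it is a generator over combinations, memory use stays small even though the running time is proportional to E choose N.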
EDIT:
We can also use the fact that a tree is a combination of trees, i.e. that each tree of size N can be "grown" by adding an edge to a tree of size N-1. Let E be the set of edges in the graph. An algorithm could then go something like this.
T = E
n = 1
while n<N
    newT = empty set
    for each tree t in T
        for each edge e in E
            if t+e is a tree of size n+1 which is not yet in newT
                add t+e to newT
    T = newT
    n = n+1
At the end of this algorithm, T is the set of all subtrees of size N. If space is an issue, don't keep a full list of the trees, but use a compact representation, for instance implement T as a decision tree using ID3.
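The growing loop can be sketched as follows, assuming edges are (u, v) tuples with each undirected edge listed once, no self-loops, and n >= 1. Trees are stored as frozensets of edges so that duplicates collapse automatically; this keeps every tree of the current size in memory, so it is not the compact representation suggested above.

```python
def grow_trees(edges, n):
    """Return the set of all n-edge trees, each a frozenset of (u, v) edges."""
    trees = {frozenset([e]) for e in edges}
    for _ in range(n - 1):
        bigger = set()
        for tree in trees:
            covered = {x for e in tree for x in e}
            for u, v in edges:
                # Exactly one endpoint inside keeps t+e connected and acyclic.
                if (u in covered) != (v in covered):
                    bigger.add(tree | {(u, v)})
        trees = bigger
    return trees
```

The "exactly one endpoint already covered" test is what stands in for "t+e is a tree of size n+1".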
I think the problem is under-specified. You mentioned that the graph is undirected and that the subgraph you are trying to find is of size N. What is missing is the number of edges and whether the trees you are looking for are binary or whether you allow multi-way trees. Also, are you interested in mirrored reflections of the same tree; in other words, does the order in which siblings are listed matter at all?
A single node in a tree you are trying to find may have more than 2 siblings, which should be allowed given that you don't specify any restriction on the initial graph and you mentioned that the resulting subgraph should contain all nodes.
You can enumerate all subgraphs that have the form of a tree by performing a depth-first traversal. You need to repeat the traversal of the graph for every sibling during the traversal. Then you'll need to repeat the operation for every node as a root.
Discarding symmetric trees, you will end up with
N^(N-2)
trees if your graph is a fully connected mesh; otherwise you need to apply Kirchhoff's matrix-tree theorem.

Resources