find a node of a tree in another tree - algorithm

My problem is:
Consider two trees P and R. I need to match the node at the deepest possible level of P with the node at the deepest possible level of R. Each tree is a hierarchy running from the most general node to the most specific, and I want to find the most specific match between tree P and tree R.
The most efficient method is needed.
For example, let's take a reviewers' panel. Each reviewer has a tree of interests going from general to specific, for example from Energy to Biogas plant. Now a paper has to be matched against the reviewers' interests: the reviewer with the most specific match to the paper's category is to be found. Each paper also has its own category tree, from the most general category to the most specific one.

EDIT: Fixed expression for depth difference
Decide how much importance you want to give to matching nodes on their depth, versus on their "similarity". Use this to make a scoring function s(x, y) = a*(-|depth(x) - depth(y)|) + (1-a)*(similarity(x, y)). (similarity(x, y) can be any function of x and y -- e.g. it might be the length of their longest common subsequence, if x and y are strings.)
(Conceptually) create a vertex for each node in tree 1, a vertex for each node in tree 2, and an edge for every pair of vertices (x, y) with x in the first tree and y in the second. Set the weight of this edge to s(x, y).
You now have a bipartite maximum weighted matching problem, a.k.a. Assignment Problem. Apply the Hungarian algorithm to find the optimal solution in O(n^3) time.
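A minimal sketch of the scoring function and the matching step, under some assumptions of mine: each node is a hypothetical (label, depth) pair, similarity is the longest-common-subsequence length mentioned above, and the matching is found by brute force over permutations as a stand-in for the Hungarian algorithm (so it is only usable on tiny inputs).

```python
from itertools import permutations


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score(x, y, a=0.5):
    """s(x, y) from the answer; x and y are assumed to be (label, depth) pairs."""
    return a * -abs(x[1] - y[1]) + (1 - a) * lcs_length(x[0], y[0])


def best_assignment(nodes1, nodes2, a=0.5):
    """Brute-force maximum-weight matching; assumes len(nodes1) <= len(nodes2)."""
    best = max(
        permutations(nodes2, len(nodes1)),
        key=lambda perm: sum(score(x, y, a) for x, y in zip(nodes1, perm)),
    )
    return list(zip(nodes1, best))


tree1 = [("Energy", 0), ("Biogas", 1)]
tree2 = [("Biogas", 1), ("Energy", 0), ("Physics", 0)]
print(best_assignment(tree1, tree2))
```

For real inputs you would replace best_assignment with a proper O(n^3) assignment solver; the scoring function stays the same.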

You can solve it using a trie, where the root holds the most general categories, its children hold more specific categories, and their children are more specific still. You need to find the longest match starting from the root.
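A rough sketch of that idea, assuming each reviewer's interests are given as root-to-node lists of category labels (the names build_trie, match_depth and best_reviewer, and the sample data, are made up for illustration). The trie is a nested dict, and the match depth is how far the paper's category path can be followed from the root.

```python
def build_trie(paths):
    """Build a nested-dict trie from root-to-node category paths."""
    root = {}
    for path in paths:
        node = root
        for label in path:
            node = node.setdefault(label, {})
    return root


def match_depth(trie, path):
    """Number of leading labels of `path` that can be followed in the trie."""
    node, depth = trie, 0
    for label in path:
        if label not in node:
            break
        node = node[label]
        depth += 1
    return depth


def best_reviewer(reviewers, paper_path):
    """Return (name, depth) for the reviewer with the deepest match."""
    scores = {name: match_depth(build_trie(paths), paper_path)
              for name, paths in reviewers.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


reviewers = {
    "alice": [["Energy", "Renewable", "Biogas plant"]],
    "bob": [["Energy", "Nuclear"], ["Physics"]],
}
paper = ["Energy", "Renewable", "Biogas plant", "Digesters"]
print(best_reviewer(reviewers, paper))
```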

Related

Find the shortest path in a graph which visits all node types

I can't figure out how to proceed with the following problem.
Say I have an undirected graph with a start node and an end node. I need to find the shortest path between these two nodes, but the path must include all must-pass node types.
There can be up to 10 of these types. This means that I should visit at least one node of each type (marked with a letter in the image) and then go to the end. Once I visit one of the nodes of type B, I may, but need not, visit other nodes of type B. The nodes that are marked with a number simply form a path and do not need to be visited.
This question is very similar to this. There it was suggested to find the shortest path between all the crucial nodes and then use the DFS algorithm. Can I apply the same algorithm to this problem?
Let N be the number of vertices and M be the number of edges.
Break down the solution into two phases.
Phase 1:
Compute the distance between each pair of nodes. If the edges are not weighted, this can easily be done by starting a BFS from each node, spending a total of O(N(N+M)) time. If the edges are weighted, you can use Dijkstra's algorithm from each node, spending a total of O(N(NlogN+M)) time.
After this phase, we have computed dist(x,y) for any pair of nodes x,y.
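For the unweighted case, Phase 1 could look roughly like this. The adjacency-dict representation and the function names are my own assumptions, not something fixed by the question.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_pairs_distances(adj):
    """Run one BFS per node: O(N(N+M)) in total."""
    return {u: bfs_distances(adj, u) for u in adj}


adj = {"s": ["a", "b"], "a": ["s", "e"], "b": ["s"], "e": ["a"]}
dist = all_pairs_distances(adj)
print(dist["s"]["e"], dist["b"]["e"])
```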
Phase 2:
Now that we can query the distance between any pair of nodes in O(1) using the values precomputed in phase 1, it is time to put everything together. I can propose two possibilities here.
Possibility 1:
Use a similar approach as in the thread you linked. There are 10! orders in which you could visit the types. Let's say we have fixed one possible order [s,t1,t2,...,t10,e] (where s and e are the start / end nodes, and ti represents a type) and we are trying to find out what would be the best way to visit the nodes from start to finish in that order. Since each type can have multiple nodes belonging to it, it is not as simple as querying distances for every pair of consecutive types t_i and t_{i+1}.
What we should do instead is, for every node x, compute the fastest way to reach the end node from node x while respecting the order of types [s,t1,...,t10,e]. So if x is of type t_i, then we are looking for a way to reach e from x while visiting nodes of types t_{i+1}, t_{i+2}, ... t_{10} in that order. Call this value dp[x] (where dp stands for dynamic programming).
We are looking to find dp[s]. How do we compute dp[x] for a given node x? Simple: iterate through all nodes y of type t_{i+1} and consider dist(x,y) + dp[y]. Then we have dp[x] = min{dist(x,y) + dp[y] for all y of type t_{i+1}}. Note that we need to compute dp[x] starting from the nodes of type t_10 all the way back to the nodes of type t_1.
The complexity here is O(10! * N^2)
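Here is a small sketch of Possibility 1, assuming the Phase 1 distances are available as a dict of dicts and that the must-pass nodes are grouped in a hypothetical nodes_of_type mapping. The toy distances come from points on a line, purely to keep the example short.

```python
from itertools import permutations


def best_for_order(dist, nodes_of_type, order, start, end):
    """Cheapest start -> ... -> end route visiting one node per type, in `order`."""
    # dp[x] = cheapest way to reach `end` from x while visiting the remaining types.
    dp = {end: 0}
    for t in reversed(order):
        dp = {x: min(dist[x][y] + cost for y, cost in dp.items())
              for x in nodes_of_type[t]}
    return min(dist[start][y] + cost for y, cost in dp.items())


def shortest_covering_route(dist, nodes_of_type, start, end):
    """Try every order of the types: O(K! * N^2) for K types."""
    return min(best_for_order(dist, nodes_of_type, order, start, end)
               for order in permutations(nodes_of_type))


# Toy metric: nodes sit on a line, so distances are position differences.
pos = {"s": 0, "a": 1, "b": -1, "e": 3}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}
nodes_of_type = {"A": ["a"], "B": ["b"]}
print(shortest_covering_route(dist, nodes_of_type, "s", "e"))
```

Here the best route is s, b, a, e with length 1 + 2 + 2 = 5; the other order costs 7.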
Possibility 2:
There is actually a much faster way to find the answer, reducing the complexity to O(2^10 * N^3). Compared with O(10! * N^2) this wins whenever N is below roughly 10!/2^10 (about 3500), and the gap widens dramatically for a larger number of node types (like 20 instead of 10).
To accomplish this we do the following. For each subset S of the set of types {1,2,...,10}, and for each pair of nodes x, y with types in S, define
dp[S][x][y], which represents the fastest way to traverse the graph starting from node x, ending in node y and visiting at least one node of every type in S. Note that we don't care about the actual order. To compute dp[S][x][y] for a given (S,x,y), all we need to do is go over all the possibilities for the second node to visit (a node z of some other type in S). Then we update dp[S][x][y] with dist(x,z) + dp[S-t1][z][y] (where t1 is the type of the node x), keeping the minimum. The number of all possible subsets along with start and end nodes is 2^10 * N^2. To compute each dp value, we consider N possibilities for the second node to visit. So overall we get O(2^10 * N^3).
Note: in all of my analysis above, you can replace the value 10, with a more general K, representing the number of different types possible.
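As an illustration of the subset idea, here is a sketch of a slightly different formulation from the one above: because the start node is fixed, the state only tracks the set of covered types and the current node, dp[mask][v], which gives O(2^K * N^2). This is my own variation, not a transcription of dp[S][x][y], and the type_of mapping and toy distances are assumptions.

```python
def shortest_route_by_subsets(dist, type_of, start, end):
    """type_of maps each must-pass node to its type; untyped nodes are omitted."""
    types = sorted(set(type_of.values()))
    bit = {t: 1 << i for i, t in enumerate(types)}
    full = (1 << len(types)) - 1
    if full == 0:
        return dist[start][end]
    inf = float("inf")
    # dp[mask][v]: cheapest walk from start to typed node v covering the types in mask.
    dp = [dict() for _ in range(full + 1)]
    for v, t in type_of.items():
        dp[bit[t]][v] = dist[start][v]
    for mask in range(1, full + 1):
        for x, cost in dp[mask].items():
            for y, t in type_of.items():
                if mask & bit[t]:
                    continue  # this type is already covered
                new_mask = mask | bit[t]
                candidate = cost + dist[x][y]
                if candidate < dp[new_mask].get(y, inf):
                    dp[new_mask][y] = candidate
    return min(cost + dist[v][end] for v, cost in dp[full].items())


# Toy metric: nodes sit on a line, so distances are position differences.
pos = {"s": 0, "a": 1, "b": -1, "e": 3}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}
print(shortest_route_by_subsets(dist, {"a": "A", "b": "B"}, "s", "e"))
```

Masks are processed in increasing order, which is safe because every transition adds a bit and therefore moves to a strictly larger mask.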

Build top-list from player-player duel result

I have a list of duels between players. The data consists of 2 user IDs, where the first one is the winner.
How can I build a graph of this list to find the best players?
Furthermore, how do I decide what it means to be best?
Perhaps players should be ranked by the number of opponents beaten, and the rank of those opponents (recursively).
I have previously tried doing this using the PageRank algorithm, but it does not account for losses in a good way (i.e. the rank should go down from a loss).
For example:
1 won against 3
1 won against 4
1 won against 5
2 won against 1
This list should put 2 at the top, because it beat 1.
This presents one problem - it should be required to duel those with a high rank.
Those who have not dueled players above a certain rank should be told to do so, in order to be in the top-list.
Define player X beating player Y as a relation such that there exist vertices X and Y and there exists an edge from Y to X.
Then, after processing all game information, you may run DFS on the graph, recording in some array A the nodes from which you can not traverse deeper.
As you did not specify that the given graph is a tree, considering also that the edges are directed, there are no guarantees that DFS starting from any node will converge to a single root, so you need to keep some sort of a list of such nodes that beat others.
Once that initial traversal is done, reverse all the edges, and run a DFS for each tree in the forest that is your graph and root of each is an element in A. As you traverse the tree rooted in A[i], record in each node the depth it is located at, relative to the root node, A[i].
Then on, depending on your definition of top players, you may traverse the roots in A and go as deep as that definition allows you to, picking every element you encounter. If the final list you require should actually sort the nodes descendant of different roots in A, you may sort the final structure you will have your list in, using the depth as the comparison criterion. Aside from the final sorting I mentioned, as all we did so far is DFS, this approach is O(V+E), V being the number of vertices and E, the number of edges. If you take into account the sorting of elements in different trees, then you'd have an overall complexity of O((V+E) + VlogV).
If you are willing to sacrifice a bit more of the performance, then you may connect the roots in A to a global root R (i.e. add node R to the graph and edges from R to each A[i]) and run Dijkstra's algorithm, visiting the nodes with less depth first, and basically appending each visited node to your list until you consider your list large enough, based on your definition of top players.
Note that this solution does not work if you have cycles in the graph, regardless of whether you use DFS or Dijkstra's for the final traversal. However it may be adapted to players having multiple matches by using edges with positive weights. An edge from X to Y with weight k would then indicate the number of times X defeated Y, which you will take into account while updating the depth of node during your traversal with DFS.
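A minimal sketch of the depth idea, under assumptions of mine: duels are (winner, loser) pairs, the unbeaten players play the role of the array A, and depth is computed with a BFS along "beat" edges (so it is the shortest chain of wins down from an unbeaten player) rather than the DFS described above. As noted, players that only appear inside cycles get no depth at all.

```python
from collections import deque


def rank_by_depth(duels):
    """duels: iterable of (winner, loser). Returns {player: depth}, 0 = unbeaten."""
    beat = {}                       # winner -> set of players they defeated
    players, losers = set(), set()
    for winner, loser in duels:
        beat.setdefault(winner, set()).add(loser)
        players.update((winner, loser))
        losers.add(loser)
    roots = players - losers        # the array A: nobody has beaten these players
    depth = {r: 0 for r in roots}
    queue = deque(sorted(roots))
    while queue:
        p = queue.popleft()
        for q in sorted(beat.get(p, ())):
            if q not in depth:
                depth[q] = depth[p] + 1
                queue.append(q)
    return depth


duels = [(1, 3), (1, 4), (1, 5), (2, 1)]
depth = rank_by_depth(duels)
top_list = sorted(depth, key=lambda p: (depth[p], p))
print(top_list)
```

With the example from the question, player 2 ends up first because it is the only unbeaten player, and player 1 follows at depth 1.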

How can the shortest path between two nodes in a balanced binary tree be affected by path 'weight'?

I am following an online introductory algorithms course with Udacity.
In the final assessment there is a question as follows:
In the shortest-path oracle described in Andrew Goldberg's interview,
each node has a label, which is a list of some other nodes in the
network and their distance to these nodes. These lists have the
property that:
(1) for any pair of nodes (x,y) in the network, their lists will have
at least one node z in common
(2) the shortest path from x to y will go through z. Given a graph G
that is a balanced binary tree, preprocess the graph to create such
labels for each node. Note that the size of the list in each label
should not be larger than log n for a graph of size n.
The full question can be found here.
Given the constraint of a balanced binary tree and the hint that the size should not be larger than log n, intuitively it seems that the label for a particular node would consist of all its parents (and optionally itself, if it isn't a leaf).
However, some additional instructor notes in the question add:
Write your solution to work on weighted graphs. Note that the test
given, all the edges have a weight of one - which isn't particularly
interesting.
So my question is:
How can the shortest path between two nodes in a binary tree be affected by whether the paths have weights or not?
Surely in a binary tree, the shortest path between two nodes is the unique simple path, and is unaffected by any weighting?
(unless weights can be negative and the path doesn't have to be simple in which case there is no shortest path?)
My basic solution works with the simple test provided in the question, but fails to pass the automatic grader which gives no feedback.
I'm obviously misunderstanding something, but what...
Ok, so I think my initial reaction and the obvious answer is correct:
Positive weights cannot affect the shortest path between two nodes in a binary tree.
On the other hand, weights obviously do affect the shortest 'distance' between two nodes in a binary tree as compared to simply calculating 'distance' between nodes as the number of hops.
This is what the udacity instruction was getting at.
It seems that this instruction to work with weighted binary trees was simply to enable correct automatic grading of the code which relied on using the labels to calculate the exact shortest 'distance' (which is affected by weight) as opposed to the shortest path (list of nodes) which is not.
Once I modified my algorithm to take this into account and output the correct distance, it passed the grader.
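To make the distinction concrete, here is a sketch of such labels on a weighted tree. It assumes the tree is given as a hypothetical parent map of node -> (parent, edge weight), with None for the root, and that each label holds the node itself plus all its ancestors with their weighted distances. A query then takes the minimum of d(x,z) + d(y,z) over the common label entries z.

```python
def build_labels(parent):
    """parent maps node -> (parent_node, edge_weight); the root maps to None.

    Each label maps the node itself and every ancestor to its weighted distance.
    """
    labels = {}
    for node in parent:
        label, dist, cur = {node: 0}, 0, node
        while parent[cur] is not None:
            up, weight = parent[cur]
            dist += weight
            label[up] = dist
            cur = up
        labels[node] = label
    return labels


def query(labels, x, y):
    """Shortest weighted distance between x and y via a common label entry."""
    common = labels[x].keys() & labels[y].keys()
    return min(labels[x][z] + labels[y][z] for z in common)


parent = {
    "r": None,
    "a": ("r", 2), "b": ("r", 5),
    "c": ("a", 1), "d": ("a", 4),
}
labels = build_labels(parent)
print(query(labels, "c", "d"), query(labels, "c", "b"))
```

The path from c to d is the same whatever the weights are, but its length (5 here, rather than 2 hops) does depend on them, which is the point made above.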

Is there a way to keep direction priorities in A*? (ie. Generating the same path as breadth-first)

I have an application that would benefit from using A*; however, for legacy reasons, I need it to continue generating exactly the same paths it did before when there are multiple best-paths to choose from.
For example, consider this maze
...X
FX.S
....
S = start
F = finish
X = wall
. = empty space
with direction-priorities Up; Right; Down; Left. Using breadth-first, we will find the path DLLLU; however, using A* we immediately go left, and end up finding the path LULLD.
I've tried making sure to always expand in the correct direction when breaking ties; and overwriting the PreviousNode pointers when moving from a more important direction, but neither works in that example. Is there a way to do this?
If the original algorithm was BFS, you are looking for the smallest of the shortest paths where "smallest" is according to the lexicographic order induced by some total order Ord on the edges (and of course "shortest" is according to path length).
The idea of tweaking weights suggested by amit is a natural one, but I don't think it is very practical because the weights would need to have a number of bits comparable to the length of a path to avoid discarding information, which would make the algorithm orders of magnitude slower.
Thankfully this can still be done with two simple and inexpensive modifications to A*:
Once we reach the goal, instead of returning an arbitrary shortest path to the goal, we should continue visiting nodes until the path length increases, so that we visit all nodes that belong to a shortest path.
When reconstructing the path, we build the set of nodes that contribute to the shortest paths. This set has a DAG structure when considering all shortest path edges, and it is now easy to find the lexicographically smallest path from start to goal in this DAG, which is the desired solution.
Schematically, classic A* is:
path_length = infinity for every node
path_length[start] = 0
while score(goal) > minimal score of unvisited nodes:
    x := any unvisited node with minimal score
    mark x as visited
    for y in unvisited neighbors of x:
        path_length_through_x = path_length[x] + d(x,y)
        if path_length[y] > path_length_through_x:
            path_length[y] = path_length_through_x
            ancestor[y] = x
return [..., ancestor[ancestor[goal]], ancestor[goal], goal]
where score(x) stands for path_length[x] + heuristic(x, goal).
We simply turn the strict while loop inequality into a non-strict one and add a path reconstruction phase:
path_length = infinity for every node
path_length[start] = 0
while score(goal) >= minimal score of unvisited nodes:
    x := any unvisited node with minimal score
    mark x as visited
    for y in unvisited neighbors of x:
        path_length_through_x = path_length[x] + d(x,y)
        if path_length[y] > path_length_through_x:
            path_length[y] = path_length_through_x
optimal_nodes = [goal]
for every x in optimal_nodes: // note: we dynamically add nodes in the loop
    for y in neighbors of x not in optimal_nodes:
        if path_length[x] == path_length[y] + d(x,y):
            add y to optimal_nodes
path = [start]
x = start
while x != goal:
    z = undefined
    for y in neighbors of x that are in optimal_nodes:
        if path_length[y] == path_length[x] + d(x,y):
            z = y if (x,y) is smaller than (x,z) according to Ord
    x = z
    append x to path
return path
Warning: to quote Knuth, I have only proven it correct, not tried it.
As for the performance impact, it should be minimal: the search loop only visits nodes with a score that is 1 unit higher than classic A*, and the reconstruction phase is quasi-linear in the number of nodes that belong to a shortest path. The impact is smaller if, as you imply, there is only one shortest path in most cases. You can even optimize for this special case e.g. by remembering an ancestor node as in the classic case, which you set to a special error value when there is more than one ancestor (that is, when path_length[y] == path_length_through_x). Once the search loop is over, you attempt to retrieve a path through ancestor as in classic A*; you only need to execute the full path reconstruction if an error value was encountered when building the path.
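The reconstruction idea can be tried on the maze from the question with the sketch below. To keep it short I have swapped the A* search phase for a plain BFS from the goal (my substitution, not the method above); the part being illustrated is the final walk, which at each step takes the highest-priority move that still lies on some shortest path.

```python
from collections import deque

MOVES = [("U", -1, 0), ("R", 0, 1), ("D", 1, 0), ("L", 0, -1)]  # priority order


def lexicographic_shortest_path(grid):
    rows, cols = len(grid), len(grid[0])

    def find(ch):
        return next((r, c) for r in range(rows) for c in range(cols)
                    if grid[r][c] == ch)

    def neighbors(r, c):
        for name, dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "X":
                yield name, (nr, nc)

    start, goal = find("S"), find("F")

    # Distance of every reachable cell to the goal (BFS stands in for the search).
    to_goal = {goal: 0}
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for _, nxt in neighbors(*cell):
            if nxt not in to_goal:
                to_goal[nxt] = to_goal[cell] + 1
                queue.append(nxt)

    if start not in to_goal:
        return None
    path, cell = [], start
    while cell != goal:
        # First move, in priority order, that stays on some shortest path.
        name, cell = next((n, nxt) for n, nxt in neighbors(*cell)
                          if to_goal.get(nxt) == to_goal[cell] - 1)
        path.append(name)
    return "".join(path)


maze = ["...X",
        "FX.S",
        "...."]
print(lexicographic_shortest_path(maze))  # DLLLU
```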
I would build the preference on the path order directly into the heuristic function.
I would look at the breadth-first algorithm first.
Define a function for every path that the breadth-first algorithm chooses:
consider that we are running a depth-first algorithm, and it is at the n-th depth
the decisions made so far by the algorithm: x_i \in {U,R,D,L}
assign U=0, R=1, D=2, L=3
then define:
g(x_1,..,x_n) = sum_{i=1}^n x_i * (1/4)^i
let's fix this step's g value as g'
at every step when the algorithm visits a deeper node than this one, the g() function will be greater.
at every future step when one of the x_i, i in {1..n}, is changed, it will be greater, hence the g function always rises while running depth-first.
note: if the depth-first algorithm is successful, it selects the path with the minimal g() value
note: g() < 1 because max(L,R,U,D)=3
adding g to A*'s heuristic function won't interfere with the shortest path length because the minimum edge length is >= 1
the first solution an A* modified like this would find is the one that depth-first would find
for your example:
h_bread=g(DLLLU) = (23330)_4 * c
h_astar=g(LULLD) = (30332)_4 * c
()_4 is base 4
c is a constant (4^{-5})
for your example: h_bread < h_astar
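The two values can be checked with exact fractions. This is just the g() function above written out, with the digit assignment U=0, R=1, D=2, L=3 taken from the answer.

```python
from fractions import Fraction

DIGIT = {"U": 0, "R": 1, "D": 2, "L": 3}


def g(path):
    """sum over i of x_i * (1/4)^i, with i starting at 1."""
    return sum(Fraction(DIGIT[step], 4 ** i) for i, step in enumerate(path, 1))


h_bread = g("DLLLU")   # (23330)_4 * 4^-5
h_astar = g("LULLD")   # (30332)_4 * 4^-5
print(h_bread, h_astar, h_bread < h_astar)
```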
I've come up with two ways of doing this. Both require continuing the algorithm while the top of the queue has distance-to-start g-value <= g(end-node). Since the heuristic used in A* is admissible, this guarantees that every node that belongs to some best-path will eventually be expanded.
The first method is, when we come to a conflict (ie. we find two nodes with the same f-value that could potentially both be the parent of some node along the best-path), we resolve it by backtracking to the first point along the path where they meet (we can do this easily in O(path-length)). We then simply check the direction priorities of both paths, and go with whichever path would have the higher priority in a BFS-search.
The second method only works for grids where each node touches the horizontally- and vertically- (and possibly diagonally-) adjacent nodes (ie. 4-connected grid-graphs). We do the same thing as before, but instead of backtracking to resolve a conflict, we compare the nodes along the paths from the start, and find the first place they differ. The first place they differ will be the same critical node as before, from which we can check direction-priorities.
We do this by storing the best path so far for each node. Normally this would be cumbersome, but since we have a 4-connected graph, we can do this pretty efficiently by storing each direction taken along the path. This takes only 2 bits per node. Thus, we can essentially encode the path using integers: with 32-bit registers we can compare 16 nodes at a time; 32 nodes with 64-bit registers; and 64(!) nodes at a time with 128-bit registers (like the SSE registers in x86 and x64 processors), making this search very inexpensive even for paths with hundreds of nodes.
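The 2-bit encoding might be sketched like this in Python, using arbitrary-precision integers in place of fixed-width registers. The direction codes and the choice to put the first step in the most significant bits are assumptions for the sake of the example.

```python
CODE = {"U": 0, "R": 1, "D": 2, "L": 3}


def pack(path):
    """Pack a direction string into an int, 2 bits per step, first step highest."""
    value = 0
    for step in path:
        value = (value << 2) | CODE[step]
    return value


def first_difference(path_a, path_b):
    """Index of the first differing step of two equal-length paths, or None."""
    assert len(path_a) == len(path_b)
    diff = pack(path_a) ^ pack(path_b)
    if diff == 0:
        return None
    highest_pair = (diff.bit_length() - 1) // 2   # 2-bit slot of the top set bit
    return len(path_a) - 1 - highest_pair


print(pack("DLLLU"), pack("LULLD"))
print(first_difference("DLLLU", "LULLD"), first_difference("DLLLU", "DLLUL"))
```

One XOR plus a highest-set-bit lookup locates the critical step for a whole word of directions at once, which is where the speed-up over step-by-step backtracking comes from.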
I implemented both of these, along with #generic human's algorithm, to test the speed. On a 50x50 grid with 400 towers,
#generic human's algorithm ran about 120% slower than normal A*
my backtracking algorithm ran about 55% slower than normal A*
The integer-encoding algorithm only ran less than 10% slower than A*
Thus, since my application uses 4-connected graphs, it seems the integer-encoding algorithm is best for me.
I've copied an email I wrote to a professor here. It includes more detailed descriptions of the algorithms, along with sketches of proofs that they work.
In general, there is no non-trivial way to do this:
Breadth-first search finds the shortest path of lowest order determined by the order in which vertices are considered. And this order must take precedence over any other factor when breaking ties between paths of equal length.
Example: If the nodes are considered in the order A, B, C, then Node A < Node C. Thus if there is a tie between a shortest path beginning with A and one beginning with C, the one with A will be found.
On the other hand, A* search will find the shortest path of lowest order determined by the heuristic value of the node. Thus the heuristic must take into account the lowest lexicographic path to each node. And the only way to find that is BFS.

Find all subtrees of size N in an undirected graph

Given an undirected graph, I want to generate all subgraphs which are trees of size N, where size refers to the number of edges in the tree.
I am aware that there are a lot of them (exponentially many at least for graphs with constant connectivity) - but that's fine, as I believe the number of nodes and edges makes this tractable for at least smallish values of N (say 10 or less).
The algorithm should be memory-efficient - that is, it shouldn't need to have all graphs or some large subset of them in memory at once, since this is likely to exceed available memory even for relatively small graphs. So something like DFS is desirable.
Here's what I'm thinking, in pseudo-code, given the starting graph graph and desired length N:
Pick any arbitrary node, root as a starting point and call alltrees(graph, N, root)
alltrees(graph, N, root)
    given that node root has degree M, find all M-tuples with integer, non-negative values whose values sum to N (for example, for 3 children and N=2, you have (0,0,2), (0,2,0), (2,0,0), (0,1,1), (1,0,1), (1,1,0), I think)
    for each tuple (X1, X2, ... XM) above
        create a subgraph "current" initially empty
        for each integer Xi in X1...XM (the current tuple)
            if Xi is nonzero
                add edge i incident on root to the current tree
                add alltrees(graph with root removed, N-1, node adjacent to root along edge i)
        add the current tree to the set of all trees
    return the set of all trees
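The tuple-generation step in the pseudo-code could be sketched as a small generator; the name weak_compositions is mine. It yields the tuples lazily, which fits the memory-efficiency requirement.

```python
def weak_compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`.

    Assumes parts >= 1.
    """
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in weak_compositions(total - first, parts - 1):
            yield (first,) + rest


print(list(weak_compositions(2, 3)))
```

For 3 children and N=2 this produces exactly the six tuples listed in the pseudo-code.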
This finds only trees containing the chosen initial root, so now remove this node and call alltrees(graph with root removed, N, new arbitrarily chosen root), and repeat until the size of the remaining graph < N (since no trees of the required size will exist).
I forgot also that each visited node (each root for some call of alltrees) needs to be marked, and the set of children considered above should only be the adjacent unmarked children. I guess we need to account for the case where no unmarked children exist, yet depth > 0, this means that this "branch" failed to reach the required depth, and cannot form part of the solution set (so the whole inner loop associated with that tuple can be aborted).
So will this work? Any major flaws? Any simpler/known/canonical way to do this?
One issue with the algorithm outlined above is that it doesn't satisfy the memory-efficient requirement, as the recursion will hold large sets of trees in memory.
This needs an amount of memory that is proportional to what is required to store the graph. It will return every subgraph that is a tree of the desired size exactly once.
Keep in mind that I just typed it into here. There could be bugs. But the idea is that you walk the nodes one at a time, for each node searching for all trees that include that node, but none of the nodes that were searched previously. (Because those have already been exhausted.) That inner search is done recursively by listing edges to nodes in the tree, and for each edge deciding whether or not to include it in your tree. (If it would make a cycle, or add an exhausted node, then you can't include that edge.) If you include it your tree then the used nodes grow, and you have new possible edges to add to your search.
To reduce memory use, the list of edges left to look at is manipulated in place by all levels of the recursive call, rather than taking the more obvious approach of duplicating that data at each level. If that list were copied, your total memory usage would get up to the size of the tree times the number of edges in the graph.
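# Helper functions the answer leaves undefined. These versions are an assumption:
# `graph` is taken to be a dict mapping each node to an iterable of its neighbours,
# and an edge is a (node, neighbour) tuple.
def all_nodes(graph):
    return list(graph)

def neighbors(graph, node):
    return [(node, other) for other in graph[node]]

def nodes(edge):
    return edge

def build_tree(graph, used_node, used_edge):
    # Snapshot the sets, because the search keeps mutating them.
    return frozenset(used_node), frozenset(used_edge)

def find_all_trees(graph, tree_length):
    exhausted_node = set()       # start nodes whose trees are already enumerated
    used_node = set()            # nodes of the tree being built
    used_edge = set()            # edges of the tree being built
    current_edge_groups = []     # one list of candidate edges per used node

    def finish_all_trees(remaining_length, edge_group, edge_position):
        while edge_group < len(current_edge_groups):
            edges = current_edge_groups[edge_group]
            while edge_position < len(edges):
                edge = edges[edge_position]
                edge_position += 1
                (node1, node2) = nodes(edge)
                if node1 in exhausted_node or node2 in exhausted_node:
                    continue
                node = node1
                if node1 in used_node:
                    if node2 in used_node:
                        continue  # this edge would close a cycle
                    else:
                        node = node2
                used_node.add(node)
                used_edge.add(edge)
                current_edge_groups.append(neighbors(graph, node))
                if 1 == remaining_length:
                    yield build_tree(graph, used_node, used_edge)
                else:
                    yield from finish_all_trees(remaining_length - 1,
                                                edge_group, edge_position)
                # Undo the choice before trying the next edge.
                current_edge_groups.pop()
                used_edge.remove(edge)
                used_node.remove(node)
            edge_position = 0
            edge_group += 1

    for node in all_nodes(graph):
        used_node.add(node)
        current_edge_groups.append(neighbors(graph, node))
        yield from finish_all_trees(tree_length, 0, 0)
        current_edge_groups.pop()
        used_node.remove(node)
        exhausted_node.add(node)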
Assuming you can destroy the original graph or make a destroyable copy, I came up with something that could work, but it could be painfully slow because I did not work out its big-O complexity. It would probably work for small subtrees.
do it in steps, at each step:
sort the graph nodes so you get a list of nodes sorted by number of adjacent edges ASC
process all nodes with the same number of edges of the first one
remove those nodes
For an example for a graph of 6 nodes finding all size 2 subgraphs (sorry for my total lack of artistic expression):
Well the same would go for a bigger graph, but it should be done in more steps.
Assuming:
Z number of edges of the most branched node (the one with the highest degree)
M desired subtree size
S number of steps
Ns number of nodes in step
assuming quicksort for sorting nodes
Worst case:
S*(Ns^2 + MNsZ)
Average case:
S*(NslogNs + MNs(Z/2))
The problem is that I cannot calculate the real big-O, because the number of nodes in each step will decrease depending on how the graph is...
Solving the whole thing with this approach could be very time consuming on a graph with highly connected nodes. However, it could be parallelized, and you could do one or two steps to remove poorly connected nodes, extract all subgraphs, and then choose another approach for the remainder; by then you would have removed a lot of nodes from the graph, so the remaining run time could decrease...
Unfortunately this approach would benefit the GPU, not the CPU, since a LOT of nodes with the same number of edges would go in each step.... and if parallelization is not used this approach is probably bad...
Maybe the inverse would go better with the CPU: sort and proceed with the nodes with the maximum number of edges... there will probably be fewer of those at the start, but you will have more subgraphs to extract from each node...
Another possibility is to calculate the least frequently occurring edge count in the graph and start with the nodes that have it; that would reduce the memory usage and the iteration count for extracting subgraphs...
Unless I'm reading the question wrong people seem to be overcomplicating it.
This is just "all possible paths within N edges" and you're allowing cycles.
Thus, for two nodes A, B and one edge, your result would be:
AA, AB, BA, BB
For two nodes, two edges your result would be:
AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
I would recurse into a for each and pass in a "template" tuple
N=edge count
TempTuple = Tuple_of_N_Items ' (01,02,03,...0n) (Could also be an ordered list!)
ListOfTuple_of_N_Items ' Paths (could also be an ordered list!)
edgeDepth = N
Method (Nodes, edgeDepth, TupleTemplate, ListOfTuples, EdgeTotal)
    edgeDepth -=1
    For Each Node In Nodes
        if edgeDepth = 0 'Last Edge
            ListOfTuples.Add New Tuple from TupleTemplate + Node ' (x,y,z,...,Node)
        else
            NewTupleTemplate = TupleTemplate + Node ' (x,y,z,Node,...,0n)
            Method(Nodes, edgeDepth, NewTupleTemplate, ListOfTuples, EdgeTotal)
    next
This will create every possible combination of vertices for a given edge count
What's missing is the factory to generate tuples given an edge count.
You end up with a list of possible paths and the operation is Nodes^(N+1)
If you use ordered lists instead of tuples then you don't need to worry about a factory to create the objects.
If memory is the biggest problem you can use a NP-ish solution using tools from formal verification. I.e., guess a subset of nodes of size N and check whether it's a graph or not. To save space you can use a BDD (http://en.wikipedia.org/wiki/Binary_decision_diagram) to represent the original graph's nodes and edges. Plus you can use a symbolic algorithm to check if the graph you guessed is really a graph - so you don't need to construct the original graph (nor the N-sized graphs) at any point. Your memory consumption should be (in big-O) log(n) (where n is the size of the original graph) to store the original graph, and another log(N) to store every "small graph" you want.
Another tool (which is supposed to be even better) is to use a SAT solver. I.e., construct a SAT formula that is true iff the sub-graph is a graph and supply it to a SAT solver.
For a graph of Kn there are approximately n! paths between any two pairs of vertices. I haven't gone through your code but here is what I would do.
Select a pair of vertices.
Start from a vertex and try to reach the destination vertex recursively (something like dfs but not exactly). I think this would output all the paths between the chosen vertices.
You could do the above for all possible pairs of vertices to get all simple paths.
It seems that the following solution will work.
Go over all partitions of the set of all vertices into two parts. Then count the number of edges whose endpoints lie in different parts (k); these edges correspond to the edge of the tree that connects the subtrees for the first and the second parts. Calculate the answer for both parts recursively (p1, p2). Then the answer for the entire graph can be calculated as the sum over all such partitions of k*p1*p2. But all trees will be considered N times: once for each edge. So, the sum must be divided by N to get the answer.
Your solution as is doesn't work I think, although it can be made to work. The main problem is that the subproblems may produce overlapping trees so when you take the union of them you don't end up with a tree of size n. You can reject all solutions where there is an overlap, but you may end up doing a lot more work than needed.
Since you are ok with exponential runtime, and potentially writing 2^n trees out, an O(V * 2^V) algorithm is not bad at all. So the simplest way of doing it would be to generate all possible subsets of n nodes, and then test each one to see if it forms a tree. Since testing whether a subset of nodes forms a tree can take O(E * V) time, we are potentially talking about V^2 * V^n time, unless you have a graph with O(1) degree. This can be improved slightly by enumerating subsets in such a way that two successive subsets differ in exactly one node being swapped. In that case, you just have to check if the new node is connected to any of the existing nodes, which can be done in time proportional to the number of outgoing edges of the new node by keeping a hash table of all existing nodes.
The next question is how to enumerate all the subsets of a given size such that no more than one element is swapped between successive subsets. I'll leave that as an exercise for you to figure out :)
I think there is a good algorithm (with Perl implementation) at this site (look for TGE), but if you want to use it commercially you'll need to contact the author. The algorithm is similar to yours in the question but avoids the recursion explosion by making the procedure include a current working subtree as a parameter (rather than a single node). That way each edge emanating from the subtree can be selectively included/excluded, and recurse on the expanded tree (with the new edge) and/or reduced graph (without the edge).
This sort of approach is typical of graph enumeration algorithms -- you usually need to keep track of a handful of building blocks that are themselves graphs; if you try to only deal with nodes and edges it becomes intractable.
This algorithm is big and not an easy one to post here. But here is a link to the reservation search algorithm, using which you can do what you want. This pdf file contains both algorithms. Also, if you understand Russian, you can take a look at this.
So you have a graph with edges e_1, e_2, ..., e_E.
If I understand correctly, you are looking to enumerate all subgraphs which are trees and contain N edges.
A simple solution is to generate each of the E choose N subgraphs and check if they are trees.
Have you considered this approach? Of course if E is too large then this is not viable.
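A quick sketch of that brute-force check, assuming edges are given as (u, v) pairs. A set of N edges is a tree exactly when it has no cycle (checked here with a small union-find) and touches N+1 distinct nodes.

```python
from itertools import combinations


def is_tree(edge_subset):
    """True if the given edges form a single tree (connected and acyclic)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edge_subset:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False                    # this edge closes a cycle
        parent[root_u] = root_v
    # An acyclic edge set is connected iff it touches exactly len(edges) + 1 nodes.
    return len(parent) == len(edge_subset) + 1


def trees_by_brute_force(edges, n):
    """Check each of the (E choose n) edge subsets."""
    return [subset for subset in combinations(edges, n) if is_tree(subset)]


square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(len(trees_by_brute_force(square, 2)), len(trees_by_brute_force(square, 3)))
```

Because combinations is a lazy iterator, you can also loop over the subsets one at a time instead of building the list if memory matters.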
EDIT:
We can also use the fact that a tree is a combination of trees, i.e. that each tree of size N can be "grown" by adding an edge to a tree of size N-1. Let E be the set of edges in the graph. An algorithm could then go something like this.
T = E
n = 1
while n<N
    newT = empty set
    for each tree t in T
        for each edge e in E
            if t+e is a tree of size n+1 which is not yet in newT
                add t+e to newT
    T = newT
    n = n+1
At the end of this algorithm, T is the set of all subtrees of size N. If space is an issue, don't keep a full list of the trees, but use a compact representation, for instance implement T as a decision tree using ID3.
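The growing procedure could be written roughly as follows, assuming a simple graph without self-loops and representing each tree as a frozenset of edges (each edge itself a frozenset of its two endpoints), so that duplicates collapse automatically. Note that this keeps every tree of the current size in memory, as the pseudo-code does.

```python
def trees_with_n_edges(edges, n):
    """All subtrees with n edges; each tree is a frozenset of frozenset edges."""
    edges = [frozenset(e) for e in edges]
    trees = {frozenset([e]) for e in edges}          # T = E, n = 1
    for _ in range(n - 1):
        grown = set()
        for tree in trees:
            vertices = frozenset().union(*tree)
            for e in edges:
                # t+e is a tree iff e has exactly one endpoint already in t.
                if len(e & vertices) == 1:
                    grown.add(tree | {e})
        trees = grown
    return trees


triangle = [("A", "B"), ("B", "C"), ("A", "C")]
path = [("A", "B"), ("B", "C"), ("C", "D")]
print(len(trees_with_n_edges(triangle, 2)), len(trees_with_n_edges(path, 3)))
```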
I think the problem is under-specified. You mentioned that the graph is undirected and that the subgraph you are trying to find is of size N. What is missing is the number of edges, and whether the trees you are looking for are binary or whether multi-way trees are allowed. Also, are you interested in mirrored reflections of the same tree, or in other words, does the order in which siblings are listed matter at all?
I assume a single node in the tree you are trying to find is allowed to have more than 2 siblings, which should be the case given that you don't specify any restriction on the initial graph and you mentioned that the resulting subgraph should contain all nodes.
You can enumerate all subgraphs that have the form of a tree by performing a depth-first traversal. You need to repeat the traversal of the graph for every sibling during the traversal. Then you'll need to repeat the operation for every node as a root.
Discarding symmetric trees, you will end up with N^(N-2) trees if your graph is a fully connected mesh; otherwise you need to apply Kirchhoff's Matrix-tree theorem.
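As a sanity check on those counts, here is a sketch of the matrix-tree computation: the number of spanning trees equals the determinant of the graph Laplacian with one row and column removed. It assumes a simple undirected graph given as a symmetric adjacency dict, and uses exact fractions for the elimination.

```python
from fractions import Fraction


def count_spanning_trees(adj):
    """Kirchhoff's theorem: determinant of the reduced Laplacian."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Laplacian (degree on the diagonal, -1 per edge) without its last row/column.
    m = [[Fraction(0)] * (n - 1) for _ in range(n - 1)]
    for v in nodes[:-1]:
        i = index[v]
        m[i][i] = Fraction(len(adj[v]))
        for w in adj[v]:
            j = index[w]
            if j < n - 1:
                m[i][j] -= 1
    # Determinant by Gaussian elimination with row swaps.
    det = Fraction(1)
    for col in range(n - 1):
        pivot = next((r for r in range(col, n - 1) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n - 1):
            factor = m[r][col] / m[col][col]
            for c in range(col, n - 1):
                m[r][c] -= factor * m[col][c]
    return int(det)


def complete_graph(n):
    return {v: [w for w in range(n) if w != v] for v in range(n)}


print(count_spanning_trees(complete_graph(4)), count_spanning_trees(complete_graph(5)))
```

For a complete graph on N nodes the result agrees with N^(N-2); for other graphs the determinant gives the count directly.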

Resources