What is the total number of nodes generated by Depth-First Search - algorithm

Assume: 'd' is the finite depth of a tree ; 'b' is a branching factor ; 'g' is the shallowest goal node.
From what I know, the worst-case is when the goal node is at the very last right-bottomed node in a tree.
Thus, supposedly the total number of nodes generated is O(bg), right?
However, my instructor told me it was wrong since the worst-case is when all the tree are explored except the subtree rooted at the goal node.
He mentioned something about O(bd) - O(b(g-d)) .... I'm not entirely sure.
I don't really get what he means, so can somebody tell me which answer is correct?

I recommend drawing a tree, marking the nodes that are explored, and counting how many there are.
Your reasoning is correct if you use breadth first search because you will only have reached a depth of g for each branch (O(b**g) nodes explored in total).
Your instructor's reasoning is correct if you use depth first search because you reach a depth of d for all parts of the tree except the one with the goal (O(b**d - b**(d-g)) nodes explored).
The goal is the green circle.
The blue nodes are explored.
The red nodes are not explored.
To count the number explored we count the total in the tree, and take away the red ones.
Depth = 2 = d
Goal at depth = 1 = g
Branching factor = b = 3
Note that I have called the total number of nodes in the tree O(b**d). Strictly speaking, the total is b**d + b**(d-1) + b**(d-2) + ... + 1, but this is O(b**d).

Related

How to find minimum possible height of tree?

The fan-out of a node in a tree is defined to be the number of children the node has. The fan-out of a tree is defined to be the maximum fan-out of any node in the tree. Assume tree T has n nodes and a fan-out f > 1. What is the minimum possible height of T?
I have no idea how to begin this problem. I solved the first part which is to find the maximum number of nodes that can be T in terms of height h and fan-out f > 1. I got (f^(h+1) -1)/(f-1). I'm thinking you can use this to solve for the question above. Can someone please point me in the correct direction?
Thank you!
I would approach this problem by turning it around and trying to find the maximum number of nodes you can pack in a tree with a given height and fan-out T_max(h,f). This way, every other tree T(h,f) is guaranteed to have as much, or less nodes than T_max(h,f). Therefore, if you find such T_max(h,f), that
total_nodes( T_max(h,f) ) > n > total_nodes( T_max(h-1,f))
h will be guaranteed to be the minimum height of a tree with n nodes and f fan-out.
In order to find such a tree, you need to maximize the number of nodes in every layer of a tree. In other words, every node of such a tree needs to have fan-out of f, no less. Therefore you start inserting nodes in a tree, one level at a time. After a layer is full, you start adding another layer. After n nodes are inserted in such a tree, you stop and check the height of the tree. This will be the minimal height you are looking for.
Or, you can do a calculation instead:
nodes_in_level(1) = 1
nodes_in_level(2) = f
nodes_in_level(3) = f * f
...
nodes_in_level(x) = f ^ (x - 1)
This is a standard geometric progression. The maximum nodes of a given tree with height x and fan-out f is therefore the sum of such geometric progression and it shouldn't be too much trouble to figure out the smallest x, for which the number of nodes is greater than n.

Relationship between number of nodes and height

I am reading The Algorithm Design Manual. The author states that the height of a tree is:
h = log n,
where
h is height
n = number of leaf nodes
log is log to base d, where d is the maximum number of children allowed per node.
He then goes on to say that the height of a perfectly balanced binary search tree, would be:
h = log n
I wonder if n in this second statement denotes 'total number of leaf nodes' or 'total number of nodes'.
Which brings up a bigger question, is there a mathematical relationship between total number of nodes and the height of a perfectly balanced binary search tree?
sure, n = 2^h where h, n denote height of the tree and the number of its nodes, respectively.
proof sketch:
a perfectly balanced binary tree has
an actual branching factor of 2 at each inner node.
equal root path lengths for each leaf node.
about the leaf nodes in a perfectly balanced binary tree:
as the number of leafs is the number of nodes minus the number of nodes in a perfectly balanced binary tree with a height decremented by one, the number of leafs is half the number of all nodes (to be precise, half of n+1).
so h just varies by 1, which usually doesn't make any real difference in complexity considerations. that claim can be illustrated by remembering that it amounts to the same variations as defining the height of a single node tree as either 0 (standard) or 1 (unusual, but maybe handy in distinguishing it from an empty tree).
It doesn't really matter if you talk of all nodes or just leaf nodes: either is bound by above and below by the other multiplied by a constant factor. In a perfectly balanced binary tree the number of nodes on a full level is the number of all nodes in levels above plus one.
In a complete binary tree number of nodes (n) and height of tree (h) have a relationship like this in below.
n = 2^(h+1) -1
this is the all the nodes of the tree

A linear-time algorithm for finding the longest distance between two nodes in a free tree?

Given a free tree, find an algorithm to find the longest path between two nodes that runs in linear time. Is this possible to do if the nodes don't store their level? If yes, how?
If the nodes do store their level then I would move the lower node up the tree to the same level as the other. Than I would keep moving up the tree until the nodes overlap. The distance would be the sum of each time a node was moved up the tree.
If all the edges between the two nodes could not be used more than once, the path is fixed. So the problem is to find the lowest common ancestor, you can read here: http://en.wikipedia.org/wiki/Lowest_common_ancestor
There's a famous algorithm to solve it, and it's here:
http://en.wikipedia.org/wiki/Tarjan%27s_off-line_least_common_ancestors_algorithm
I solved http://www.spoj.pl/problems/PT07Z/ with the following code as an exercise to learn python:
def func(node):
global M
if (len(node)==0):
return 0
else:
s=[func(nodes[n]) for n in node]
s.sort()
m1=s[-1]+1
m2=0
if len(s)>1:
m2=s[-2]+1
M=max(M,m1+m2)
return m1
t=input()
nodes={}
for node in range(1,t+1):
nodes[node]=[]
for i in range(t-1):
s=raw_input().split()
a,b=int(s[0]),int(s[1])
nodes[a].append(b)
M=0
func(nodes[1])
print M
Note you can sort the nodes in linear time because you know the nodes go from 0 to N, so you move node 0 to position 0.. node 5 to position 5 etc.

IOI 2003 : how to calculate the node that has the minimum balance in a tree?

here is the Balancing Act problem that demands to find the node that has the minimum balance in a tree. Balance is defined as :
Deleting any node
from the tree yields a forest : a collection of one or more trees. Define the balance of a node to be the size of the largest tree in the forest T created by deleting that node from T
For the sample tree like :
2 6 1 2 1 4 4 5 3 7 3 1
Explanation is :
Deleting node 4 yields two trees whose member nodes are {5} and {1,2,3,6,7}. The
larger of these two trees has five nodes, thus the balance of node 4 is five. Deleting node
1 yields a forest of three trees of equal size: {2,6}, {3,7}, and {4,5}. Each of these trees
has two nodes, so the balance of node 1 is two.
What kind of algorithm can you offer to this problem?
Thanks
I am going to assume that you have had a looong look at this problem: reading the solution does not help, you only get better at solving these problems by solving them yourself.
So one thing to observe is, the input is a tree. That means that each edge joins 2 smaller trees together. Removing an edge yields 2 disconnected trees (a forest of 2 trees).
So, if you calculate the size of the tree on one side of the edge, and then on the other, you should be able to look at a node's edges and ask "What is the size of the tree on the other side of this edge?"
You can calculate the sizes of trees using dynamic programming - your recurrence state is "What edge am I on? What side of the edge am I on?" and it calculates the size of the tree "hung" at that node. That is the crux of the problem.
Having that data, it is sufficient to iterate through all the nodes, look at their edges and ask "What is the size of the tree on the other side of this edge?" From there, you just pick the minimum.
Hope that helps.
You basically want to check 3 things for every node:
The size of its left subtree.
The size of its right subtree.
The size of the rest of the tree. (size of tree - left - right)
You can use this algorithm and expand it to any kind of tree (different number of subnodes).
Go over the tree in an in-order sequence.
Do this recursively:
Every time you just before you back up from a node to the "father" node, you need to add 1+size of node's total sub trees, to the "father" node.
Then store a value, let's call it maxTree, in the node that holds the maximum between all its subtrees, and the (sum of all subtrees)-(size of tree).
This way you can calculate all the subtree sizes in O(N).
While traversing the tree, you can hold a variable that hold the minimum value found so far.

Find all subtrees of size N in an undirected graph

Given an undirected graph, I want to generate all subgraphs which are trees of size N, where size refers to the number of edges in the tree.
I am aware that there are a lot of them (exponentially many at least for graphs with constant connectivity) - but that's fine, as I believe the number of nodes and edges makes this tractable for at least smallish values of N (say 10 or less).
The algorithm should be memory-efficient - that is, it shouldn't need to have all graphs or some large subset of them in memory at once, since this is likely to exceed available memory even for relatively small graphs. So something like DFS is desirable.
Here's what I'm thinking, in pseudo-code, given the starting graph graph and desired length N:
Pick any arbitrary node, root as a starting point and call alltrees(graph, N, root)
alltrees(graph, N, root)
given that node root has degree M, find all M-tuples with integer, non-negative values whose values sum to N (for example, for 3 children and N=2, you have (0,0,2), (0,2,0), (2,0,0), (0,1,1), (1,0,1), (1,1,0), I think)
for each tuple (X1, X2, ... XM) above
create a subgraph "current" initially empty
for each integer Xi in X1...XM (the current tuple)
if Xi is nonzero
add edge i incident on root to the current tree
add alltrees(graph with root removed, N-1, node adjacent to root along edge i)
add the current tree to the set of all trees
return the set of all trees
This finds only trees containing the chosen initial root, so now remove this node and call alltrees(graph with root removed, N, new arbitrarily chosen root), and repeat until the size of the remaining graph < N (since no trees of the required size will exist).
I forgot also that each visited node (each root for some call of alltrees) needs to be marked, and the set of children considered above should only be the adjacent unmarked children. I guess we need to account for the case where no unmarked children exist, yet depth > 0, this means that this "branch" failed to reach the required depth, and cannot form part of the solution set (so the whole inner loop associated with that tuple can be aborted).
So will this work? Any major flaws? Any simpler/known/canonical way to do this?
One issue with the algorithm outlined above is that it doesn't satisfy the memory-efficient requirement, as the recursion will hold large sets of trees in memory.
This needs an amount of memory that is proportional to what is required to store the graph. It will return every subgraph that is a tree of the desired size exactly once.
Keep in mind that I just typed it into here. There could be bugs. But the idea is that you walk the nodes one at a time, for each node searching for all trees that include that node, but none of the nodes that were searched previously. (Because those have already been exhausted.) That inner search is done recursively by listing edges to nodes in the tree, and for each edge deciding whether or not to include it in your tree. (If it would make a cycle, or add an exhausted node, then you can't include that edge.) If you include it your tree then the used nodes grow, and you have new possible edges to add to your search.
To reduce memory use, the edges that are left to look at is manipulated in place by all of the levels of the recursive call rather than the more obvious approach of duplicating that data at each level. If that list was copied, your total memory usage would get up to the size of the tree times the number of edges in the graph.
def find_all_trees(graph, tree_length):
exhausted_node = set([])
used_node = set([])
used_edge = set([])
current_edge_groups = []
def finish_all_trees(remaining_length, edge_group, edge_position):
while edge_group < len(current_edge_groups):
edges = current_edge_groups[edge_group]
while edge_position < len(edges):
edge = edges[edge_position]
edge_position += 1
(node1, node2) = nodes(edge)
if node1 in exhausted_node or node2 in exhausted_node:
continue
node = node1
if node1 in used_node:
if node2 in used_node:
continue
else:
node = node2
used_node.add(node)
used_edge.add(edge)
edge_groups.append(neighbors(graph, node))
if 1 == remaining_length:
yield build_tree(graph, used_node, used_edge)
else:
for tree in finish_all_trees(remaining_length -1
, edge_group, edge_position):
yield tree
edge_groups.pop()
used_edge.delete(edge)
used_node.delete(node)
edge_position = 0
edge_group += 1
for node in all_nodes(graph):
used_node.add(node)
edge_groups.append(neighbors(graph, node))
for tree in finish_all_trees(tree_length, 0, 0):
yield tree
edge_groups.pop()
used_node.delete(node)
exhausted_node.add(node)
Assuming you can destroy the original graph or make a destroyable copy I came up to something that could work but could be utter sadomaso because I did not calculate its O-Ntiness. It probably would work for small subtrees.
do it in steps, at each step:
sort the graph nodes so you get a list of nodes sorted by number of adjacent edges ASC
process all nodes with the same number of edges of the first one
remove those nodes
For an example for a graph of 6 nodes finding all size 2 subgraphs (sorry for my total lack of artistic expression):
Well the same would go for a bigger graph, but it should be done in more steps.
Assuming:
Z number of edges of most ramificated node
M desired subtree size
S number of steps
Ns number of nodes in step
assuming quicksort for sorting nodes
Worst case:
S*(Ns^2 + MNsZ)
Average case:
S*(NslogNs + MNs(Z/2))
Problem is: cannot calculate the real omicron because the nodes in each step will decrease depending how is the graph...
Solving the whole thing with this approach could be very time consuming on a graph with very connected nodes, however it could be paralelized, and you could do one or two steps, to remove dislocated nodes, extract all subgraphs, and then choose another approach on the remainder, but you would have removed a lot of nodes from the graph so it could decrease the remaining run time...
Unfortunately this approach would benefit the GPU not the CPU, since a LOT of nodes with the same number of edges would go in each step.... and if parallelization is not used this approach is probably bad...
Maybe an inverse would go better with the CPU, sort and proceed with nodes with the maximum number of edges... those will be probably less at start, but you will have more subgraphs to extract from each node...
Another possibility is to calculate the least occuring egde count in the graph and start with nodes that have it, that would alleviate the memory usage and iteration count for extracting subgraphs...
Unless I'm reading the question wrong people seem to be overcomplicating it.
This is just "all possible paths within N edges" and you're allowing cycles.
This, for two nodes: A, B and one edge your result would be:
AA, AB, BA, BB
For two nodes, two edges your result would be:
AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
I would recurse into a for each and pass in a "template" tuple
N=edge count
TempTuple = Tuple_of_N_Items ' (01,02,03,...0n) (Could also be an ordered list!)
ListOfTuple_of_N_Items ' Paths (could also be an ordered list!)
edgeDepth = N
Method (Nodes, edgeDepth, TupleTemplate, ListOfTuples, EdgeTotal)
edgeDepth -=1
For Each Node In Nodes
if edgeDepth = 0 'Last Edge
ListOfTuples.Add New Tuple from TupleTemplate + Node ' (x,y,z,...,Node)
else
NewTupleTemplate = TupleTemplate + Node ' (x,y,z,Node,...,0n)
Method(Nodes, edgeDepth, NewTupleTemplate, ListOfTuples, EdgeTotal
next
This will create every possible combination of vertices for a given edge count
What's missing is the factory to generate tuples given an edge count.
You end up with a list of possible paths and the operation is Nodes^(N+1)
If you use ordered lists instead of tuples then you don't need to worry about a factory to create the objects.
If memory is the biggest problem you can use a NP-ish solution using tools from formal verification. I.e., guess a subset of nodes of size N and check whether it's a graph or not. To save space you can use a BDD (http://en.wikipedia.org/wiki/Binary_decision_diagram) to represent the original graph's nodes and edges. Plus you can use a symbolic algorithm to check if the graph you guessed is really a graph - so you don't need to construct the original graph (nor the N-sized graphs) at any point. Your memory consumption should be (in big-O) log(n) (where n is the size of the original graph) to store the original graph, and another log(N) to store every "small graph" you want.
Another tool (which is supposed to be even better) is to use a SAT solver. I.e., construct a SAT formula that is true iff the sub-graph is a graph and supply it to a SAT solver.
For a graph of Kn there are approximately n! paths between any two pairs of vertices. I haven't gone through your code but here is what I would do.
Select a pair of vertices.
Start from a vertex and try to reach the destination vertex recursively (something like dfs but not exactly). I think this would output all the paths between the chosen vertices.
You could do the above for all possible pairs of vertices to get all simple paths.
It seems that the following solution will work.
Go over all partitions into two parts of the set of all vertices. Then count the number of edges which endings lie in different parts (k); these edges correspond to the edge of the tree, they connect subtrees for the first and the second parts. Calculate the answer for both parts recursively (p1, p2). Then the answer for the entire graph can be calculated as sum over all such partitions of k*p1*p2. But all trees will be considered N times: once for each edge. So, the sum must be divided by N to get the answer.
Your solution as is doesn't work I think, although it can be made to work. The main problem is that the subproblems may produce overlapping trees so when you take the union of them you don't end up with a tree of size n. You can reject all solutions where there is an overlap, but you may end up doing a lot more work than needed.
Since you are ok with exponential runtime, and potentially writing 2^n trees out, having V.2^V algorithms is not not bad at all. So the simplest way of doing it would be to generate all possible subsets n nodes, and then test each one if it forms a tree. Since testing whether a subset of nodes form a tree can take O(E.V) time, we are potentially talking about V^2.V^n time, unless you have a graph with O(1) degree. This can be improved slightly by enumerating subsets in a way that two successive subsets differ in exactly one node being swapped. In that case, you just have to check if the new node is connected to any of the existing nodes, which can be done in time proportional to number of outgoing edges of new node by keeping a hash table of all existing nodes.
The next question is how do you enumerate all the subsets of a given size
such that no more than one element is swapped between succesive subsets. I'll leave that as an exercise for you to figure out :)
I think there is a good algorithm (with Perl implementation) at this site (look for TGE), but if you want to use it commercially you'll need to contact the author. The algorithm is similar to yours in the question but avoids the recursion explosion by making the procedure include a current working subtree as a parameter (rather than a single node). That way each edge emanating from the subtree can be selectively included/excluded, and recurse on the expanded tree (with the new edge) and/or reduced graph (without the edge).
This sort of approach is typical of graph enumeration algorithms -- you usually need to keep track of a handful of building blocks that are themselves graphs; if you try to only deal with nodes and edges it becomes intractable.
This algorithm is big and not easy one to post here. But here is link to reservation search algorithm using which you can do what you want. This pdf file contains both algorithms. Also if you understand russian you can take a look to this.
So you have a graph with with edges e_1, e_2, ..., e_E.
If I understand correctly, you are looking to enumerate all subgraphs which are trees and contain N edges.
A simple solution is to generate each of the E choose N subgraphs and check if they are trees.
Have you considered this approach? Of course if E is too large then this is not viable.
EDIT:
We can also use the fact that a tree is a combination of trees, i.e. that each tree of size N can be "grown" by adding an edge to a tree of size N-1. Let E be the set of edges in the graph. An algorithm could then go something like this.
T = E
n = 1
while n<N
newT = empty set
for each tree t in T
for each edge e in E
if t+e is a tree of size n+1 which is not yet in newT
add t+e to newT
T = newT
n = n+1
At the end of this algorithm, T is the set of all subtrees of size N. If space is an issue, don't keep a full list of the trees, but use a compact representation, for instance implement T as a decision tree using ID3.
I think problem is under-specified. You mentioned that graph is undirected and that subgraph you are trying to find is of size N. What is missing is number of edges and whenever trees you are looking for binary or you allowed to have multi-trees. Also - are you interested in mirrored reflections of same tree, or in other words does order in which siblings are listed matters at all?
If single node in a tree you trying to find allowed to have more than 2 siblings which should be allowed given that you don't specify any restriction on initial graph and you mentioned that resulting subgraph should contain all nodes.
You can enumerate all subgraphs that have form of tree by performing depth-first traversal. You need to repeat traversal of the graph for every sibling during traversal. When you'll need to repeat operation for every node as a root.
Discarding symmetric trees you will end up with
N^(N-2)
trees if your graph is fully connected mesh or you need to apply Kirchhoff's Matrix-tree theorem

Resources