Removing duplicate subtrees from binary tree - algorithm

As an extra homework exercise I have to design an algorithm that compresses a binary tree into a DAG by removing repeated subtrees and redirecting every link to a removed subtree to the single remaining original. For instance, take this tree (nodes listed in preorder):
1 2 1 3 2 1 3
The algorithm has to remove the root's right link (the right subtree, i.e. 2 1 3) and redirect it to the root's left child, because the two subtrees are identical and the left one comes first in preorder, so only the left one is kept.
The way I see it: I traverse the tree in preorder. For the current node 'w', I start a recursion that tries to find an earlier original subtree equal to the subtree rooted at 'w'. I stop that recursion when I find an equal subtree (and then do the redirection) or when the search reaches 'w' itself. Of course, I foresee some small improvements, such as comparing only subtrees with the same number of nodes.
If I'm not wrong, this gives complexity O(n^2), where n is the number of nodes in the given binary tree. Is there any chance of doing it faster (I think there is)? Is a linear algorithm possible?
It's a pity that my algorithm turned out to have complexity O(n^3). Your answers with hashing will probably be very useful to me later, when I know much more. For now it's too difficult for me.
One last question: is there any chance of doing it in O(n^2) using elementary techniques (no hashing)?

This happens when constructing OBDDs. The idea is: put the tree into a canonical form and construct a hash table with an entry for every node. The hash function is a function of the node plus the hash values of its left and right children. Complexity is O(N), but only if one can rely on the hash values being unique. The final compare (e.g. for resolving collisions) will still cost O(N*N) for the recursive subtree <--> subtree comparison.
More on BDDs or the original Bryant paper
The hashfunction I currently use:
#define SHUFFLE(x,n) (((x) << (n))|((x) >>(32-(n))))
/* a node's hashvalue is based on its value
* and (recursively) on its children's hashvalues.
*/
#define NODE_HASH2(l,r) ((SHUFFLE((l),5)^SHUFFLE((r),9)))
#define NODE_HASH3(v,l,r) ((0x54321u*(v) ^ NODE_HASH2((l),(r))))
Typical usage:
void node_sethash(NodeNum num)
{
if (NODE_IS_NULL(num)) return;
if (NODE_IS_TERMINAL(num)) switch (nodes[num].var) {
case 0: nodes[num].hash.hash= HASH_FALSE; break;
case 1: nodes[num].hash.hash= HASH_TRUE; break;
case 2: nodes[num].hash.hash= HASH_FALSE^HASH_TRUE; break;
}
else if (NODE_IS_NAMED(num)) {
NodeNum f,t;
f = nodes[num].negative;
t = nodes[num].positive;
nodes[num].hash.hash = NODE_HASH3 (nodes[num].var, nodes[f].hash.hash, nodes[t].hash.hash);
}
return ;
}
Searching the hash table:
NodeNum *hash_hnd(NodeNum num, int want_exact)
{
unsigned slot;
NodeNum *ptr, this;
if (NODE_IS_NULL(num)) return NULL;
slot = nodes[num].hash.hash % COUNTOF(hash_nodes);
for (ptr = &hash_nodes[slot]; !NODE_IS_NULL(this= *ptr); ptr = &nodes[this].hash.link) {
if (this == num) break;
if (want_exact) continue;
if (nodes[this].hash.hash != nodes[num].hash.hash) continue;
if (nodes[this].var != nodes[num].var) continue;
if (node_compare( nodes[this].negative , nodes[num].negative)) continue;
if (node_compare( nodes[this].positive , nodes[num].positive)) continue;
/* duplicate node := same var+same children */
break;
}
return ptr;
}
The recursive compare function:
int node_compare(NodeNum one, NodeNum two)
{
int rc;
if (one == two) return 0;
if (NODE_IS_NULL(one) && NODE_IS_NULL(two)) return 0;
if (NODE_IS_NULL(one) && !NODE_IS_NULL(two)) return -1;
if (!NODE_IS_NULL(one) && NODE_IS_NULL(two)) return 1;
if (NODE_IS_TERMINAL(one) && !NODE_IS_TERMINAL(two)) return -1;
if (!NODE_IS_TERMINAL(one) && NODE_IS_TERMINAL(two)) return 1;
if (VAR_RANK(nodes[one].var) < VAR_RANK(nodes[two].var) ) return -1;
if (VAR_RANK(nodes[one].var) > VAR_RANK(nodes[two].var) ) return 1;
rc = node_compare(nodes[one].negative,nodes[two].negative);
if (rc) return rc;
rc = node_compare(nodes[one].positive,nodes[two].positive);
if (rc) return rc;
return 0;
}

This is a problem commonly solved to do common sub-expression elimination in programming languages.
The approach is as follows (and is easily generalized to more than 2 children in a node):
Algorithm (Assumes mutable tree structure; You can easily build a new tree along the way):
MakeDAG(tree):
    HASH = a new hash-table-based dictionary
    foreach subtree NODE in the tree // traverse this however you like
        if NODE is in HASH
            replace NODE with HASH[NODE]
        else
            HASH[NODE] = NODE // insert the current node in the dictionary
To compute the hash code for a node, you need to recursively compute the hash codes of its children until you reach the leaves of the tree.
Simply calculating these hash codes naively will bump up your runtime to O(n^2).
It is crucial that you store the results as you go, so that each node's hash code is computed only once from the stored codes of its children. This avoids repeated recursive calls and improves the runtime to O(n).
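A minimal Python sketch of the MakeDAG idea above, for the binary case. The `Node` class and the `make_dag` name are assumptions made for illustration, not part of the answer. Instead of a numeric hash code, it keys the dictionary on the node's value plus the identities of its already-deduplicated children, which plays the same role as the stored hash codes and keeps the expected work per node constant.

```python
class Node:
    """Hypothetical binary tree node used only for this sketch."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def make_dag(root):
    """Share structurally equal subtrees; returns the (possibly shared) root."""
    # (value, id(left), id(right)) -> first node seen with that shape
    canonical = {}

    def visit(node):
        if node is None:
            return None
        # Children are deduplicated first, so two equal subtrees are already
        # the same object and comparing identities is sufficient.
        node.left = visit(node.left)
        node.right = visit(node.right)
        key = (node.value, id(node.left), id(node.right))
        # Keep the first node with this key; later duplicates are replaced by it.
        return canonical.setdefault(key, node)

    return visit(root)
```

On the question's example (preorder 1 2 1 3 2 1 3), the root's left and right links end up pointing at the same node, and the left subtree is the one that is kept because it is visited first. The recursion is as deep as the tree, so a very deep tree would need an explicit stack instead.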

I would go with a hashing approach.
A hash for a leaf is its value mod P_1. The hash for an internal node is (value + hash(left_son)*P_2 + hash(right_son)*P_2^2) mod P_1, where P_1 and P_2 are primes. If you compute those hashes for at least 5 different pairs of big primes (by big I mean something near 10^8-10^9, so you can do your math without overflowing), you can in practice assume that nodes with the same hashes are the same.
Then you can walk the tree, checking sons first, and do your transform. This will work in O(n) time.
NOTE that you can use other hash functions, like (value + hash(left_son)*P_2 + hash(right_son)*P_3) mod P_1, etc.


How to calculate Hash value of a Tree

What is the best way to calculate the hash value of a Tree?
I need to compare the similarity between several trees in O(1). Now, I want to precalculate the hash values and compare them when needed. But then I realized, hashing a tree is different than hashing a sequence. I wasn't able to come up with a good hash function.
What is the best way to calculate hash value of a tree?
Note : I will implement the function in c/c++
Well, hashing a tree means representing it in a unique way so that we can tell it apart from other trees using a simple representation or number. In an ordinary polynomial hash we use number base conversion: we convert a string or a sequence to a specific prime base and reduce it modulo a value that is also a large prime. Using the same technique, we can hash a tree.
Now fix the root of the tree at any vertex. Let root = 1 and,
B = the base to which we want to convert.
P[i] = the i-th power of B (B^i).
level[i] = depth of the i-th vertex (its distance from the root).
child[i] = total number of vertices in the subtree of the i-th vertex, including i.
degree[i] = number of vertices adjacent to vertex i.
Now the contribution of the i-th vertex to the hash value is
hash[i] = ( (P[level[i]]+degree[i]) * child[i] ) % modVal
And the hash value of the entire tree is the sum of the hash values of all vertices:
(hash[1]+hash[2]+....+hash[n]) % modVal
If we use this definition of tree equivalence:
T1 is equivalent to T2 iff
all paths to leaves of T1 exist exactly once in T2, and
all paths to leaves of T2 exist exactly once in T1
Hashing a sequence (a path) is straightforward. If h_tree(T) is a hash of all paths-to-leafs of T, where the order of the paths does not alter the outcome, then it is a good hash for the whole of T, in the sense that equivalent trees will produce equal hashes, according to the above definition of equivalence. So I propose:
h_path(path) = an order-dependent hash of all elements in the path.
Requires O(|path|) time to calculate,
but child nodes can reuse the calculation of their
parent node's h_path in their own calculations.
h_tree(T) = an order-independent hashing of all its paths-to-leaves.
Can be calculated in O(|L|), where L is the number of leaves
In pseudo-c++:
struct node {
    int path_hash; // path-to-root hash; only use for building tree_hash
    int tree_hash; // takes children into account; use to compare trees
    int content;
    vector<node> children;
    int update_hash(int parent_path_hash = 1) {
        path_hash = parent_path_hash * PRIME1 + content; // order-dependent
        tree_hash = path_hash;
        for (node &n : children) { // by reference, so each child keeps its updated hashes
            tree_hash += n.update_hash(path_hash) * PRIME2; // order-independent
        }
        return tree_hash;
    }
};
After building two trees, update their hashes and compare away. Equivalent trees should have the same hash, different trees not so much. Note that the path and tree hashes that I am using are rather simplistic, and chosen rather for ease of programming than for great collision resistance...
Child hashes should be successively multiplied by a prime number & added. Hash of the node itself should be multiplied by a different prime number & added.
Cache the hash of the tree overall -- I prefer to cache it outside the AST node, if I have a wrapper object holding the AST.
public class RequirementsExpr {
protected RequirementsAST ast;
protected int hash = -1;
public int hashCode() {
if (hash == -1)
this.hash = ast.hashCode();
return hash;
}
}
public class RequirementsAST {
protected int nodeType;
protected Object data;
// -
protected RequirementsAST down;
protected RequirementsAST across;
public int hashCode() {
int nodeHash = nodeType;
nodeHash = (nodeHash * 17) + (data != null ? data.hashCode() : 0);
nodeHash *= 23; // prime A.
int childrenHash = 0;
for (RequirementsAST child = down; child != null; child = child.getAcross()) {
childrenHash *= 41; // prime B.
childrenHash += child.hashCode();
}
int result = nodeHash + childrenHash;
return result;
}
}
The result of this, is that child/descendant nodes in different positions are always multiplied in by different factors; and the node itself is always multiplied in by a different factor from any possible child/descendant nodes.
Note that other primes should also be used in building the nodeHash of the node data, itself. This helps avoid eg. different values of nodeType colliding with different values of data.
Within the limits of 32-bit hashing, this scheme overall gives a very high chance of uniqueness for any differences in tree-structure (eg, transposing two siblings) or value.
Once calculated (over the entire AST) the hashes are highly efficient.
I would recommend converting the tree to a canonical sequence and hashing the sequence. (The details of the conversion depend on your definition of equivalence. For example, if the trees are binary search trees and the equivalence relation is structural, then the conversion could be to enumerate the tree in preorder, as the structure of binary search trees can be recovered from the preorder enumeration.)
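To make the "canonical sequence" idea concrete, here is a small Python sketch for binary trees where equivalence means same shape and same values. The tuple representation `(value, left, right)`, the `#` marker for a missing child, and the function names are all assumptions made for this example. The markers matter: without them, two differently shaped trees could produce the same preorder sequence.

```python
import hashlib


def serialize(tree, out):
    """Append a preorder token sequence for `tree` to the list `out`.

    `tree` is either None or a tuple (value, left, right).
    """
    if tree is None:
        out.append("#")  # explicit marker for a missing child
        return
    value, left, right = tree
    out.append(str(value))
    serialize(left, out)
    serialize(right, out)


def tree_hash(tree):
    """Hash the canonical sequence; equal trees always give equal digests."""
    tokens = []
    serialize(tree, tokens)
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()
```

Comparing two precomputed digests is then a fixed-size comparison. This sketch assumes the values stringify without spaces and never look like `#`; for other definitions of equivalence (for example, ignoring child order) the serialization would have to change accordingly.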
Thomas's answer boils down at first glance to associating a multivariable polynomial with each tree and evaluating the polynomial at a particular location. There are two steps that, at the moment, have to be assumed on faith; the first is that the map doesn't send inequivalent trees to the same polynomial, and the second is that the evaluation scheme doesn't introduce too many collisions. I can't evaluate the first step presently, though there are reasonable definitions of equivalence that permit reconstruction from a two-variable polynomial. The second is not theoretically sound but could be made so via Schwartz--Zippel.

Trouble with a stack based algorithm

I'm working on this programming assignment. It tests our understanding of stacks and their applications. I find it extremely difficult to come up with an algorithm that works efficiently and accurately. Some of the test cases have 200,000+ "trees"! While my algorithm works for simpler test cases with fewer than 10 trees, it fails on both accuracy and efficiency when the number of "trees" gets large (from 100+ onwards).
I would appreciate it very much, if you guys can kindly give me a hint or point me to the right direction. Thank you.
Task Statement
Monkeys like to swing from tree to tree. They can swing from one tree
to another directly as long as there is no tree in between that is
taller than or have the same height as either one of the two trees.
For example, if there are 5 trees with heights 19m, 17m, 20m, 20m and
20m lining up in that order, then the monkey will be able to swing
from one tree to the other as shown below:
1. from first tree to second tree
2. from first tree to third tree
3. from second tree to third tree
4. from third tree to fourth tree
5. from fourth tree to fifth tree
Tarzan, the king of jungle who is able to communicate with the
monkeys, wants to test the monkeys to see if they know how to count
the total number of pairs of trees that they can swing directly from
one to the other. But he himself is not very good in counting. So he
turns to you, the best Java programmer in the country, to write a
program for getting the correct count for the trees in different parts
of the jungle.
Input
The first line contains N, the number of trees in the path. The next
line contains N integers a1 a2 a3 ... aN, where ai represents the
height of the i-th tree in the path, 0 < ai ≤ 2^31 and 2 ≤ N ≤ 500,000.
Note that short symbol N is used above for convenience. In your
program, you are expected to give it a descriptive name.
Output
The total number of pairs of trees which the monkeys can swing
directly from one to the other with the given list of tree heights.
Sample Input 1
4
3 4 1 2
Sample Output 1
4
Sample Input 2
5
19 17 20 20 20
Sample Output 2
5
Sample Input 3
4 1
2 21 21 12
Sample Output 3
3
Here's my code. It is a method that returns the number of pairs of trees a monkey can swing between. The parameter is an array of the input heights.
My algorithm goes as follows:
we set numPairs to (array length - 1), since a monkey can always swing between adjacent trees.
now we find the extra numPairs (non-adjacent trees to swing between).
push the first input onto the empty stack
we enter a for loop:
for the next input until the end of the array:
case1:
if the top of the stack is smaller than the current input and the size of the stack is equal to 1, then we replace the top with the input.
case2:
if the top of the stack is smaller than the current input and the size of the stack is bigger than 1, we pop the top, and enter a while loop that keeps popping (and incrementing numPairs) while the element just popped is smaller than the new top of the stack.
we then push the current input after we exit the while loop.
case3:
otherwise, if the above conditions are not satisfied, we simply push the current input onto the stack.
we exit the for loop
return numPairs
public int solve(int[] arr) {
int input, temp;
numPairs = arr.length-1;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty())
stack.push(input);
else if(!stack.isEmpty())
{
if(input>stack.peek() && stack.size() == 1)
{
stack.pop();
stack.push(input);
}
else if(input>stack.peek() && stack.size() > 1)
{
temp = stack.pop();
while(!stack.isEmpty() && temp < stack.peek())
{
numPairs++;
temp = stack.pop();
}
stack.push(input);
//numPairs++;
}
else
stack.push(input);
}
}
return numPairs;
}
Here's my solution, it's an iterative one.
class Result {
// declare the member field
Stack<Integer> stack;
int numPairs = 0;
// declare the constructor
public Result()
{
stack = new Stack<Integer>();
}
/*
* solve : to compute the result, return the result
* Pre-condition : parameter must be of array of integer type
* Post-condition : return the number of tree pairs that can be swung with
*/
public int solve(int[] arr) {
// implementation
int input;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty()) //if stack is empty, just push the input
stack.push(input);
else if(!stack.isEmpty())
{
//do a while loop to pop all possible top stack element until
//the top element is bigger than the input
//or the stack is empty
while(!stack.isEmpty() && input > stack.peek())
{
stack.pop();
numPairs++;
}
//if the stack is empty after exiting the while loop
//push the current element onto the stack
if(stack.isEmpty())
stack.push(input);
//this condition applies for two cases:
//1. the while loop is never entered because the input is smaller than the top element by default
//2. the while loop is exited and the input is pushed onto the non-empty stack with numPairs being incremented
else if(!stack.isEmpty() && input < stack.peek())
{
stack.push(input);
numPairs++;
}
//this is the last condition:
//the input is never pushed if the input is identical to the top element
//instead we increment the numPairs
else if(input == stack.peek())
numPairs++;
}
}
return numPairs;
}
}
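For comparison, the same stack idea as the Java class above can be sketched in a few lines of Python. The function name `count_swing_pairs` is made up for this example, and this is only a restatement of that approach under the task's rules, not an official solution. Each height is pushed and popped at most once, so the loop is linear in the number of trees.

```python
def count_swing_pairs(heights):
    """Count pairs of trees with no tree of equal or greater height in between."""
    stack = []  # strictly decreasing heights that later trees can still reach
    pairs = 0
    for h in heights:
        # Every shorter tree on the stack can reach h, and is then hidden by it.
        while stack and stack[-1] < h:
            stack.pop()
            pairs += 1
        if stack:
            # The nearest remaining tree is at least as tall as h: one more pair.
            pairs += 1
            if stack[-1] == h:
                # Equal height: the older tree is now blocked by this one,
                # so keeping a single copy on the stack is enough.
                continue
        stack.append(h)
    return pairs
```

On the first two sample inputs from the task statement this gives 4 and 5 respectively.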
If I understand the problem correctly, there are two kinds of trees accessible to each other:
Trees that are next to each other (adjacent) are always accessible to each other
Trees that are not adjacent are only accessible if all the trees in between are shorter than both of the trees.
One might come up with several types of solutions for this:
The brute force solution: compare every tree to every other tree checking the conditions above. Running time: O(n^2)
Find near accessible neighbors solution: look for near neighbors that are accessible. Running time: close to O(n). Here's how this would work:
Build an array of tree sizes in order that they are given. Then walk this array in order and for every tree at index i:
Going to the right from i
If tree at i+1 is taller than tree at i break out (no more accessible neighbors can be found)
Add 1 to the count of accessible trees if tree at i+1 is shorter than tree at i+2
Do the same for trees i+2, i+3.. etc. until you find a tree that is taller than tree at i.
This will get a count of non-adjacent accessible trees for every tree. Then just add N*2-2 to the count to account for all the adjacent trees, and you are done.

Creating path array using IDDFS

My IDDFS algorithm finds the shortest path in my graph, which is stored as an adjacency matrix.
It reports how deep the solution is (as I understand it, this is the number of points connected together from the starting point to the end).
I would like to get these points in an array.
For example:
Let's say that solution is found in depth 5, so I would like to have array with points: {0,2,3,4,6}.
Depth 3: array {1,2,3}.
Here is the algorithm in C++:
(I'm not sure if that algorithm "knows" if points which were visited are visited again while searching or not - I'm almost beginner with graphs)
int DLS(int node, int goal, int depth,int adj[9][9])
{
int i,x;
if ( depth >= 0 )
{
if ( node == goal )
return node;
for(i=0;i<nodes;i++)
{
if(adj[node][i] == 1)
{
child = i;
x = DLS(child, goal, depth-1,adj);
if(x == goal)
return goal;
}
}
}
return 0;
}
int IDDFS(int root,int goal,int adj[9][9])
{
depth = 0;
solution = 0;
while(solution <= 0 && depth < nodes)
{
solution = DLS(root, goal, depth,adj);
depth++;
}
if(depth == nodes)
return inf;
return depth-1;
}
int main()
{
int i,u,v,source,goal;
int adj[9][9] = {{0,1,0,1,0,0,0,0,0},
{1,0,1,0,1,0,0,0,0},
{0,1,0,0,0,1,0,0,0},
{1,0,0,0,1,0,1,0,0},
{0,1,0,1,0,1,0,1,0},
{0,0,1,0,1,0,0,0,1},
{0,0,0,1,0,0,0,1,0},
{0,0,0,0,1,0,1,0,1},
{0,0,0,0,0,1,0,1,0}
};
nodes=9;
edges=12;
source=0;
goal=8;
depth = IDDFS(source,goal,adj);
if(depth == inf)printf("No solution Exist\n");
else printf("Solution Found in depth limit ( %d ).\n",depth);
system("PAUSE");
return 0;
}
The reason why I'm using IDDFS instead of other path-finding algorithm is that I want to change depth to specified number to search for paths of exact length (but I'm not sure if that will work).
If someone would suggest other algorithm for finding path of specified length using adjacency matrix, please let me know about it.
The usual way to retrieve the actual path from a pathfinding algorithm is to use a map V->V in which the key is a vertex and the value is the vertex from which the key was discovered. (The source will either not be a key or be a key with a null value, since it was not discovered from any vertex.)
The pathfinding algorithm modifies this map while it runs. When it is done, you get your path by reading from the map iteratively, starting from the target and going all the way back to the source, which gives you the path in reverse order.
In DFS: you insert the (key, value) pair each time you discover a new vertex (which is the key). Note that if the vertex is already a key in the map, you should skip this branch.
Once you finish exploring a certain path and "close" a vertex, you need to take it out of the map. However, sometimes you can optimize the algorithm and skip this part (it will make the branching factor smaller).
Since IDDFS is just DFS run repeatedly, you can follow the same logic: each time you start a new DFS iteration with a higher depth limit, clear the old map and start a new one from scratch.
Other pathfinding algorithms are BFS, A* and Dijkstra's algorithm. Note that the last two also work for weighted graphs. All of these can be terminated when you reach a certain depth, just as the DFS in IDDFS is terminated when it reaches its depth limit.
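As a rough illustration of recovering the path in a depth-limited search, here is a Python sketch. It keeps the current branch in a list instead of a parent map (a simplification of the idea, not exactly what the answer above prescribes), and the names `depth_limited_path` and `iddfs_path` are made up for this example. The matrix is the one from the question.

```python
ADJ = [
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
]


def depth_limited_path(adj, node, goal, limit, path):
    """DFS that uses at most `limit` more edges; `path` holds the current branch."""
    path.append(node)
    if node == goal:
        return True
    if limit > 0:
        for nxt, connected in enumerate(adj[node]):
            # Skip vertices already on the current branch to avoid cycles.
            if connected and nxt not in path:
                if depth_limited_path(adj, nxt, goal, limit - 1, path):
                    return True
    path.pop()  # "close" the vertex: it is not on a path to the goal
    return False


def iddfs_path(adj, root, goal):
    """Return the vertices of a shortest path as a list, or None if unreachable."""
    for limit in range(len(adj)):
        path = []  # start from scratch for every depth limit
        if depth_limited_path(adj, root, goal, limit, path):
            return path
    return None
```

For the question's matrix, `iddfs_path(ADJ, 0, 8)` returns `[0, 1, 2, 5, 8]`. To look for a path with an exact number of edges instead, one could call `depth_limited_path` once with that limit and accept a result only when `len(path) - 1` equals it, although the first path found may then be shorter than requested.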

Check if a tree is a mirror image?

Given a binary tree which is huge and can not be placed in memory, how do you check if the tree is a mirror image.
I got this as an interview question
If a tree is a mirror image of another tree, the inorder traversal of one tree would be reverse of another.
So just do inorder traversal on the first tree and a reverse inorder traversal on another and check if all the elements are the same.
I can't take full credit for this reply of course; a handful of my colleagues helped with some assumptions and for poking holes in my original idea. Much thanks to them!
Assumptions
We can't have the entire tree in memory, so it's not ideal to use recursion. Let's assume, for simplicity's sake, that we can only hold a maximum of two nodes in memory.
We know n, the total number of levels in our tree.
We can perform seeks on the data with respect to the character or line position it's in.
The data that is on disk is ordered by depth. That is to say, the first entry on disk is the root, and the next two are its children, and the next four are its children's children, and so forth.
There are cases in which the data is perfectly mirrored, and cases in which it isn't. Blank data interlaced with non-blank data is considered "acceptable", unless otherwise specified.
We have freedom over using any data type we wish so long as the values can be compared for equivalence. Testing for object equivalence may not be ideal, so let's assume we're comparing primitives.
"Mirrored" means mirrored between the root's children. To use different terminologies, the grandparent's left child is mirrored with its right child, and the left child (parent)'s left child is mirrored with the grandparent's right child's right child. This is illustrated in the graph below; the matching symbols represent the mirroring we want to check for.
G
P* P*
C1& C2^ C3^ C4&
Approach
We know how many nodes on each level we should expect when we're reading from disk: 2^k on level k. We can establish a double loop to iterate over the total depth of the tree, and the count of the nodes in each level. Inside of this, we can simply compare the outermost values for equivalence, and short-circuit if we find an unequal value.
We can determine the position of each outermost node by using powers of 2. The leftmost child of any level will always be at position 2^k, and the rightmost child of any level will always be at position 2^(k+1)-1.
Small Proof: Outermost nodes on level 1 are 2 and 3; 2^1 = 2, 2^(1+1)-1 = 2^2-1 = 3. Outermost nodes on level 2 are 4 and 7; 2^2 = 4, 2^(2+1)-1 = 2^3-1 = 7. One could expand this all the way to the nth case.
Pseudocode
int k, i;
for (k = 1; k < n; k++) { // Skip root, trivially mirrored
    for (i = 0; i < pow(2, k) / 2; i++) {
        // compare the i-th node from the left with the i-th node from the right
        if (node_retrieve(pow(2, k) + i) != node_retrieve(pow(2, k + 1) - 1 - i)) {
            return false;
        }
    }
}
return true;
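The same loop can be tried out in Python with a list standing in for the data on disk. This is only a sketch under the assumptions listed above: the tree is complete, stored level by level, and positions start at 1 (index 0 of the list is unused here). The names `is_mirrored` and `node_retrieve` are placeholders, and a real `node_retrieve` would perform a seek instead of a list lookup.

```python
def is_mirrored(nodes, depth):
    """Check each level of a complete, level-ordered tree for left/right symmetry.

    nodes[1] is the root; `depth` is the number of levels (the root is level 0).
    """
    def node_retrieve(position):
        # Stand-in for a seek on disk; only one node is fetched at a time.
        return nodes[position]

    for k in range(1, depth):       # skip the root, trivially mirrored
        first = 2 ** k              # leftmost position on level k
        last = 2 ** (k + 1) - 1     # rightmost position on level k
        for i in range(2 ** k // 2):
            if node_retrieve(first + i) != node_retrieve(last - i):
                return False
    return True
```

With the graph above stored as `[None, "G", "P", "P", "C1", "C2", "C2", "C1"]` and `depth` 3, the check succeeds; swapping the last two entries makes it fail on level 2.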
Thoughts
This sort of question is a great interview question because, more than likely, they want to see how you would approach this problem. This approach may be horrible, it may be immaculate, but an employer would want you to take your time, draw things on a piece of paper or whiteboard, and ask them questions about how the data is stored, how it can be read, what limitations there are on seeks, etc etc.
It's not the coding aspect that interviewers are interested in, but the problem solving aspect.
Recursion is easy.
struct node {
struct node *left;
struct node *right;
int payload;
};
int is_not_mirror(struct node *one, struct node *two)
{
if (!one && !two) return 0;
if (!one) return 1;
if (!two) return 1;
if (compare(one->payload, two->payload)) return 1;
if (is_not_mirror(one->left, two->right)) return 1;
if (is_not_mirror(one->right, two->left)) return 1;
return 0;
}

Create Balanced Binary Search Tree from Sorted linked list

What's the best way to create a balanced binary search tree from a sorted singly linked list?
How about creating nodes bottom-up?
This solution's time complexity is O(N). Detailed explanation in my blog post:
http://www.leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
Two traversals of the linked list are all we need: a first traversal to get the length of the list (which is then passed in as the parameter n into the function), and a second to create nodes in the list's order.
BinaryTree* sortedListToBST(ListNode *& list, int start, int end) {
if (start > end) return NULL;
// same as (start+end)/2, avoids overflow
int mid = start + (end - start) / 2;
BinaryTree *leftChild = sortedListToBST(list, start, mid-1);
BinaryTree *parent = new BinaryTree(list->data);
parent->left = leftChild;
list = list->next;
parent->right = sortedListToBST(list, mid+1, end);
return parent;
}
BinaryTree* sortedListToBST(ListNode *head, int n) {
return sortedListToBST(head, 0, n-1);
}
You can't do better than linear time, since you have to at least read all the elements of the list, so you might as well copy the list into an array (linear time) and then construct the tree efficiently in the usual way, i.e. if you had the list [9,12,18,23,24,51,84], then you'd start by making 23 the root, with children 12 and 51, then 9 and 18 become children of 12, and 24 and 84 become children of 51. Overall, should be O(n) if you do it right.
The actual algorithm, for what it's worth, is "take the middle element of the list as the root, and recursively build BSTs for the sub-lists to the left and right of the middle element and attach them below the root".
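A short Python sketch of that "middle element as root" recursion, working on a list that the linked list has already been copied into. The `TreeNode` class and the `build_balanced` name are assumptions for illustration only.

```python
class TreeNode:
    """Hypothetical BST node used only for this sketch."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def build_balanced(values, low=0, high=None):
    """Build a height-balanced BST from the sorted slice values[low..high]."""
    if high is None:
        high = len(values) - 1
    if low > high:
        return None
    mid = (low + high) // 2               # middle element becomes the root
    node = TreeNode(values[mid])
    node.left = build_balanced(values, low, mid - 1)    # smaller half
    node.right = build_balanced(values, mid + 1, high)  # larger half
    return node
```

For the list [9, 12, 18, 23, 24, 51, 84] this produces the tree described above: 23 at the root, 12 and 51 below it, then 9, 18, 24 and 84 as leaves. Passing index bounds instead of slicing keeps the whole construction linear.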
Best isn't only about asymptotic run time. The sorted linked list has all the information needed to create the binary tree directly, and I think this is probably what they are looking for.
Note that the first and third entries become children of the second, then the fourth node has as children the second and sixth (which has as children the fifth and seventh), and so on...
In pseudocode:
read three elements, make a node from them, mark as level 1, push on stack
loop
    read three elements and make a node of them
    mark as level 1
    push on stack
    loop while top two entries on stack have same level (n)
        make node of top two entries, mark as level n + 1, push on stack
while elements remain in list
(with a bit of adjustment for when there's less than three elements left or an unbalanced tree at any point)
EDIT:
At any point, there is a left node of height N on the stack. Next step is to read one element, then read and construct another node of height N on the stack. To construct a node of height N, make and push a node of height N -1 on the stack, then read an element, make another node of height N-1 on the stack -- which is a recursive call.
Actually, this means the algorithm (even as modified) won't produce a balanced tree. If there are 2N+1 nodes, it will produce a tree with 2N-1 values on the left, and 1 on the right.
So I think #sgolodetz's answer is better, unless I can think of a way of rebalancing the tree as it's built.
Trick question!
The best way is to use the STL, and avail yourself of the fact that the sorted associative container ADT, of which set is an implementation, demands that insertion of sorted ranges have amortized linear time. Any passable set of core data structures for any language should offer a similar guarantee. For a real answer, see the quite clever solutions others have provided.
What's that? I should offer something useful?
Hum...
How about this?
The smallest possible meaningful tree in a balanced binary tree is 3 nodes.
A parent, and two children. The very first instance of such a tree is the first three elements. Child-parent-Child. Let's now imagine this as a single node. Okay, well, we no longer have a tree. But we know that the shape we want is Child-parent-Child.
Done for a moment with our imaginings, we want to keep a pointer to the parent in that initial triumvirate. But it's singly linked!
We'll want to have four pointers, which I'll call A, B, C, and D. So, we move A to 1, set B equal to A and advance it one. Set C equal to B, and advance it two. The node under B already points to its right-child-to-be. We build our initial tree. We leave B at the parent of Tree one. C is sitting at the node that will have our two minimal trees as children. Set A equal to C, and advance it one. Set D equal to A, and advance it one. We can now build our next minimal tree. D points to the root of that tree, B points to the root of the other, and C points to the... the new root from which we will hang our two minimal trees.
How about some pictures?
[A][B][-][C]
With our image of a minimal tree as a node...
[B = Tree][C][A][D][-]
And then
[Tree A][C][Tree B]
Except we have a problem. The node two after D is our next root.
[B = Tree A][C][A][D][-][Roooooot?!]
It would be a lot easier on us if we could simply maintain a pointer to it instead of to it and C. Turns out, since we know it will point to C, we can go ahead and start constructing the node in the binary tree that will hold it, and as part of this we can enter C into it as a left-node. How can we do this elegantly?
Set the pointer of the Node under C to the node Under B.
It's cheating in every sense of the word, but by using this trick, we free up B.
Alternatively, you can be sane, and actually start building out the node structure. After all, you really can't reuse the nodes from the SLL, they're probably POD structs.
So now...
[TreeA]<-[C][A][D][-][B]
[TreeA]<-[C]->[TreeB][B]
And... Wait a sec. We can use this same trick to free up C, if we just let ourselves think of it as a single node instead of a tree. Because after all, it really is just a single node.
[TreeC]<-[B][A][D][-][C]
We can further generalize our tricks.
[TreeC]<-[B][TreeD]<-[C][-]<-[D][-][A]
[TreeC]<-[B][TreeD]<-[C]->[TreeE][A]
[TreeC]<-[B]->[TreeF][A]
[TreeG]<-[A][B][C][-][D]
[TreeG]<-[A][-]<-[C][-][D]
[TreeG]<-[A][TreeH]<-[D][B][C][-]
[TreeG]<-[A][TreeH]<-[D][-]<-[C][-][B]
[TreeG]<-[A][TreeJ]<-[B][-]<-[C][-][D]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
We are missing a critical step!
[TreeG]<-[A]->([TreeJ]<-[B]->([TreeK]<-[D][-]<-[C][-]))
Becomes :
[TreeG]<-[A]->[TreeL->([TreeK]<-[D][-]<-[C][-])][B]
[TreeG]<-[A]->[TreeL->([TreeK]<-[D]->[TreeM])][B]
[TreeG]<-[A]->[TreeL->[TreeN]][B]
[TreeG]<-[A]->[TreeO][B]
[TreeP]<-[B]
Obviously, the algorithm can be cleaned up considerably, but I thought it would be interesting to demonstrate how one can optimize as you go by iteratively designing your algorithm. I think this kind of process is what a good employer should be looking for more than anything.
The trick, basically, is that each time we reach the next midpoint, which we know is a parent-to-be, we know that its left subtree is already finished. The other trick is that we are done with a node once it has two children and something pointing to it, even if all of the sub-trees aren't finished. Using this, we can get what I am pretty sure is a linear time solution, as each element is touched only 4 times at most. The problem is that this relies on being given a list that will form a truly balanced binary search tree. There are, in other words, some hidden constraints that may make this solution either much harder to apply, or impossible. For example, if you have an odd number of elements, or if there are a lot of non-unique values, this starts to produce a fairly silly tree.
Considerations:
Make the elements unique.
Insert a dummy element at the end if the number of nodes is odd.
Sing longingly for a more naive implementation.
Use a deque to keep the roots of completed subtrees and the midpoints in, instead of mucking around with my second trick.
This is a Python implementation:
def sll_to_bbst(sll, start, end):
    """Build a balanced binary search tree from sorted linked list.

    This assumes that you have a class BinarySearchTree, with properties
    'l_child' and 'r_child'.

    Params:
        sll: sorted linked list, any data structure with 'popleft()' method,
            which removes and returns the leftmost element of the list. The
            easiest thing to do is to use 'collections.deque' for the sorted
            list.
        start: int, start index, on initial call set to 0
        end: int, on initial call should be set to len(sll)

    Returns:
        A balanced instance of BinarySearchTree

    This is a python implementation of solution found here:
    http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
    """
    if start >= end:
        return None
    middle = (start + end) // 2
    l_child = sll_to_bbst(sll, start, middle)
    root = BinarySearchTree(sll.popleft())
    root.l_child = l_child
    root.r_child = sll_to_bbst(sll, middle+1, end)
    return root
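A minimal usage sketch for the function above. The `BinarySearchTree` class here is a hypothetical stand-in (the answer only assumes that such a class exists with `l_child` and `r_child`), and the `value` attribute and the `in_order` and `height` helpers are names made up for illustration. The function is restated so that the snippet runs on its own.

```python
from collections import deque


class BinarySearchTree:
    """Hypothetical minimal node class; only l_child/r_child come from the answer."""

    def __init__(self, value):
        self.value = value
        self.l_child = None
        self.r_child = None


def sll_to_bbst(sll, start, end):
    """Same logic as above: build the left subtree, pop the root, build the right."""
    if start >= end:
        return None
    middle = (start + end) // 2
    l_child = sll_to_bbst(sll, start, middle)
    root = BinarySearchTree(sll.popleft())
    root.l_child = l_child
    root.r_child = sll_to_bbst(sll, middle + 1, end)
    return root


def in_order(node):
    """Return the values of the tree in in-order (sorted) sequence."""
    if node is None:
        return []
    return in_order(node.l_child) + [node.value] + in_order(node.r_child)


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.l_child), height(node.r_child))


values = deque([1, 2, 3, 4, 5, 6, 7])
tree = sll_to_bbst(values, 0, len(values))  # len() is evaluated before any popleft()
print(in_order(tree))  # [1, 2, 3, 4, 5, 6, 7]
print(height(tree))    # 3
```

Note that the deque is consumed by the call, so keep a copy if the sorted data is still needed afterwards.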
Instead of a sorted linked list, I was asked about a sorted array (logically it makes no difference, although the run time varies) and had to create a BST of minimal height. This is the code I came up with:
#include <stdio.h>
#include <stdlib.h>

typedef struct Node{
    struct Node *left;
    int info;
    struct Node *right;
}Node_t;

// Sorted input array (the example values mentioned at the end).
int a[] = {6, 7, 8, 9, 10, 11, 12};

Node_t* CreateNode(int info);

Node_t* Bin(int low, int high) {
    Node_t* node = NULL;
    int mid = 0;
    if(low <= high) {
        mid = low + (high-low)/2; // same midpoint as (low+high)/2, without overflow
        node = CreateNode(a[mid]);
        printf("DEBUG: creating node for %d\n", a[mid]);
        // A fresh node has no children yet, so both subtrees are always built.
        node->left = Bin(low, mid-1);
        node->right = Bin(mid+1, high);
        return node;
    }//if(low <=high)
    else {
        return NULL;
    }
}//Bin(low,high)

Node_t* CreateNode(int info) {
    Node_t* node = malloc(sizeof(Node_t));
    if(node == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    node->info = info;
    node->left = NULL;
    node->right = NULL;
    return node;
}//CreateNode(info)

// Example call for the array 6 7 8 9 10 11 12; it gives the desired result.
int main(void) {
    Bin(0,6);
    return 0;
}
HTH Somebody..
This is the recursive pseudocode that I would suggest.
createTree(treenode *&root, linknode *start, linknode *end)
{
    // Builds the half-open range [start, end); root is passed by reference.
    if(start == end)
    {
        return;
    }
    ptrsingle=start;
    ptrdouble=start;
    while(ptrdouble != end and ptrdouble->next != end)
    {
        ptrsingle=ptrsingle->next;
        ptrdouble=ptrdouble->next->next;
    }
    //ptrsingle will now be at the middle element.
    treenode *cur_node=Allocatememory;
    cur_node->data = ptrsingle->data;
    if(root == null)
    {
        root = cur_node;
    }
    else
    {
        // Assumes distinct values, so each child lands on its own side.
        if(cur_node->data (less than) root->data)
            root->left=cur_node
        else
            root->right=cur_node
    }
    createTree(cur_node, start, ptrsingle);
    createTree(cur_node, ptrsingle->next, end);
}
Root = null;
The initial call will be createTree(Root, list, null);
We are doing the recursive building of the tree, but without using the intermediate array.
To get to the middle element, we advance two pointers each time: one by one element, the other by two elements. By the time the second pointer reaches the end, the first pointer is at the middle.
The running time will be O(n log n). The extra space will be O(log n), for the recursion stack. This is not an efficient solution for a real situation, where you could use a red-black tree, which guarantees O(log n) per insertion (O(n log n) for all n elements). But it is good enough for an interview.
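The two-pointer idea can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the `ListNode` and `TreeNode` classes and the `from_values` and `in_order` helpers are hypothetical, ranges are treated as half-open `[start, end)`, and the function returns the subtree root instead of attaching it to a parent as the pseudocode does.

```python
class ListNode:
    """Singly linked list node (hypothetical minimal type)."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class TreeNode:
    """Binary tree node (hypothetical minimal type)."""

    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def create_tree(start, end=None):
    """Build a balanced BST from the half-open list range [start, end)."""
    if start is end:
        return None
    slow = fast = start
    # fast moves two steps per iteration, slow one; slow stops at the middle.
    while fast is not end and fast.next is not end:
        slow = slow.next
        fast = fast.next.next
    node = TreeNode(slow.data)
    node.left = create_tree(start, slow)      # elements before the middle
    node.right = create_tree(slow.next, end)  # elements after the middle
    return node


def from_values(values):
    """Build a linked list from an iterable and return its head."""
    head = None
    for value in reversed(list(values)):
        head = ListNode(value, head)
    return head


def in_order(node):
    """Return the tree's values in in-order sequence."""
    if node is None:
        return []
    return in_order(node.left) + [node.data] + in_order(node.right)


root = create_tree(from_values([1, 2, 3, 4, 5, 6, 7]))
print(root.data)       # 4
print(in_order(root))  # [1, 2, 3, 4, 5, 6, 7]
```

Each level of the recursion walks its whole range once to find the middle, which is where the O(n log n) running time comes from.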
Similar to @Stuart Golodetz and @Jake Kurzer, the important thing is that the list is already sorted.
In @Stuart's answer, the array he presented is the backing data structure for the BST. The find operation, for example, would just need to perform array index calculations to traverse the tree. Growing the array and removing elements would be the trickier part, so I'd prefer a vector or another data structure with constant-time lookup.
@Jake's answer also uses this fact, but unfortunately it requires traversing the list each time a get(index) operation is needed. On the other hand, it requires no additional memory.
Unless the interviewer specifically mentioned that they wanted an object structure representation of the tree, I would use @Stuart's answer.
In a question like this you'd be given extra points for discussing the tradeoffs and all the options that you have.
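To make the array-backed option concrete, here is one possible reading of it as a Python sketch. The exact layout used in the referenced answer is not shown here, so this is an assumption: the "root" of a range is taken to be its middle index and the two "subtrees" are the sub-ranges on either side. The function name `find` is made up for illustration.

```python
def find(sorted_values, target):
    """Search a sorted list as if it were a balanced BST.

    The 'root' of the half-open range [lo, hi) is its middle index; the
    left and right 'subtrees' are the sub-ranges on either side of it.
    Returns the index of target, or -1 if it is absent.
    """
    lo, hi = 0, len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid
        if target < sorted_values[mid]:
            hi = mid        # descend into the left 'subtree'
        else:
            lo = mid + 1    # descend into the right 'subtree'
    return -1


data = [6, 7, 8, 9, 10, 11, 12]
print(find(data, 9))   # 3
print(find(data, 5))   # -1
```

No tree nodes are allocated at all, which is the tradeoff mentioned above: lookups are cheap, but inserting or removing elements means shifting the array.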
Hope the detailed explanation on this post helps:
http://preparefortechinterview.blogspot.com/2013/10/planting-trees_1.html
A slightly improved version of @1337c0d3r's implementation, from my blog.
// create a balanced BST using @len elements starting from @head & move @head forward by @len
TreeNode *sortedListToBSTHelper(ListNode *&head, int len) {
    if (0 == len) return NULL;
    auto left = sortedListToBSTHelper(head, len / 2);
    auto root = new TreeNode(head->val);
    root->left = left;
    head = head->next;
    root->right = sortedListToBSTHelper(head, (len - 1) / 2);
    return root;
}

TreeNode *sortedListToBST(ListNode *head) {
    // length() is assumed to return the number of nodes in the list
    int n = length(head);
    return sortedListToBSTHelper(head, n);
}
If you know how many nodes are in the linked list, you can do it like this:
// Gives path to subtree being built. If branch[N] is false, branch
// less from the node at depth N, if true branch greater.
bool branch[max depth];
// If rem[N] is true, then for the current subtree at depth N, its
// greater subtree has one more node than its less subtree.
bool rem[max depth];
// Depth of root node of current subtree.
unsigned depth = 0;
// Number of nodes in current subtree.
unsigned num_sub = Number of nodes in linked list;
// The algorithm relies on a stack of nodes whose less subtree has
// been built, but whose greater subtree has not yet been built. The
// stack is implemented as linked list. The nodes are linked
// together by having the "greater" handle of a node set to the
// next node in the list. "less_parent" is the handle of the first
// node in the list.
Node *less_parent = nullptr;
// h is root of current subtree, child is one of its children.
Node *h, *child;
Node *p = head of the sorted linked list of nodes;
LOOP // loop unconditionally
    LOOP WHILE (num_sub > 2)
        // Subtract one for root of subtree.
        num_sub = num_sub - 1;
        rem[depth] = !!(num_sub & 1); // true if num_sub is an odd number
        branch[depth] = false;
        depth = depth + 1;
        num_sub = num_sub / 2;
    END LOOP
    IF (num_sub == 2)
        // Build a subtree with two nodes, slanting to greater.
        // I arbitrarily chose to always have the extra node in the
        // greater subtree when there is an odd number of nodes to
        // split between the two subtrees.
        h = p;
        p = the node after p in the linked list;
        child = p;
        p = the node after p in the linked list;
        make h and child into a two-element AVL tree;
    ELSE // num_sub == 1
        // Build a subtree with one node.
        h = p;
        p = the next node in the linked list;
        make h into a leaf node;
    END IF
    LOOP WHILE (depth > 0)
        depth = depth - 1;
        IF (not branch[depth])
            // We've completed a less subtree, exit while loop.
            EXIT LOOP;
        END IF
        // We've completed a greater subtree, so attach it to
        // its parent (that is less than it). We pop the parent
        // off the stack of less parents.
        child = h;
        h = less_parent;
        less_parent = h->greater_child;
        h->greater_child = child;
        num_sub = 2 * (num_sub - rem[depth]) + rem[depth] + 1;
        IF (num_sub & (num_sub - 1))
            // num_sub is not a power of 2
            h->balance_factor = 0;
        ELSE
            // num_sub is a power of 2
            h->balance_factor = 1;
        END IF
    END LOOP
    IF (num_sub == number of nodes in original linked list)
        // We've completed the full tree, exit outer unconditional loop
        EXIT LOOP;
    END IF
    // The subtree we've completed is the less subtree of the
    // next node in the sequence.
    child = h;
    h = p;
    p = the next node in the linked list;
    h->less_child = child;
    // Put h onto the stack of less parents.
    h->greater_child = less_parent;
    less_parent = h;
    // Proceed to creating greater than subtree of h.
    branch[depth] = true;
    num_sub = num_sub + rem[depth];
    depth = depth + 1;
END LOOP
// h now points to the root of the completed AVL tree.
For an encoding of this in C++, see the build member function (currently at line 361) in https://github.com/wkaras/C-plus-plus-intrusive-container-templates/blob/master/avl_tree.h . It's actually more general, a template using any forward iterator rather than specifically a linked list.
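The power-of-2 rule for the balance factor in the pseudocode can be checked with a small Python sketch. It is not a port of the algorithm: it only models subtree sizes, assuming the split described above (the extra node goes to the greater subtree) and assuming that a positive balance factor means the greater subtree is one level deeper. The function names are made up for illustration.

```python
def split(n):
    """Sizes of the less and greater subtrees of a subtree with n nodes."""
    less = (n - 1) // 2
    greater = (n - 1) - less  # gets the extra node when n - 1 is odd
    return less, greater


def height(n):
    """Height of the tree built from n nodes with the split above."""
    if n == 0:
        return 0
    less, greater = split(n)
    return 1 + max(height(less), height(greater))


def balance_factor(n):
    """Height of the greater subtree minus height of the less subtree."""
    less, greater = split(n)
    return height(greater) - height(less)


def is_power_of_two(n):
    return (n & (n - 1)) == 0


# For every subtree size from 2 upward, the balance factor is 1 exactly
# when the size is a power of 2, and 0 otherwise.
for n in range(2, 200):
    assert balance_factor(n) == (1 if is_power_of_two(n) else 0)

print(balance_factor(8), balance_factor(6))  # 1 0
```

This matches the test `num_sub & (num_sub - 1)` used in the loop, which is zero only for powers of 2.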

Resources