Check if a tree is a mirror image? - algorithm

Given a binary tree which is huge and can not be placed in memory, how do you check if the tree is a mirror image.
I got this as an interview question

If a tree is a mirror image of another tree, the inorder traversal of one tree would be reverse of another.
So just do inorder traversal on the first tree and a reverse inorder traversal on another and check if all the elements are the same.

I can't take full credit for this reply of course; a handful of my colleagues helped with some assumptions and for poking holes in my original idea. Much thanks to them!
Assumptions
We can't have the entire tree in memory, so it's not ideal to use recursion. Let's assume, for simplicity's sake, that we can only hold a maximum of two nodes in memory.
We know n, the total number of levels in our tree.
We can perform seeks on the data with respect to the character or line position it's in.
The data that is on disk is ordered by depth. That is to say, the first entry on disk is the root, and the next two are its children, and the next four are its children's children, and so forth.
There are cases in which the data is perfectly mirrored, and cases in which it isn't. Blank data interlaced with non-blank data is considered "acceptable", unless otherwise specified.
We have freedom over using any data type we wish so long as the values can be compared for equivalence. Testing for object equivalence may not be ideal, so let's assume we're comparing primitives.
"Mirrored" means mirrored between the root's children. To use different terminologies, the grandparent's left child is mirrored with its right child, and the left child (parent)'s left child is mirrored with the grandparent's right child's right child. This is illustrated in the graph below; the matching symbols represent the mirroring we want to check for.
G
P* P*
C1& C2^ C3^ C4&
Approach
We know how many nodes on each level we should expect when we're reading from disk - some multiple of 2k. We can establish a double loop to iterate over the total depth of the tree, and the count of the nodes in each level. Inside of this, we can simply compare the outermost values for equivalence, and short-circuit if we find an unequal value.
We can determine the location of each outer location by using multiples of 2k. The leftmost child of any level will always be 2k, and the rightmost child of any level will always be 2k+1-1.
Small Proof: Outermost nodes on level 1 are 2 and 3; 21 = 2, 21+1-1 = 22-1 = 3. Outermost nodes on level 2 are 4 and 7; 22 = 4, 22+1-1 = 23-1 = 7. One could expand this all the way to the nth case.
Pseudocode
int k, i;
for(k = 1; k < n; k++) { // Skip root, trivially mirrored
for(i = 0; i < pow(2, k) / 2; i++) {
if(node_retrieve(i + pow(2, k)) != node_retrieve(pow(2, (k+1)-i)) {
return false;
}
}
}
return true;
Thoughts
This sort of question is a great interview question because, more than likely, they want to see how you would approach this problem. This approach may be horrible, it may be immaculate, but an employer would want you to take your time, draw things on a piece of paper or whiteboard, and ask them questions about how the data is stored, how it can be read, what limitations there are on seeks, etc etc.
It's not the coding aspect that interviewers are interested in, but the problem solving aspect.

Recursion is easy.
struct node {
struct node *left;
struct node *right;
int payload;
};
int is_not_mirror(struct node *one, struct node *two)
{
if (!one && !two) return 0;
if (!one) return 1;
if (!two) return 1;
if (compare(one->payload, two->payload)) return 1;
if (is_not_mirror(one->left, two->right)) return 1;
if (is_not_mirror(one->right, two->left)) return 1;
return 0;
}

Related

How to adapt Fenwick tree to answer range minimum queries

Fenwick tree is a data-structure that gives an efficient way to answer to main queries:
add an element to a particular index of an array update(index, value)
find sum of elements from 1 to N find(n)
both operations are done in O(log(n)) time and I understand the logic and implementation. It is not hard to implement a bunch of other operations like find a sum from N to M.
I wanted to understand how to adapt Fenwick tree for RMQ. It is obvious to change Fenwick tree for first two operations. But I am failing to figure out how to find minimum on the range from N to M.
After searching for solutions majority of people think that this is not possible and a small minority claims that it actually can be done (approach1, approach2).
The first approach (written in Russian, based on my google translate has 0 explanation and only two functions) relies on three arrays (initial, left and right) upon my testing was not working correctly for all possible test cases.
The second approach requires only one array and based on the claims runs in O(log^2(n)) and also has close to no explanation of why and how should it work. I have not tried to test it.
In light of controversial claims, I wanted to find out whether it is possible to augment Fenwick tree to answer update(index, value) and findMin(from, to).
If it is possible, I would be happy to hear how it works.
Yes, you can adapt Fenwick Trees (Binary Indexed Trees) to
Update value at a given index in O(log n)
Query minimum value for a range in O(log n) (amortized)
We need 2 Fenwick trees and an additional array holding the real values for nodes.
Suppose we have the following array:
index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0
We wave a magic wand and the following trees appear:
Note that in both trees each node represents the minimum value for all nodes within that subtree. For example, in BIT2 node 12 has value 0, which is the minimum value for nodes 12,13,14,15.
Queries
We can efficiently query the minimum value for any range by calculating the minimum of several subtree values and one additional real node value. For example, the minimum value for range [2,7] can be determined by taking the minimum value of BIT2_Node2 (representing nodes 2,3) and BIT1_Node7 (representing node 7), BIT1_Node6 (representing nodes 5,6) and REAL_4 - therefore covering all nodes in [2,7]. But how do we know which sub trees we want to look at?
Query(int a, int b) {
int val = infinity // always holds the known min value for our range
// Start traversing the first tree, BIT1, from the beginning of range, a
int i = a
while (parentOf(i, BIT1) <= b) {
val = min(val, BIT2[i]) // Note: traversing BIT1, yet looking up values in BIT2
i = parentOf(i, BIT1)
}
// Start traversing the second tree, BIT2, from the end of range, b
i = b
while (parentOf(i, BIT2) >= a) {
val = min(val, BIT1[i]) // Note: traversing BIT2, yet looking up values in BIT1
i = parentOf(i, BIT2)
}
val = min(val, REAL[i]) // Explained below
return val
}
It can be mathematically proven that both traversals will end in the same node. That node is a part of our range, yet it is not a part of any subtrees we have looked at. Imagine a case where the (unique) smallest value of our range is in that special node. If we didn't look it up our algorithm would give incorrect results. This is why we have to do that one lookup into the real values array.
To help understand the algorithm I suggest you simulate it with pen & paper, looking up data in the example trees above. For example, a query for range [4,14] would return the minimum of values BIT2_4 (rep. 4,5,6,7), BIT1_14 (rep. 13,14), BIT1_12 (rep. 9,10,11,12) and REAL_8, therefore covering all possible values [4,14].
Updates
Since a node represents the minimum value of itself and its children, changing a node will affect its parents, but not its children. Therefore, to update a tree we start from the node we are modifying and move up all the way to the fictional root node (0 or N+1 depending on which tree).
Suppose we are updating some node in some tree:
If new value < old value, we will always overwrite the value and move up
If new value == old value, we can stop since there will be no more changes cascading upwards
If new value > old value, things get interesting.
If the old value still exists somewhere within that subtree, we are done
If not, we have to find the new minimum value between real[node] and each tree[child_of_node], change tree[node] and move up
Pseudocode for updating node with value v in a tree:
while (node <= n+1) {
if (v > tree[node]) {
if (oldValue == tree[node]) {
v = min(v, real[node])
for-each child {
v = min(v, tree[child])
}
} else break
}
if (v == tree[node]) break
tree[node] = v
node = parentOf(node, tree)
}
Note that oldValue is the original value we replaced, whereas v may be reassigned multiple times as we move up the tree.
Binary Indexing
In my experiments Range Minimum Queries were about twice as fast as a Segment Tree implementation and updates were marginally faster. The main reason for this is using super efficient bitwise operations for moving between nodes. They are very well explained here. Segment Trees are really simple to code so think about is the performance advantage really worth it? The update method of my Fenwick RMQ is 40 lines and took a while to debug. If anyone wants my code I can put it on github. I also produced a brute and test generators to make sure everything works.
I had help understanding this subject & implementing it from the Finnish algorithm community. Source of the image is http://ioinformatics.org/oi/pdf/v9_2015_39_44.pdf, but they credit Fenwick's 1994 paper for it.
The Fenwick tree structure works for addition because addition is invertible. It doesn't work for minimum, because as soon as you have a cell that's supposed to be the minimum of two or more inputs, you've lost information potentially.
If you're willing to double your storage requirements, you can support RMQ with a segment tree that is constructed implicitly, like a binary heap. For an RMQ with n values, store the n values at locations [n, 2n) of an array. Locations [1, n) are aggregates, with the formula A(k) = min(A(2k), A(2k+1)). Location 2n is an infinite sentinel. The update routine should look something like this.
def update(n, a, i, x): # value[i] = x
i += n
a[i] = x
# update the aggregates
while i > 1:
i //= 2
a[i] = min(a[2*i], a[2*i+1])
The multiplies and divides here can be replaced by shifts for efficiency.
The RMQ pseudocode is more delicate. Here's another untested and unoptimized routine.
def rmq(n, a, i, j): # min(value[i:j])
i += n
j += n
x = inf
while i < j:
if i%2 == 0:
i //= 2
else:
x = min(x, a[i])
i = i//2 + 1
if j%2 == 0:
j //= 2
else:
x = min(x, a[j-1])
j //= 2
return x

Trouble with a stack based algorithm

I'm working on this programming assignment. It tests our understanding of stacks and their applications. I find it extremely difficult to come up with an algorithm that can work efficiently and accurately. Some of their test cases have 200,000+ "trees"! While my algorithm can work for simpler test cases with less than 10 trees, it failed in the accuracy and efficiency departments when the number of "trees" is exceedingly large (from 100+ onwards).
I would appreciate it very much, if you guys can kindly give me a hint or point me to the right direction. Thank you.
Task Statement
Monkeys like to swing from tree to tree. They can swing from one tree
to another directly as long as there is no tree in between that is
taller than or have the same height as either one of the two trees.
For example, if there are 5 trees with heights 19m, 17m, 20m, 20m and
20m lining up in that order, then the monkey will be able to swing
from one tree to the other as shown below:
1. from first tree to second tree
2. from first tree to third tree
3. from second tree to third tree
4. from third tree to fourth tree
5. from fourth tree to fifth tree
Tarzan, the king of jungle who is able to communicate with the
monkeys, wants to test the monkeys to see if they know how to count
the total number of pairs of trees that they can swing directly from
one to the other. But he himself is not very good in counting. So he
turns to you, the best Java programmer in the country, to write a
program for getting the correct count for the trees in different parts
of the jungle.
Input
The first line contains N, the number of trees in the path. The next
line contains N integers a1 a2 a3 ... aN, where ai represents the
height of the i-th tree in the path, 0 < ai ≤ 231 and 2 ≤ N ≤ 500,000.
Note that short symbol N is used above for convenience. In your
program, you are expected to give it a descriptive name.
Output
The total number of pairs of trees which the monkeys can swing
directly from one to the other with the given list of tree heights.
Sample Input 1
4
3 4 1 2
Sample Output 1
4
Sample Input 2
5
19 17 20 20 20
Sample Output 2
5
Sample Input 3
4 1
2 21 21 12
Sample Output 3
3
Here's my code. So this is a method that returns the number of pairs of trees a monkey can swing with. The parameter is an array of inputs.
My algorithm goes as follows:
we set the numPairs to be (array length - 1), since all trees can be swing from one to another.
now we find the extra numPairs (extra trees to swing with).
push the first input into the empty stack
we enter a for loop:
for the next input until the end of array:
case1:
if the top of the stack is smaller than the current input and the size of the stack is equal to 1, then we replace the top with the input.
case2:
if the top of the stack smaller than the current input and the size of the stack is bigger than 1, we pop the top, and enter a while loop to pop the previous elements which is smaller than the current top of the stack.
we then push the current input after we exit the while loop.
case3:
otherwise, if the above conditions are not satisfied, we simply push the current input into the stack.
we exit the for loop
return the numPairs
public int solve(int[] arr) {
int input, temp;
numPairs = arr.length-1;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty())
stack.push(input);
else if(!stack.isEmpty())
{
if(input>stack.peek() && stack.size() == 1)
{
stack.pop();
stack.push(input);
}
else if(input>stack.peek() && stack.size() > 1)
{
temp = stack.pop();
while(!stack.isEmpty() && temp < stack.peek())
{
numPairs++;
temp = stack.pop();
}
stack.push(input);
//numPairs++;
}
else
stack.push(input);
}
}
return numPairs;
}
Here's my solution, it's an iterative one.
class Result {
// declare the member field
Stack<Integer> stack;
int numPairs = 0;
// declare the constructor
public Result()
{
stack = new Stack<Integer>();
}
/*
* solve : to compute the result, return the result
* Pre-condition : parameter must be of array of integer type
* Post-condition : return the number of tree pairs that can be swung with
*/
public int solve(int[] arr) {
// implementation
int input;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty()) //if stack is empty, just push the input
stack.push(input);
else if(!stack.isEmpty())
{
//do a while loop to pop all possible top stack element until
//the top element is bigger than the input
//or the stack is empty
while(!stack.isEmpty() && input > stack.peek())
{
stack.pop();
numPairs++;
}
//if the stack is empty after exiting the while loop
//push the current element onto the stack
if(stack.isEmpty())
stack.push(input);
//this condition applies for two cases:
//1. the while loop is never entered because the input is smaller than the top element by default
//2. the while loop is exited and the input is pushed onto the non-empty stack with numPairs being incremented
else if(!stack.isEmpty() && input < stack.peek())
{
stack.push(input);
numPairs++;
}
//this is the last condition:
//the input is never pushed if the input is identical to the top element
//instead we increment the numPairs
else if(input == stack.peek())
numPairs++;
}
}
return numPairs;
}
}
If I understand the problem correctly, there are two kinds of trees accessible to each other:
Trees that are next to each (adjacent) other are always accessible to each other
Trees that are not adjacent are only accessible if all the trees in between are shorter than both of the trees.
One might come up with several types of solutions for this:
The brute force solution: compare every tree to every other tree checking the conditions above. Running time: O(n^2)
Find near accessible neighbors solution: look for near neighbors that are accessible. Running time: close to O(n). Here's how this would work:
Build an array of tree sizes in order that they are given. Then walk this array in order and for every tree at index i:
Going to the right from i
If tree at i+1 is taller then tree at i break out (no more accessible neighbors can be found)
Add 1 to the count of accessible trees if tree at i+1 is shorter than tree at i+2
Do the same for trees i+2, i+3.. etc. until you find a tree that is taller than tree at i.
This will get a count of non-adjacent accessible trees for every tree. Then just add N*2-2 to the count to account for all the adjacent trees, and you are done.

How to find the rank of a node in an AVL tree?

I need to implement two rank queries [rank(k) and select(r)]. But before I can start on this, I need to figure out how the two functions work.
As far as I know, rank(k) returns the rank of a given key k, and select(r) returns the key of a given rank r.
So my questions are:
1.) How do you calculate the rank of a node in an AVL(self balancing BST)?
2.) Is it possible for more than one key to have the same rank? And if so, what woulud select(r) return?
I'm going to include a sample AVL tree which you can refer to if it helps answer the question.
Thanks!
Your question really boils down to: "how is the term 'rank' normally defined with respect to an AVL tree?" (and, possibly, how is 'select' normally defined as well).
At least as I've seen the term used, "rank" means the position among the nodes in the tree -- i.e., how many nodes are to its left. You're typically given a pointer to a node (or perhaps a key value) and you need to count the number of nodes to its left.
"Select" is basically the opposite -- you're given a particular rank, and need to retrieve a pointer to the specified node (or the key for that node).
Two notes: First, since neither of these modifies the tree at all, it makes no real difference what form of balancing is used (e.g., AVL vs. red/black); for that matter a tree with no balancing at all is equivalent as well. Second, if you need to do this frequently, you can improve speed considerably by adding an extra field to each node recording how many nodes are to its left.
Rank is the number of nodes in the Left sub tree plus one, and is calculated for every node. I believe rank is not a concept specific to AVL trees - it can be calculated for any binary tree.
Select is just opposite to rank. A rank is given and you have to return a node matching that rank.
The following code will perform rank calculation:
void InitRank(struct TreeNode *Node)
{
if(!Node)
{
return;
}
else
{ Node->rank = 1 + NumeberofNodeInTree(Node->LChild);
InitRank(Node->LChild);
InitRank(Node->RChild);
}
}
int NumeberofNodeInTree(struct TreeNode *Node)
{
if(!Node)
{
return 0;
}
else
{
return(1+NumeberofNodeInTree(Node->LChild)+NumeberofNodeInTree(Node->RChild));
}
}
Here is the code i wrote and worked fine for AVL Tree to get the rank of a particular value. difference is just you used a node as parameter and i used a key a parameter. you can modify this as your own way. Sample code:
public int rank(int data){
return rank(data,root);
}
private int rank(int data, AVLNode r){
int rank=1;
while(r != null){
if(data<r.data)
r = r.left;
else if(data > r.data){
rank += 1+ countNodes(r.left);
r = r.right;
}
else{
r.rank=rank+countNodes(r.left);
return r.rank;
}
}
return 0;
}
[N.B] If you want to start your rank from 0 then initialize variable rank=0.
you definitely should have implemented the method countNodes() to execute this code.

Create Balanced Binary Search Tree from Sorted linked list

What's the best way to create a balanced binary search tree from a sorted singly linked list?
How about creating nodes bottom-up?
This solution's time complexity is O(N). Detailed explanation in my blog post:
http://www.leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
Two traversal of the linked list is all we need. First traversal to get the length of the list (which is then passed in as the parameter n into the function), then create nodes by the list's order.
BinaryTree* sortedListToBST(ListNode *& list, int start, int end) {
if (start > end) return NULL;
// same as (start+end)/2, avoids overflow
int mid = start + (end - start) / 2;
BinaryTree *leftChild = sortedListToBST(list, start, mid-1);
BinaryTree *parent = new BinaryTree(list->data);
parent->left = leftChild;
list = list->next;
parent->right = sortedListToBST(list, mid+1, end);
return parent;
}
BinaryTree* sortedListToBST(ListNode *head, int n) {
return sortedListToBST(head, 0, n-1);
}
You can't do better than linear time, since you have to at least read all the elements of the list, so you might as well copy the list into an array (linear time) and then construct the tree efficiently in the usual way, i.e. if you had the list [9,12,18,23,24,51,84], then you'd start by making 23 the root, with children 12 and 51, then 9 and 18 become children of 12, and 24 and 84 become children of 51. Overall, should be O(n) if you do it right.
The actual algorithm, for what it's worth, is "take the middle element of the list as the root, and recursively build BSTs for the sub-lists to the left and right of the middle element and attach them below the root".
Best isn't only about asynmptopic run time. The sorted linked list has all the information needed to create the binary tree directly, and I think this is probably what they are looking for
Note that the first and third entries become children of the second, then the fourth node has chidren of the second and sixth (which has children the fifth and seventh) and so on...
in psuedo code
read three elements, make a node from them, mark as level 1, push on stack
loop
read three elemeents and make a node of them
mark as level 1
push on stack
loop while top two enties on stack have same level (n)
make node of top two entries, mark as level n + 1, push on stack
while elements remain in list
(with a bit of adjustment for when there's less than three elements left or an unbalanced tree at any point)
EDIT:
At any point, there is a left node of height N on the stack. Next step is to read one element, then read and construct another node of height N on the stack. To construct a node of height N, make and push a node of height N -1 on the stack, then read an element, make another node of height N-1 on the stack -- which is a recursive call.
Actually, this means the algorithm (even as modified) won't produce a balanced tree. If there are 2N+1 nodes, it will produce a tree with 2N-1 values on the left, and 1 on the right.
So I think #sgolodetz's answer is better, unless I can think of a way of rebalancing the tree as it's built.
Trick question!
The best way is to use the STL, and advantage yourself of the fact that the sorted associative container ADT, of which set is an implementation, demands insertion of sorted ranges have amortized linear time. Any passable set of core data structures for any language should offer a similar guarantee. For a real answer, see the quite clever solutions others have provided.
What's that? I should offer something useful?
Hum...
How about this?
The smallest possible meaningful tree in a balanced binary tree is 3 nodes.
A parent, and two children. The very first instance of such a tree is the first three elements. Child-parent-Child. Let's now imagine this as a single node. Okay, well, we no longer have a tree. But we know that the shape we want is Child-parent-Child.
Done for a moment with our imaginings, we want to keep a pointer to the parent in that initial triumvirate. But it's singly linked!
We'll want to have four pointers, which I'll call A, B, C, and D. So, we move A to 1, set B equal to A and advance it one. Set C equal to B, and advance it two. The node under B already points to its right-child-to-be. We build our initial tree. We leave B at the parent of Tree one. C is sitting at the node that will have our two minimal trees as children. Set A equal to C, and advance it one. Set D equal to A, and advance it one. We can now build our next minimal tree. D points to the root of that tree, B points to the root of the other, and C points to the... the new root from which we will hang our two minimal trees.
How about some pictures?
[A][B][-][C]
With our image of a minimal tree as a node...
[B = Tree][C][A][D][-]
And then
[Tree A][C][Tree B]
Except we have a problem. The node two after D is our next root.
[B = Tree A][C][A][D][-][Roooooot?!]
It would be a lot easier on us if we could simply maintain a pointer to it instead of to it and C. Turns out, since we know it will point to C, we can go ahead and start constructing the node in the binary tree that will hold it, and as part of this we can enter C into it as a left-node. How can we do this elegantly?
Set the pointer of the Node under C to the node Under B.
It's cheating in every sense of the word, but by using this trick, we free up B.
Alternatively, you can be sane, and actually start building out the node structure. After all, you really can't reuse the nodes from the SLL, they're probably POD structs.
So now...
[TreeA]<-[C][A][D][-][B]
[TreeA]<-[C]->[TreeB][B]
And... Wait a sec. We can use this same trick to free up C, if we just let ourselves think of it as a single node instead of a tree. Because after all, it really is just a single node.
[TreeC]<-[B][A][D][-][C]
We can further generalize our tricks.
[TreeC]<-[B][TreeD]<-[C][-]<-[D][-][A]
[TreeC]<-[B][TreeD]<-[C]->[TreeE][A]
[TreeC]<-[B]->[TreeF][A]
[TreeG]<-[A][B][C][-][D]
[TreeG]<-[A][-]<-[C][-][D]
[TreeG]<-[A][TreeH]<-[D][B][C][-]
[TreeG]<-[A][TreeH]<-[D][-]<-[C][-][B]
[TreeG]<-[A][TreeJ]<-[B][-]<-[C][-][D]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
We are missing a critical step!
[TreeG]<-[A]->([TreeJ]<-[B]->([TreeK]<-[D][-]<-[C][-]))
Becomes :
[TreeG]<-[A]->[TreeL->([TreeK]<-[D][-]<-[C][-])][B]
[TreeG]<-[A]->[TreeL->([TreeK]<-[D]->[TreeM])][B]
[TreeG]<-[A]->[TreeL->[TreeN]][B]
[TreeG]<-[A]->[TreeO][B]
[TreeP]<-[B]
Obviously, the algorithm can be cleaned up considerably, but I thought it would be interesting to demonstrate how one can optimize as you go by iteratively designing your algorithm. I think this kind of process is what a good employer should be looking for more than anything.
The trick, basically, is that each time we reach the next midpoint, which we know is a parent-to-be, we know that its left subtree is already finished. The other trick is that we are done with a node once it has two children and something pointing to it, even if all of the sub-trees aren't finished. Using this, we can get what I am pretty sure is a linear time solution, as each element is touched only 4 times at most. The problem is that this relies on being given a list that will form a truly balanced binary search tree. There are, in other words, some hidden constraints that may make this solution either much harder to apply, or impossible. For example, if you have an odd number of elements, or if there are a lot of non-unique values, this starts to produce a fairly silly tree.
Considerations:
Render the element unique.
Insert a dummy element at the end if the number of nodes is odd.
Sing longingly for a more naive implementation.
Use a deque to keep the roots of completed subtrees and the midpoints in, instead of mucking around with my second trick.
This is a python implementation:
def sll_to_bbst(sll, start, end):
"""Build a balanced binary search tree from sorted linked list.
This assumes that you have a class BinarySearchTree, with properties
'l_child' and 'r_child'.
Params:
sll: sorted linked list, any data structure with 'popleft()' method,
which removes and returns the leftmost element of the list. The
easiest thing to do is to use 'collections.deque' for the sorted
list.
start: int, start index, on initial call set to 0
end: int, on initial call should be set to len(sll)
Returns:
A balanced instance of BinarySearchTree
This is a python implementation of solution found here:
http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
"""
if start >= end:
return None
middle = (start + end) // 2
l_child = sll_to_bbst(sll, start, middle)
root = BinarySearchTree(sll.popleft())
root.l_child = l_child
root.r_child = sll_to_bbst(sll, middle+1, end)
return root
Instead of the sorted linked list i was asked on a sorted array (doesn't matter though logically, but yes run-time varies) to create a BST of minimal height, following is the code i could get out:
typedef struct Node{
struct Node *left;
int info;
struct Node *right;
}Node_t;
Node_t* Bin(int low, int high) {
Node_t* node = NULL;
int mid = 0;
if(low <= high) {
mid = (low+high)/2;
node = CreateNode(a[mid]);
printf("DEBUG: creating node for %d\n", a[mid]);
if(node->left == NULL) {
node->left = Bin(low, mid-1);
}
if(node->right == NULL) {
node->right = Bin(mid+1, high);
}
return node;
}//if(low <=high)
else {
return NULL;
}
}//Bin(low,high)
Node_t* CreateNode(int info) {
Node_t* node = malloc(sizeof(Node_t));
memset(node, 0, sizeof(Node_t));
node->info = info;
node->left = NULL;
node->right = NULL;
return node;
}//CreateNode(info)
// call function for an array example: 6 7 8 9 10 11 12, it gets you desired
// result
Bin(0,6);
HTH Somebody..
This is the pseudo recursive algorithm that I will suggest.
createTree(treenode *root, linknode *start, linknode *end)
{
if(start == end or start = end->next)
{
return;
}
ptrsingle=start;
ptrdouble=start;
while(ptrdouble != end and ptrdouble->next !=end)
{
ptrsignle=ptrsingle->next;
ptrdouble=ptrdouble->next->next;
}
//ptrsignle will now be at the middle element.
treenode cur_node=Allocatememory;
cur_node->data = ptrsingle->data;
if(root = null)
{
root = cur_node;
}
else
{
if(cur_node->data (less than) root->data)
root->left=cur_node
else
root->right=cur_node
}
createTree(cur_node, start, ptrSingle);
createTree(cur_node, ptrSingle, End);
}
Root = null;
The inital call will be createtree(Root, list, null);
We are doing the recursive building of the tree, but without using the intermediate array.
To get to the middle element every time we are advancing two pointers, one by one element, other by two elements. By the time the second pointer is at the end, the first pointer will be at the middle.
The running time will be o(nlogn). The extra space will be o(logn). Not an efficient solution for a real situation where you can have R-B tree which guarantees nlogn insertion. But good enough for interview.
Similar to #Stuart Golodetz and #Jake Kurzer the important thing is that the list is already sorted.
In #Stuart's answer, the array he presented is the backing data structure for the BST. The find operation for example would just need to perform index array calculations to traverse the tree. Growing the array and removing elements would be the trickier part, so I'd prefer a vector or other constant time lookup data structure.
#Jake's answer also uses this fact but unfortunately requires you to traverse the list to find each time to do a get(index) operation. But requires no additional memory usage.
Unless it was specifically mentioned by the interviewer that they wanted an object structure representation of the tree, I would use #Stuart's answer.
In a question like this you'd be given extra points for discussing the tradeoffs and all the options that you have.
Hope the detailed explanation on this post helps:
http://preparefortechinterview.blogspot.com/2013/10/planting-trees_1.html
A slightly improved implementation from #1337c0d3r in my blog.
// create a balanced BST using #len elements starting from #head & move #head forward by #len
TreeNode *sortedListToBSTHelper(ListNode *&head, int len) {
if (0 == len) return NULL;
auto left = sortedListToBSTHelper(head, len / 2);
auto root = new TreeNode(head->val);
root->left = left;
head = head->next;
root->right = sortedListToBSTHelper(head, (len - 1) / 2);
return root;
}
TreeNode *sortedListToBST(ListNode *head) {
int n = length(head);
return sortedListToBSTHelper(head, n);
}
If you know how many nodes are in the linked list, you can do it like this:
// Gives path to subtree being built. If branch[N] is false, branch
// less from the node at depth N, if true branch greater.
bool branch[max depth];
// If rem[N] is true, then for the current subtree at depth N, it's
// greater subtree has one more node than it's less subtree.
bool rem[max depth];
// Depth of root node of current subtree.
unsigned depth = 0;
// Number of nodes in current subtree.
unsigned num_sub = Number of nodes in linked list;
// The algorithm relies on a stack of nodes whose less subtree has
// been built, but whose right subtree has not yet been built. The
// stack is implemented as linked list. The nodes are linked
// together by having the "greater" handle of a node set to the
// next node in the list. "less_parent" is the handle of the first
// node in the list.
Node *less_parent = nullptr;
// h is root of current subtree, child is one of its children.
Node *h, *child;
Node *p = head of the sorted linked list of nodes;
LOOP // loop unconditionally
LOOP WHILE (num_sub > 2)
// Subtract one for root of subtree.
num_sub = num_sub - 1;
rem[depth] = !!(num_sub & 1); // true if num_sub is an odd number
branch[depth] = false;
depth = depth + 1;
num_sub = num_sub / 2;
END LOOP
IF (num_sub == 2)
// Build a subtree with two nodes, slanting to greater.
// I arbitrarily chose to always have the extra node in the
// greater subtree when there is an odd number of nodes to
// split between the two subtrees.
h = p;
p = the node after p in the linked list;
child = p;
p = the node after p in the linked list;
make h and p into a two-element AVL tree;
ELSE // num_sub == 1
// Build a subtree with one node.
h = p;
p = the next node in the linked list;
make h into a leaf node;
END IF
LOOP WHILE (depth > 0)
depth = depth - 1;
IF (not branch[depth])
// We've completed a less subtree, exit while loop.
EXIT LOOP;
END IF
// We've completed a greater subtree, so attach it to
// its parent (that is less than it). We pop the parent
// off the stack of less parents.
child = h;
h = less_parent;
less_parent = h->greater_child;
h->greater_child = child;
num_sub = 2 * (num_sub - rem[depth]) + rem[depth] + 1;
IF (num_sub & (num_sub - 1))
// num_sub is not a power of 2
h->balance_factor = 0;
ELSE
// num_sub is a power of 2
h->balance_factor = 1;
END IF
END LOOP
IF (num_sub == number of node in original linked list)
// We've completed the full tree, exit outer unconditional loop
EXIT LOOP;
END IF
// The subtree we've completed is the less subtree of the
// next node in the sequence.
child = h;
h = p;
p = the next node in the linked list;
h->less_child = child;
// Put h onto the stack of less parents.
h->greater_child = less_parent;
less_parent = h;
// Proceed to creating greater than subtree of h.
branch[depth] = true;
num_sub = num_sub + rem[depth];
depth = depth + 1;
END LOOP
// h now points to the root of the completed AVL tree.
For an encoding of this in C++, see the build member function (currently at line 361) in https://github.com/wkaras/C-plus-plus-intrusive-container-templates/blob/master/avl_tree.h . It's actually more general, a template using any forward iterator rather than specifically a linked list.

Hashing a Tree Structure

I've just come across a scenario in my project where it I need to compare different tree objects for equality with already known instances, and have considered that some sort of hashing algorithm that operates on an arbitrary tree would be very useful.
Take for example the following tree:
O
/ \
/ \
O O
/|\ |
/ | \ |
O O O O
/ \
/ \
O O
Where each O represents a node of the tree, is an arbitrary object, has has an associated hash function. So the problem reduces to: given the hash code of the nodes of tree structure, and a known structure, what is a decent algorithm for computing a (relatively) collision-free hash code for the entire tree?
A few notes on the properties of the hash function:
The hash function should depend on the hash code of every node within the tree as well as its position.
Reordering the children of a node should distinctly change the resulting hash code.
Reflecting any part of the tree should distinctly change the resulting hash code
If it helps, I'm using C# 4.0 here in my project, though I'm primarily looking for a theoretical solution, so pseudo-code, a description, or code in another imperative language would be fine.
UPDATE
Well, here's my own proposed solution. It has been helped much by several of the answers here.
Each node (sub-tree/leaf node) has the following hash function:
public override int GetHashCode()
{
int hashCode = unchecked((this.Symbol.GetHashCode() * 31 +
this.Value.GetHashCode()));
for (int i = 0; i < this.Children.Count; i++)
hashCode = unchecked(hashCode * 31 + this.Children[i].GetHashCode());
return hashCode;
}
The nice thing about this method, as I see it, is that hash codes can be cached and only recalculated when the node or one of its descendants changes. (Thanks to vatine and Jason Orendorff for pointing this out).
Anyway, I would be grateful if people could comment on my suggested solution here - if it does the job well, then great, otherwise any possible improvements would be welcome.
If I were to do this, I'd probably do something like the following:
For each leaf node, compute the concatenation of 0 and the hash of the node data.
For each internal node, compute the concatenation of 1 and the hash of any local data (NB: may not be applicable) and the hash of the children from left to right.
This will lead to a cascade up the tree every time you change anything, but that MAY be low-enough of an overhead to be worthwhile. If changes are relatively infrequent compared to the amount of changes, it may even make sense to go for a cryptographically secure hash.
Edit1: There is also the possibility of adding a "hash valid" flag to each node and simply propagate a "false" up the tree (or "hash invalid" and propagate "true") up the tree on a node change. That way, it may be possible to avoid a complete recalculation when the tree hash is needed and possibly avoid multiple hash calculations that are not used, at the risk of slightly less predictable time to get a hash when needed.
Edit3: The hash code suggested by Noldorin in the question looks like it would have a chance of collisions, if the result of GetHashCode can ever be 0. Essentially, there is no way of distinguishing a tree composed of a single node, with "symbol hash" 30 and "value hash" 25 and a two-node tree, where the root has a "symbol hash" of 0 and a "value hash" of 30 and the child node has a total hash of 25. The examples are entirely invented, I don't know what expected hash ranges are so I can only comment on what I see in the presented code.
Using 31 as the multiplicative constant is good, in that it will cause any overflow to happen on a non-bit boundary, although I am thinking that, with sufficient children and possibly adversarial content in the tree, the hash contribution from items hashed early MAY be dominated by later hashed items.
However, if the hash performs decently on expected data, it looks as if it will do the job. It's certainly faster than using a cryptographic hash (as done in the example code listed below).
Edit2: As for specific algorithms and minimum data structure needed, something like the following (Python, translating to any other language should be relatively easy).
#! /usr/bin/env python
import Crypto.Hash.SHA
class Node:
def __init__ (self, parent=None, contents="", children=[]):
self.valid = False
self.hash = False
self.contents = contents
self.children = children
def append_child (self, child):
self.children.append(child)
self.invalidate()
def invalidate (self):
self.valid = False
if self.parent:
self.parent.invalidate()
def gethash (self):
if self.valid:
return self.hash
digester = crypto.hash.SHA.new()
digester.update(self.contents)
if self.children:
for child in self.children:
digester.update(child.gethash())
self.hash = "1"+digester.hexdigest()
else:
self.hash = "0"+digester.hexdigest()
return self.hash
def setcontents (self):
self.valid = False
return self.contents
Okay, after your edit where you've introduced a requirement that the hashing result should be different for different tree layouts, you're only left with option to traverse the whole tree and write its structure to a single array.
That's done like this: you traverse the tree and dump the operations you do. For an original tree that could be (for a left-child-right-sibling structure):
[1, child, 2, child, 3, sibling, 4, sibling, 5, parent, parent, //we're at root again
sibling, 6, child, 7, child, 8, sibling, 9, parent, parent]
You may then hash the list (that is, effectively, a string) the way you like. As another option, you may even return this list as a result of hash-function, so it becomes collision-free tree representation.
But adding precise information about the whole structure is not what hash functions usually do. The way proposed should compute hash function of every node as well as traverse the whole tree. So you may consider other ways of hashing, described below.
If you don't want to traverse the whole tree:
One algorithm that immediately came to my mind is like this. Pick a large prime number H (that's greater than maximal number of children). To hash a tree, hash its root, pick a child number H mod n, where n is the number of children of root, and recursively hash the subtree of this child.
This seems to be a bad option if trees differ only deeply near the leaves. But at least it should run fast for not very tall trees.
If you want to hash less elements but go through the whole tree:
Instead of hashing subtree, you may want to hash layer-wise. I.e. hash root first, than hash one of nodes that are its children, then one of children of the children etc. So you cover the whole tree instead of one of specific paths. This makes hashing procedure slower, of course.
--- O ------- layer 0, n=1
/ \
/ \
--- O --- O ----- layer 1, n=2
/|\ |
/ | \ |
/ | \ |
O - O - O O------ layer 2, n=4
/ \
/ \
------ O --- O -- layer 3, n=2
A node from a layer is picked with H mod n rule.
The difference between this version and previous version is that a tree should undergo quite an illogical transformation to retain the hash function.
The usual technique of hashing any sequence is combining the values (or hashes thereof) of its elements in some mathematical way. I don't think a tree would be any different in this respect.
For example, here is the hash function for tuples in Python (taken from Objects/tupleobject.c in the source of Python 2.6):
static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 1000003L;
x = 0x345678L;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}
It's a relatively complex combination with constants experimentally chosen for best results for tuples of typical lengths. What I'm trying to show with this code snippet is that the issue is very complex and very heuristic, and the quality of the results probably depend on the more specific aspects of your data - i.e. domain knowledge may help you reach better results. However, for good-enough results you shouldn't look too far. I would guess that taking this algorithm and combining all the nodes of the tree instead of all the tuple elements, plus adding their position into play will give you a pretty good algorithm.
One option of taking the position into account is the node's position in an inorder walk of the tree.
Any time you are working with trees recursion should come to mind:
public override int GetHashCode() {
int hash = 5381;
foreach(var node in this.BreadthFirstTraversal()) {
hash = 33 * hash + node.GetHashCode();
}
}
The hash function should depend on the hash code of every node within the tree as well as its position.
Check. We are explicitly using node.GetHashCode() in the computation of the tree's hash code. Further, because of the nature of the algorithm, a node's position plays a role in the tree's ultimate hash code.
Reordering the children of a node should distinctly change the resulting hash code.
Check. They will be visited in a different order in the in-order traversal leading to a different hash code. (Note that if there are two children with the same hash code you will end up with the same hash code upon swapping the order of those children.)
Reflecting any part of the tree should distinctly change the resulting hash code
Check. Again the nodes would be visited in a different order leading to a different hash code. (Note that there are circumstances where the reflection could lead to the same hash code if every node is reflected into a node with the same hash code.)
The collision-free property of this will depend on how collision-free the hash function used for the node data is.
It sounds like you want a system where the hash of a particular node is a combination of the child node hashes, where order matters.
If you're planning on manipulating this tree a lot, you may want to pay the price in space of storing the hashcode with each node, to avoid the penalty of recalculation when performing operations on the tree.
Since the order of the child nodes matters, a method which might work here would be to combine the node data and children using prime number multiples and addition modulo some large number.
To go for something similar to Java's String hashcode:
Say you have n child nodes.
hash(node) = hash(nodedata) +
hash(childnode[0]) * 31^(n-1) +
hash(childnode[1]) * 31^(n-2) +
<...> +
hash(childnode[n])
Some more detail on the scheme used above can be found here: http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/
I can see that if you have a large set of trees to compare, then you could use a hash function to retrieve a set of potential candidates, then do a direct comparison.
A substring that would work is just use lisp syntax to put brackets around the tree, write out the identifiere of each node in pre-order. But this is computationally equivalent to a pre-order comparison of the tree, so why not just do that?
I've given 2 solutions: one is for comparing the two trees when you're done (needed to resolve collisions) and the other to compute the hashcode.
TREE COMPARISON:
The most efficient way to compare will be to simply recursively traverse each tree in a fixed order (pre-order is simple and as good as anything else), comparing the node at each step.
So, just create a Visitor pattern that successively returns the next node in pre-order for a tree. i.e. it's constructor can take the root of the tree.
Then, just create two insces of the Visitor, that act as generators for the next node in preorder. i.e. Vistor v1 = new Visitor(root1), Visitor v2 = new Visitor(root2)
Write a comparison function that can compare itself to another node.
Then just visit each node of the trees, comparing, and returning false if comparison fails. i.e.
Module
Function Compare(Node root1, Node root2)
Visitor v1 = new Visitor(root1)
Visitor v2 = new Visitor(root2)
loop
Node n1 = v1.next
Node n2 = v2.next
if (n1 == null) and (n2 == null) then
return true
if (n1 == null) or (n2 == null) then
return false
if n1.compare(n2) != 0 then
return false
end loop
// unreachable
End Function
End Module
HASH CODE GENERATION:
if you want to write out a string representation of the tree, you can use the lisp syntax for a tree, then sample the string to generate a shorter hashcode.
Module
Function TreeToString(Node n1) : String
if node == null
return ""
String s1 = "(" + n1.toString()
for each child of n1
s1 = TreeToString(child)
return s1 + ")"
End Function
The node.toString() can return the unique label/hash code/whatever for that node. Then you can just do a substring comparison from the strings returned by the TreeToString function to determine if the trees are equivalent. For a shorter hashcode, just sample the TreeToString Function, i.e. take every 5 character.
End Module
I think you could do this recursively: Assume you have a hash function h that hashes strings of arbitrary length (e.g. SHA-1). Now, the hash of a tree is the hash of a string that is created as a concatenation of the hash of the current element (you have your own function for that) and hashes of all the children of that node (from recursive calls of the function).
For a binary tree you would have:
Hash( h(node->data) || Hash(node->left) || Hash(node->right) )
You may need to carefully check if tree geometry is properly accounted for. I think that with some effort you could derive a method for which finding collisions for such trees could be as hard as finding collisions in the underlying hash function.
A simple enumeration (in any deterministic order) together with a hash function that depends when the node is visited should work.
int hash(Node root) {
ArrayList<Node> worklist = new ArrayList<Node>();
worklist.add(root);
int h = 0;
int n = 0;
while (!worklist.isEmpty()) {
Node x = worklist.remove(worklist.size() - 1);
worklist.addAll(x.children());
h ^= place_hash(x.hash(), n);
n++;
}
return h;
}
int place_hash(int hash, int place) {
return (Integer.toString(hash) + "_" + Integer.toString(place)).hash();
}
class TreeNode
{
public static QualityAgainstPerformance = 3; // tune this for your needs
public static PositionMarkConstan = 23498735; // just anything
public object TargetObject; // this is a subject of this TreeNode, which has to add it's hashcode;
IEnumerable<TreeNode> GetChildParticipiants()
{
yield return this;
foreach(var child in Children)
{
yield return child;
foreach(var grandchild in child.GetParticipiants() )
yield return grandchild;
}
IEnumerable<TreeNode> GetParentParticipiants()
{
TreeNode parent = Parent;
do
yield return parent;
while( ( parent = parent.Parent ) != null );
}
public override int GetHashcode()
{
int computed = 0;
var nodesToCombine =
(Parent != null ? Parent : this).GetChildParticipiants()
.Take(QualityAgainstPerformance/2)
.Concat(GetParentParticipiants().Take(QualityAgainstPerformance/2));
foreach(var node in nodesToCombine)
{
if ( node.ReferenceEquals(this) )
computed = AddToMix(computed, PositionMarkConstant );
computed = AddToMix(computed, node.GetPositionInParent());
computed = AddToMix(computed, node.TargetObject.GetHashCode());
}
return computed;
}
}
AddToTheMix is a function, which combines the two hashcodes, so the sequence matters.
I don't know what it is, but you can figure out. Some bit shifting, rounding, you know...
The idea is that you have to analyse some environment of the node, depending on the quality you want to achieve.
I have to say, that you requirements are somewhat against the entire concept of hashcodes.
Hash function computational complexity should be very limited.
It's computational complexity should not linearly depend on the size of the container (the tree), otherwise it totally breaks the hashcode-based algorithms.
Considering the position as a major property of the nodes hash function also somewhat goes against the concept of the tree, but achievable, if you replace the requirement, that it HAS to depend on the position.
Overall principle i would suggest, is replacing MUST requirements with SHOULD requirements.
That way you can come up with appropriate and efficient algorithm.
For example, consider building a limited sequence of integer hashcode tokens, and add what you want to this sequence, in the order of preference.
Order of the elements in this sequence is important, it affects the computed value.
for example for each node you want to compute:
add the hashcode of underlying object
add the hashcodes of underlying objects of the nearest siblings, if available. I think, even the single left sibling would be enough.
add the hashcode of underlying object of the parent and it's nearest siblings like for the node itself, same as 2.
repeat this to with the grandparents to a limited depth.
//--------5------- ancestor depth 2 and it's left sibling;
//-------/|------- ;
//------4-3------- ancestor depth 1 and it's left sibling;
//-------/|------- ;
//------2-1------- this;
the fact that you are adding a direct sibling's underlying object's hashcode gives a positional property to the hashfunction.
if this is not enough, add the children:
You should add every child, just some to give a decent hashcode.
add the first child and it's first child and it's first child.. limit the depth to some constant, and do not compute anything recursively - just the underlying node's object's hashcode.
//----- this;
//-----/--;
//----6---;
//---/--;
//--7---;
This way the complexity is linear to the depth of the underlying tree, not the total number of elements.
Now you have a sequence if integers, combine them with a known algorithm, like Ely suggests above.
1,2,...7
This way, you will have a lightweight hash function, with a positional property, not dependent on the total size of the tree, and even not dependent on the tree depth, and not requiring to recompute hash function of the entire tree when you change the tree structure.
I bet this 7 numbers would give a hash destribution near to perfect.
Writing your own hash function is almost always a bug, because you basically need a degree in mathematics to do it well. Hashfunctions are incredibly nonintuitive, and have highly unpredictable collision characteristics.
Don't try directly combining hashcodes for Child nodes -- this will magnify any problems in the underlying hash functions. Instead, concatenate the raw bytes from each node in order, and feed this as a byte stream to a tried-and-true hash function. All the cryptographic hash functions can accept a byte stream. If the tree is small, you may want to just create a byte array and hash it in one operation.

Resources