In-order traversal of an unthreaded binary search tree, without a stack - algorithm

Suppose that we somehow get a dump of an alien computer's memory. Unfortunately, said alien civilization had much larger RAM sizes than we do, and somehow loved functional languages so much that all the data is in a gigantic, unthreaded, binary search tree (with no parent pointers either). Hey, aliens have lots of stack for recursion!
Now this binary tree sits in a gigantic 100-terabyte disk array, and we need some way of getting an in-order traversal. There is the recursive way, which uses up stack (and no computer has 100 terabytes of stack), and the "iterative" way, which is really just maintaining a stack manually.
We are allowed to modify the tree, but only with additional pointer fields and integer fields at the nodes. This is because the 100-terabyte disk array is almost completely full. We definitely can't do something like use another 100 terabytes as an mmap'ed stack.
How can this impossible task be completed? The really infuriating part is that, hey, the tree is sitting there, perfectly ordered, inside the disk array, but we can't seem to get it out in order.

You can traverse the tree by temporarily linking the rightmost node of each left subtree to its successor in the in-order traversal. For example, when you first come to t:
     t
   / ^ \
  a  |  d
 / \ |
b   c+
you link the rightmost node of the left subtree of t (here c) back to t via its originally null right-child pointer. Later on, when following right pointers, you come back to t and try to repeat the same procedure, but this time the search for the rightmost node arrives back at t itself. In that case you restore the pointer to null, visit t, and continue traversing the right child of t.
This is essentially exercise 21 in "The Art of Computer Programming", Vol.1, section 2.3.1.
struct tree {
    tree (int v) : value (v), left (nullptr), right (nullptr) {}
    int value;
    tree *left, *right;
};

template<typename F>
void
inorder (tree *t, F visit) {
    while (t != nullptr) {
        if (t->left == nullptr) {
            visit (t);
            t = t->right;
        } else {
            tree *q, *r;
            r = t;
            q = t = t->left;
            while (q->right && q->right != r)
                q = q->right;
            if (q->right == nullptr)
                q->right = r;
            else {
                q->right = nullptr;
                visit (r);
                t = r->right;
            }
        }
    }
}
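For what it's worth, here is a minimal usage sketch against the definitions above; the five sample values and the printing lambda are illustrative only and not part of the original answer.
#include <cstdio>

int main () {
    // Hypothetical five-node BST, values chosen only for illustration.
    tree *root = new tree (4);
    root->left = new tree (2);
    root->left->left = new tree (1);
    root->left->right = new tree (3);
    root->right = new tree (5);
    // Prints: 1 2 3 4 5
    inorder (root, [] (tree *t) { std::printf ("%d ", t->value); });
    std::printf ("\n");
    return 0;
}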

Related

Find the number of nodes in a general binary tree that can be searched using BST searching algorithm

First of all, we know that the searching algorithm of a BST looks like this:
// Function to check if given key exists in given BST.
bool search (node* root, int key)
{
    if (root == NULL)
        return false;
    if (root->key == key)
        return true;
    if (root->key > key)
        return search(root->left, key);
    else
        return search(root->right, key);
}
This search algorithm is normally applied to a binary search tree. However, when it comes to a general binary tree, it may give wrong results.
The following question has stumped me for quite a long time.
Given a general binary tree, count how many nodes in it can be found using the BST searching algorithm above.
Take the binary tree below as an example. The searchable nodes are 1, 6, and 7, so the answer is 3.
    1
   / \
  2   6
 / \   \
4   5   7
Suppose the keys in a tree are unique, and we know the values of all the keys.
My thoughts
I have a naive solution in my mind, which is to call the searching algorithm for every possible key, and count how many times the function returns true.
However, I want to reduce the number of function calls and improve the time complexity. My intuition tells me that recursion can be useful, but I'm not sure.
I think that for each node we need information about its parent (or ancestors), so I have thought about defining the binary tree data structure as follows:
struct node {
    int key;
    struct node* left;
    struct node* right;
    struct node* parent; // Adding a 'parent' pointer may be useful....
};
I couldn't really figure out an efficient way to tell whether a node is searchable, nor could I come up with one to count the searchable nodes. Thus I came here to look for help. A hint would be better than a full solution.
This is my first time asking a question on Stack Overflow. If you think this post needs improvement, feel free to leave a comment. Thanks for reading.
To count the keys that can be found, you should traverse the tree and keep track of the range (low, high) that is implied by the path you took from the root. So when you go left from a node that has key 5, you should consider that you cannot find any more values that are greater than 5 (or equal to it, as that value is already accounted for). If that node's left child has key 2, and you take a right, then you know that you cannot find any more values that are less than 2. So your window has at that moment narrowed to (2, 5). If that window becomes empty, then it makes no sense to dig deeper in that direction of the tree.
This is an algorithm you can apply easily using recursion. Here is some code:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

typedef struct node {
    int key;
    struct node* left;
    struct node* right;
} Node;

Node *create_node(int key, Node *left, Node *right) {
    Node *node = malloc(sizeof(struct node));
    node->key = key;
    node->left = left;
    node->right = right;
    return node;
}

int count_bst_nodes_recur(Node *node, int low, int high) {
    return node == NULL || node->key <= low || node->key >= high ? 0
        : 1 + count_bst_nodes_recur(node->left, low, node->key)
            + count_bst_nodes_recur(node->right, node->key, high);
}

int count_bst_nodes(Node *node) {
    return count_bst_nodes_recur(node, -INT_MAX, INT_MAX);
}

int main(void) {
    // Create example tree given in the question:
    Node *tree = create_node(1,
        create_node(2,
            create_node(4, NULL, NULL),
            create_node(5, NULL, NULL)
        ),
        create_node(6,
            NULL,
            create_node(7, NULL, NULL)
        )
    );
    printf("Number of searchable keys: %d\n", count_bst_nodes(tree)); // -> 3
    return 0;
}
The following property is very important for solving this question:
Any binary tree node which respects the BST property (with respect to all of its ancestors) will always be searchable using the BST search algorithm.
Consider the example you shared.
Now, suppose:
If you are searching for 1, it succeeds on the first hit. (Count = 1)
For 2, the search goes into the right subtree of 1. At 6 there is no left subtree, hence 2 is not found.
For 6, the search goes into the right subtree of 1. Match found! (Count = 2)
Similarly for 7, the search goes into the right subtree of 1 and then into the right subtree of 6. Match found! (Count = 3)
So the count is obtained by searching for every key present in the tree and incrementing a counter each time the search succeeds.
Another interesting pattern you can see is that the counter is incremented exactly for the nodes that satisfy the BST node property.
The important property is:
A node's value is greater than all values in its left subtree and less than all values in its right subtree.
For example, consider node 7: it is to the right of 6 and to the right of 1, and it is greater than both. Hence it is a valid node.
With this in mind, the problem reduces to counting the number of valid BST nodes in the tree.
Solving this is quite straightforward: traverse the tree from top to bottom and check at each node whether its key respects the bounds implied by its ancestors. If it does not, there is no need to check its children. If it does, increment the counter by 1 and check its children, as sketched below.
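A sketch of that top-down check; it boils down to the same recursion as the C program in the answer above, and assumes the 'node' struct from the question (key, left, right):
// Sketch only: count nodes whose key respects the (low, high) window implied
// by the path from the root; prune a subtree as soon as the check fails.
int count_valid(const struct node *n, long low, long high) {
    if (!n || n->key <= low || n->key >= high)
        return 0;
    return 1 + count_valid(n->left, low, n->key)
             + count_valid(n->right, n->key, high);
}
// Initial call (with suitably wide hypothetical bounds):
//   count_valid(root, LONG_MIN, LONG_MAX);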

Check if a tree is a mirror image?

Given a binary tree which is huge and cannot fit in memory, how do you check whether the tree is a mirror image?
I got this as an interview question
If a tree is a mirror image of another tree, the in-order traversal of one tree will be the reverse of the other's.
So just do an in-order traversal on the first tree and a reverse in-order traversal on the other, and check that all the elements are the same.
I can't take full credit for this reply, of course; a handful of my colleagues helped with some assumptions and poked holes in my original idea. Many thanks to them!
Assumptions
We can't have the entire tree in memory, so it's not ideal to use recursion. Let's assume, for simplicity's sake, that we can only hold a maximum of two nodes in memory.
We know n, the total number of levels in our tree.
We can perform seeks on the data with respect to the character or line position it's in.
The data that is on disk is ordered by depth. That is to say, the first entry on disk is the root, and the next two are its children, and the next four are its children's children, and so forth.
There are cases in which the data is perfectly mirrored, and cases in which it isn't. Blank data interlaced with non-blank data is considered "acceptable", unless otherwise specified.
We have freedom over using any data type we wish so long as the values can be compared for equivalence. Testing for object equivalence may not be ideal, so let's assume we're comparing primitives.
"Mirrored" means mirrored between the root's children. To use different terminologies, the grandparent's left child is mirrored with its right child, and the left child (parent)'s left child is mirrored with the grandparent's right child's right child. This is illustrated in the graph below; the matching symbols represent the mirroring we want to check for.
           G
      P*       P*
   C1&   C2^ C3^   C4&
Approach
We know how many nodes to expect on each level when we're reading from disk: 2^k nodes on level k. We can establish a double loop to iterate over the total depth of the tree and the count of the nodes in each level. Inside of this, we can simply compare the outermost values for equivalence, and short-circuit if we find an unequal value.
We can determine the location of each outer node using powers of 2: the leftmost node of any level k will always be at position 2^k, and the rightmost node will always be at position 2^(k+1) - 1.
Small proof: the outermost nodes on level 1 are 2 and 3; 2^1 = 2 and 2^(1+1) - 1 = 2^2 - 1 = 3. The outermost nodes on level 2 are 4 and 7; 2^2 = 4 and 2^(2+1) - 1 = 2^3 - 1 = 7. One could expand this all the way to the nth case.
Pseudocode
int k, i;
for (k = 1; k < n; k++) { // Skip root, trivially mirrored
    for (i = 0; i < pow(2, k) / 2; i++) {
        if (node_retrieve(pow(2, k) + i) != node_retrieve(pow(2, k + 1) - 1 - i)) {
            return false;
        }
    }
}
return true;
Thoughts
This sort of question is a great interview question because, more than likely, they want to see how you would approach this problem. This approach may be horrible, it may be immaculate, but an employer would want you to take your time, draw things on a piece of paper or whiteboard, and ask them questions about how the data is stored, how it can be read, what limitations there are on seeks, etc etc.
It's not the coding aspect that interviewers are interested in, but the problem solving aspect.
Recursion is easy.
struct node {
    struct node *left;
    struct node *right;
    int payload;
};

int is_not_mirror(struct node *one, struct node *two)
{
    if (!one && !two) return 0;
    if (!one) return 1;
    if (!two) return 1;
    if (compare(one->payload, two->payload)) return 1;
    if (is_not_mirror(one->left, two->right)) return 1;
    if (is_not_mirror(one->right, two->left)) return 1;
    return 0;
}
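Since the question is about a single huge tree, presumably the check would be applied to the root's two subtrees. A hypothetical wrapper (compare() above is assumed to return nonzero when the two payloads differ):
// Hypothetical wrapper: a single tree is its own mirror image when its two
// subtrees mirror each other; an empty tree is trivially symmetric.
int is_mirror(struct node *root)
{
    return !root || !is_not_mirror(root->left, root->right);
}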

Efficient algorithm to determine if an alleged binary tree contains a cycle?

One of my favorite interview questions is
In O(n) time and O(1) space, determine whether a linked list contains a cycle.
This can be done using Floyd's cycle-finding algorithm.
My question is whether it's possible to get such good time and space guarantees when trying to detect whether a binary tree contains a cycle. That is, if someone gives you a struct definition along the lines of
struct node {
    node* left;
    node* right;
};
How efficiently can you verify that the given structure is indeed a binary tree and not, say, a DAG or a graph containing a cycle?
Is there an algorithm that, given the root of a binary tree, can determine whether that tree contains a cycle in O(n) time and better than O(n) space? Clearly this could be done with a standard DFS or BFS, but this requires O(n) space. Can it be done in O(√n) space? O(log n) space? Or (the holy grail) in just O(1) space? I'm curious because in the case of a linked list this can be done in O(1) space, but I've never seen a correspondingly efficient algorithm for this case.
You can't even visit each node of a real, honest-to-god, cycle-free tree in O(1) space, so what you are asking for is clearly impossible. (Tricks with modifying a tree along the way are not O(1) space).
If you are willing to consider stack-based algorithms, then a regular tree walk can be easily modified along the lines of Floyd's algorithm.
It is possible to test in logarithmic space if two vertices of a graph belong to the same connected component (Reingold, Omer (2008), "Undirected connectivity in log-space", Journal of the ACM 55 (4): Article 17, 24 pages, doi:10.1145/1391289.1391291). A connected component is cyclic; hence if you can find two vertices in a graph that belong to the same connected component, there is a cycle in the graph. Reingold published the algorithm 26 years after the question of its existence was first posed (see http://en.wikipedia.org/wiki/Connected_component_%28graph_theory%29). Having an O(1)-space algorithm sounds unlikely given that it took 25 years to find a log-space solution. Note that picking two vertices from a graph and asking if they belong to a cycle is equivalent to asking if they belong to a connected component.
This algorithm can be extended to a log-space solution for graphs with out-degree 2 (OP: "trees"), as it is enough to check, for every pair of a node and one of its immediate siblings, whether they belong to the same connected component, and these pairs can be enumerated in O(log n) space using standard recursive tree descent.
If you assume that the cycle points to a node at the same or smaller depth in the "tree", then you can do a BFS (iterative version) with two stacks, one for the turtle (x1 speed) and one for the hare (x2 speed). At some point, the hare's stack will either be empty (no cycle) or be a subset of the turtle's stack (a cycle was found). The time required is O(n k) and the space is O(lg n), where n is the number of used nodes and k is the time required to check the subset condition, which can be upper-bounded by lg(n). Note that the initial assumption about the cycle does not constrain the original problem, since the structure is assumed to be a tree except for a finite number of arcs that form a cycle with previous nodes; a link to a deeper node in the tree would not form a cycle but would destroy the tree structure.
If it can further be assumed that the cycle points to an ancestor, then the subset condition can be replaced by checking that both stacks are equal, which is faster.
Visited Aware
You would need to redefine the structure as such (I am going to leave pointers out of this):
class node {
    node left;
    node right;
    bool visited = false;
};
And use the following recursive algorithm (obviously re-working it to use a custom stack if your tree could grow big enough):
bool validate(node value)
{
    if (value.visited)
        return (value.visited = false);
    value.visited = true;
    if (value.left != null && !validate(value.left))
        return (value.visited = false);
    if (value.right != null && !validate(value.right))
        return (value.visited = false);
    value.visited = false;
    return true;
}
Comments: It technically uses O(n) space, because of the extra field in the struct. The worst case for it would also be O(n) if all the values are on a single side of the tree and every value is in the cycle.
Depth Aware
When inserting into the tree you could keep track of the maximum depth:
struct node {
    node left;
    node right;
};

global int maximumDepth = 0;

void insert(node item) { insert(root, item, 1); }

void insert(node parent, node item, int depth)
{
    if (depth > maximumDepth)
        maximumDepth = depth;
    // Do your insertion magic, ensuring to pass in depth + 1 to any other insert() calls.
}

bool validate(node value, int depth)
{
    if (depth > maximumDepth)
        return false;
    if (value.left != null && !validate(value.left, depth + 1))
        return false;
    if (value.right != null && !validate(value.right, depth + 1))
        return false;
    return true;
}
Comments: The storage space is O(n) because we are storing the depth on the stack (as well as the maximum depth); the time is still O(n). This would do better on invalid trees.
Managed to get it right!
Runtime: O(n). I suspect it goes through an edge at most a constant number of times. No formal proof.
Space: O(1). Only stores a few nodes. Does not create new nodes or edges, only rearranges them.
Destructive: Yes. It flattens the tree; every node ends up with its in-order successor as its right child and null as its left child.
The algorithm tries to flatten the binary tree by moving the whole left subtree of the current node to above it, making it the rightmost node of the subtree, then updating the current node to find further left subtrees in the newly discovered nodes. If we know both the left child and the predecessor of the current node, we can move the entire subtree in a few operations, in a similar manner to inserting one list into another. Such a move preserves the in-order sequence of the tree, and it invariably makes the tree more right-slanted.
There are three cases depending on the local configuration of nodes around the current one: the left child is the same as the predecessor, the left child is different from the predecessor, or there is no left subtree. The first case is trivial. The second case requires finding the predecessor; the third case requires finding a node on the right with a left subtree. A graphical representation helps in understanding them.
In the latter two cases we can run into cycles. Since we only traverse a list of right children, we can use Floyd's cycle detection algorithm to find and report loops. Sooner or later every cycle will be rotated into such a form.
#include <cstdio>
#include <iostream>
#include <queue>
#define null NULL
#define int32 int
using namespace std;
/**
* Binary tree node class
**/
template <class T>
class Node
{
public:
/* Public Attributes */
Node* left;
Node* right;
T value;
};
/**
* This exception is thrown when the flattener & cycle detector algorithm encounters a cycle
**/
class CycleException
{
public:
/* Public Constructors */
CycleException () {}
virtual ~CycleException () {}
};
/**
* Binary tree flattener and cycle detector class.
**/
template <class T>
class Flattener
{
public:
/* Public Constructors */
Flattener () :
root (null),
parent (null),
current (null),
top (null),
bottom (null),
turtle (null)
{}
virtual ~Flattener () {}
/* Public Methods */
/**
* This function flattens an alleged binary tree, throwing a new CycleException when encountering a cycle. Returns the root of the flattened tree.
**/
Node<T>* flatten (Node<T>* pRoot)
{
init(pRoot);
// Loop while there are left subtrees to process
while( findNodeWithLeftSubtree() ){
// We need to find the topmost and rightmost node of the subtree
findSubtree();
// Move the entire subtree above the current node
moveSubtree();
}
// There are no more left subtrees to process, we are finished, the tree does not contain cycles
return root;
}
protected:
/* Protected Methods */
void init (Node<T>* pRoot)
{
// Keep track of the root node so the tree is not lost
root = pRoot;
// Keep track of the parent of the current node since it is needed for insertions
parent = null;
// Keep track of the current node, obviously it is needed
current = root;
}
bool findNodeWithLeftSubtree ()
{
// Find a node with a left subtree using Floyd's cycle detection algorithm
turtle = parent;
while( current->left == null and current->right != null ){
if( current == turtle ){
throw new CycleException();
}
parent = current;
current = current->right;
if( current->right != null ){
parent = current;
current = current->right;
}
if( turtle != null ){
turtle = turtle->right;
}else{
turtle = root;
}
}
return current->left != null;
}
void findSubtree ()
{
// Find the topmost node
top = current->left;
// The topmost and rightmost nodes are the same
if( top->right == null ){
bottom = top;
return;
}
// The rightmost node is buried in the right subtree of the topmost node. Find it using Floyd's cycle detection algorithm applied to right children.
bottom = top->right;
turtle = top;
while( bottom->right != null ){
if( bottom == turtle ){
throw new CycleException();
}
bottom = bottom->right;
if( bottom->right != null ){
bottom = bottom->right;
}
turtle = turtle->right;
}
}
void moveSubtree ()
{
// Update root; if the current node is the root then the top is the new root
if( root == current ){
root = top;
}
// Add subtree below parent
if( parent != null ){
parent->right = top;
}
// Add current below subtree
bottom->right = current;
// Remove subtree from current
current->left = null;
// Update current; step up to process the top
current = top;
}
Node<T>* root;
Node<T>* parent;
Node<T>* current;
Node<T>* top;
Node<T>* bottom;
Node<T>* turtle;
private:
Flattener (Flattener&);
Flattener& operator = (Flattener&);
};
template <class T>
void traverseFlat (Node<T>* current)
{
while( current != null ){
cout << dec << current->value << " # 0x" << hex << reinterpret_cast<int32>(current) << endl;
current = current->right;
}
}
template <class T>
Node<T>* makeCompleteBinaryTree (int32 maxNodes)
{
Node<T>* root = new Node<T>();
queue<Node<T>*> q;
q.push(root);
int32 nodes = 1;
while( nodes < maxNodes ){
Node<T>* node = q.front();
q.pop();
node->left = new Node<T>();
q.push(node->left);
nodes++;
if( nodes < maxNodes ){
node->right = new Node<T>();
q.push(node->right);
nodes++;
}
}
return root;
}
template <class T>
void inorderLabel (Node<T>* root)
{
int32 label = 0;
inorderLabel(root, label);
}
template <class T>
void inorderLabel (Node<T>* root, int32& label)
{
if( root == null ){
return;
}
inorderLabel(root->left, label);
root->value = label++;
inorderLabel(root->right, label);
}
int32 main (int32 argc, char* argv[])
{
if(argc||argv){}
typedef Node<int32> Node;
// Make binary tree and label it in-order
Node* root = makeCompleteBinaryTree<int32>(1 << 24);
inorderLabel(root);
// Try to flatten it
try{
Flattener<int32> F;
root = F.flatten(root);
}catch(CycleException*){
cout << "Oh noes, cycle detected!" << endl;
return 0;
}
// Traverse its flattened form
// traverseFlat(root);
}
As Karl said, by definition a "tree" is free of cycles. But I still get the spirit in which the question is asked. Why do you need fancy algorithms for detecting a cycle in any graph? You can simply run a BFS or DFS, and if you visit a node which has already been visited, that implies a cycle. This will run in O(n) time, but the space complexity is also O(n); I don't know if this can be reduced.
As mentioned by ChingPing, a simple DFS should do the trick. You would need to mark each node as visited (you need some mapping from node reference to integer), and if a re-entry is attempted on an already visited node, that means there is a cycle.
This is O(n) in memory though.
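For concreteness, a sketch of that idea using a hash set of node addresses as the "visited" mapping; as noted, this is still O(n) extra memory:
#include <unordered_set>
#include <vector>

// Sketch only: iterative DFS that records every node pointer it has seen.
// Revisiting a pointer means the structure is not a tree (a cycle, or at
// least a shared node as in a DAG).
bool has_cycle_dfs(node* root) {
    std::unordered_set<const node*> visited;
    std::vector<node*> stack;
    if (root) stack.push_back(root);
    while (!stack.empty()) {
        node* cur = stack.back();
        stack.pop_back();
        if (!visited.insert(cur).second)
            return true;                 // already seen this node
        if (cur->left)  stack.push_back(cur->left);
        if (cur->right) stack.push_back(cur->right);
    }
    return false;
}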
At first glance you can see that this problem would be solved by a non-deterministic application of Floyd's algorithm. So what happens if we apply Floyd's in a split-and-branch way?
Well we can use Floyd's from the base node, and then add an additional Floyd's at each branch.
So for each terminal path we have an instance of Floyd's algorithm which ends there. And if a cycle ever arises, there is a turtle which MUST have a corresponding hare chasing it.
So the algorithm finishes. (And as a side effect, each terminal node is only reached by one hare/turtle pair, so there are O(n) visits and thus O(n) time. Store the nodes which have been branched from; this doesn't increase the order of magnitude of memory and prevents memory blowouts in the case of cycles.)
Furthermore, doing this ensures the memory footprint is the same as the number of terminal nodes. The number of terminal nodes is O(log n) but O(n) in worst case.
TL;DR: apply Floyd's and branch at each time you have a choice to make:
time: O(n)
space: O(log n)
I don't believe there exists an algorithm for walking a tree with less than O(N) space. And, for a (purported) binary tree, it takes no more space/time (in "order" terms) to detect cycles than it does to walk the tree. I believe that DFS will walk a tree in O(N) time, so probably O(N) is the limit in both measures.
Okay after further thought I believe I found a way, provided you
know the number of nodes in advance
can make modifications to the binary tree
The basic idea is to traverse the tree with Morris inorder tree traversal, and count the number of visited nodes, both in the visiting phase and individual predecessor finding phases. If any of these exceeds the number of nodes, you definitely have a cycle. If you don't have a cycle, then it is equivalent to the plain Morris traversal, and your binary tree will be restored.
I'm not sure if it is possible without knowing the number of nodes in advance though. Will think about it more.
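A sketch of this idea, reusing the tree struct and the shape of the Morris-style inorder() from the first answer in this document. Here n is the known node count and the step budget is chosen generously (a genuine tree needs only O(n) pointer moves); whether every possible cycle is guaranteed to blow the budget is exactly the point this answer leaves open.
#include <cstddef>

// Sketch only: Morris traversal with a step counter. Exceeding the bound
// implies the structure cannot be a cycle-free tree.
bool exceeds_morris_budget(tree *t, std::size_t n) {
    std::size_t steps = 0;
    const std::size_t bound = 6 * n + 6;   // generous O(n) budget for this sketch
    while (t != nullptr) {
        if (++steps > bound) return true;
        if (t->left == nullptr) {
            t = t->right;                  // "visit" t, move right
        } else {
            tree *q = t->left, *r = t;
            while (q->right && q->right != r) {
                if (++steps > bound) return true;
                q = q->right;              // search for the in-order predecessor
            }
            if (q->right == nullptr) {
                q->right = r;              // thread the predecessor back to r
                t = t->left;
            } else {
                q->right = nullptr;        // remove the thread, restoring the tree
                t = r->right;
            }
        }
    }
    return false;                          // finished within budget: no cycle detected
}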

Create Balanced Binary Search Tree from Sorted linked list

What's the best way to create a balanced binary search tree from a sorted singly linked list?
How about creating nodes bottom-up?
This solution's time complexity is O(N). Detailed explanation in my blog post:
http://www.leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
Two traversals of the linked list are all we need: a first traversal to get the length of the list (which is then passed in as the parameter n to the function), and a second that creates the nodes in the list's order.
BinaryTree* sortedListToBST(ListNode *& list, int start, int end) {
    if (start > end) return NULL;
    // same as (start+end)/2, avoids overflow
    int mid = start + (end - start) / 2;
    BinaryTree *leftChild = sortedListToBST(list, start, mid - 1);
    BinaryTree *parent = new BinaryTree(list->data);
    parent->left = leftChild;
    list = list->next;
    parent->right = sortedListToBST(list, mid + 1, end);
    return parent;
}

BinaryTree* sortedListToBST(ListNode *head, int n) {
    return sortedListToBST(head, 0, n - 1);
}
You can't do better than linear time, since you have to at least read all the elements of the list, so you might as well copy the list into an array (linear time) and then construct the tree efficiently in the usual way, i.e. if you had the list [9,12,18,23,24,51,84], then you'd start by making 23 the root, with children 12 and 51, then 9 and 18 become children of 12, and 24 and 84 become children of 51. Overall, should be O(n) if you do it right.
The actual algorithm, for what it's worth, is "take the middle element of the list as the root, and recursively build BSTs for the sub-lists to the left and right of the middle element and attach them below the root".
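A sketch of that copy-then-build idea, assuming the ListNode (data, next) and BinaryTree (value constructor, left, right) types used in the answer above:
#include <vector>

// Sketch only: copy the list into an array, then build the tree by taking the
// middle element as the root and recursing on the two halves.
BinaryTree* buildFromArray(const std::vector<int>& a, int lo, int hi) {
    if (lo > hi) return nullptr;
    int mid = lo + (hi - lo) / 2;              // middle element becomes the root
    BinaryTree* root = new BinaryTree(a[mid]);
    root->left = buildFromArray(a, lo, mid - 1);
    root->right = buildFromArray(a, mid + 1, hi);
    return root;
}

BinaryTree* sortedListToBSTViaArray(ListNode* head) {
    std::vector<int> a;
    for (ListNode* p = head; p != nullptr; p = p->next)
        a.push_back(p->data);                  // O(n) copy of the list values
    return buildFromArray(a, 0, (int)a.size() - 1);
}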
Best isn't only about asymptotic run time. The sorted linked list has all the information needed to create the binary tree directly, and I think this is probably what they are looking for.
Note that the first and third entries become children of the second, then the fourth node has the second and sixth as children (the sixth having the fifth and seventh as children), and so on...
In pseudocode:
read three elements, make a node from them, mark as level 1, push on stack
loop
    read three elements and make a node of them
    mark as level 1
    push on stack
    loop while top two entries on stack have same level (n)
        make node of top two entries, mark as level n + 1, push on stack
while elements remain in list
EDIT:
At any point, there is a left node of height N on the stack. Next step is to read one element, then read and construct another node of height N on the stack. To construct a node of height N, make and push a node of height N -1 on the stack, then read an element, make another node of height N-1 on the stack -- which is a recursive call.
Actually, this means the algorithm (even as modified) won't produce a balanced tree. If there are 2^N+1 nodes, it will produce a tree with 2^N-1 values on the left, and 1 on the right.
So I think @sgolodetz's answer is better, unless I can think of a way of rebalancing the tree as it's built.
Trick question!
The best way is to use the STL, and take advantage of the fact that the sorted associative container ADT, of which set is an implementation, demands that insertion of sorted ranges have amortized linear time. Any passable set of core data structures for any language should offer a similar guarantee. For a real answer, see the quite clever solutions others have provided.
What's that? I should offer something useful?
Hum...
How about this?
The smallest possible meaningful tree in a balanced binary tree is 3 nodes.
A parent, and two children. The very first instance of such a tree is the first three elements. Child-parent-Child. Let's now imagine this as a single node. Okay, well, we no longer have a tree. But we know that the shape we want is Child-parent-Child.
Done for a moment with our imaginings, we want to keep a pointer to the parent in that initial triumvirate. But it's singly linked!
We'll want to have four pointers, which I'll call A, B, C, and D. So, we move A to 1, set B equal to A and advance it one. Set C equal to B, and advance it two. The node under B already points to its right-child-to-be. We build our initial tree. We leave B at the parent of Tree one. C is sitting at the node that will have our two minimal trees as children. Set A equal to C, and advance it one. Set D equal to A, and advance it one. We can now build our next minimal tree. D points to the root of that tree, B points to the root of the other, and C points to the... the new root from which we will hang our two minimal trees.
How about some pictures?
[A][B][-][C]
With our image of a minimal tree as a node...
[B = Tree][C][A][D][-]
And then
[Tree A][C][Tree B]
Except we have a problem. The node two after D is our next root.
[B = Tree A][C][A][D][-][Roooooot?!]
It would be a lot easier on us if we could simply maintain a pointer to it instead of to it and C. Turns out, since we know it will point to C, we can go ahead and start constructing the node in the binary tree that will hold it, and as part of this we can enter C into it as a left-node. How can we do this elegantly?
Set the pointer of the Node under C to the node Under B.
It's cheating in every sense of the word, but by using this trick, we free up B.
Alternatively, you can be sane, and actually start building out the node structure. After all, you really can't reuse the nodes from the SLL, they're probably POD structs.
So now...
[TreeA]<-[C][A][D][-][B]
[TreeA]<-[C]->[TreeB][B]
And... Wait a sec. We can use this same trick to free up C, if we just let ourselves think of it as a single node instead of a tree. Because after all, it really is just a single node.
[TreeC]<-[B][A][D][-][C]
We can further generalize our tricks.
[TreeC]<-[B][TreeD]<-[C][-]<-[D][-][A]
[TreeC]<-[B][TreeD]<-[C]->[TreeE][A]
[TreeC]<-[B]->[TreeF][A]
[TreeG]<-[A][B][C][-][D]
[TreeG]<-[A][-]<-[C][-][D]
[TreeG]<-[A][TreeH]<-[D][B][C][-]
[TreeG]<-[A][TreeH]<-[D][-]<-[C][-][B]
[TreeG]<-[A][TreeJ]<-[B][-]<-[C][-][D]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
[TreeG]<-[A][TreeJ]<-[B][TreeK]<-[D][-]<-[C][-]
We are missing a critical step!
[TreeG]<-[A]->([TreeJ]<-[B]->([TreeK]<-[D][-]<-[C][-]))
Becomes :
[TreeG]<-[A]->[TreeL->([TreeK]<-[D][-]<-[C][-])][B]
[TreeG]<-[A]->[TreeL->([TreeK]<-[D]->[TreeM])][B]
[TreeG]<-[A]->[TreeL->[TreeN]][B]
[TreeG]<-[A]->[TreeO][B]
[TreeP]<-[B]
Obviously, the algorithm can be cleaned up considerably, but I thought it would be interesting to demonstrate how one can optimize as you go by iteratively designing your algorithm. I think this kind of process is what a good employer should be looking for more than anything.
The trick, basically, is that each time we reach the next midpoint, which we know is a parent-to-be, we know that its left subtree is already finished. The other trick is that we are done with a node once it has two children and something pointing to it, even if all of the sub-trees aren't finished. Using this, we can get what I am pretty sure is a linear time solution, as each element is touched only 4 times at most. The problem is that this relies on being given a list that will form a truly balanced binary search tree. There are, in other words, some hidden constraints that may make this solution either much harder to apply, or impossible. For example, if you have an odd number of elements, or if there are a lot of non-unique values, this starts to produce a fairly silly tree.
Considerations:
Render the element unique.
Insert a dummy element at the end if the number of nodes is odd.
Sing longingly for a more naive implementation.
Use a deque to keep the roots of completed subtrees and the midpoints in, instead of mucking around with my second trick.
This is a python implementation:
def sll_to_bbst(sll, start, end):
    """Build a balanced binary search tree from sorted linked list.

    This assumes that you have a class BinarySearchTree, with properties
    'l_child' and 'r_child'.

    Params:
        sll: sorted linked list, any data structure with 'popleft()' method,
            which removes and returns the leftmost element of the list. The
            easiest thing to do is to use 'collections.deque' for the sorted
            list.
        start: int, start index, on initial call set to 0
        end: int, on initial call should be set to len(sll)

    Returns:
        A balanced instance of BinarySearchTree

    This is a python implementation of solution found here:
    http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html
    """
    if start >= end:
        return None
    middle = (start + end) // 2
    l_child = sll_to_bbst(sll, start, middle)
    root = BinarySearchTree(sll.popleft())
    root.l_child = l_child
    root.r_child = sll_to_bbst(sll, middle+1, end)
    return root
Instead of a sorted linked list, I was asked to create a BST of minimal height from a sorted array (logically it doesn't matter, though the run time varies); the following is the code I could get out:
typedef struct Node {
    struct Node *left;
    int info;
    struct Node *right;
} Node_t;

Node_t* Bin(int low, int high) {
    Node_t* node = NULL;
    int mid = 0;
    if (low <= high) {
        mid = (low + high) / 2;
        node = CreateNode(a[mid]);
        printf("DEBUG: creating node for %d\n", a[mid]);
        if (node->left == NULL) {
            node->left = Bin(low, mid - 1);
        }
        if (node->right == NULL) {
            node->right = Bin(mid + 1, high);
        }
        return node;
    } // if (low <= high)
    else {
        return NULL;
    }
} // Bin(low, high)

Node_t* CreateNode(int info) {
    Node_t* node = malloc(sizeof(Node_t));
    memset(node, 0, sizeof(Node_t));
    node->info = info;
    node->left = NULL;
    node->right = NULL;
    return node;
} // CreateNode(info)
// call function for an array example: 6 7 8 9 10 11 12, it gets you desired
// result
Bin(0,6);
HTH Somebody..
This is the recursive pseudocode algorithm that I would suggest.
createTree(treenode *root, linknode *start, linknode *end)
{
    if (start == end or start == end->next)
    {
        return;
    }
    ptrsingle = start;
    ptrdouble = start;
    while (ptrdouble != end and ptrdouble->next != end)
    {
        ptrsingle = ptrsingle->next;
        ptrdouble = ptrdouble->next->next;
    }
    // ptrsingle will now be at the middle element.
    treenode cur_node = Allocatememory;
    cur_node->data = ptrsingle->data;
    if (root == null)
    {
        root = cur_node;
    }
    else
    {
        if (cur_node->data (less than) root->data)
            root->left = cur_node;
        else
            root->right = cur_node;
    }
    createTree(cur_node, start, ptrsingle);
    createTree(cur_node, ptrsingle, end);
}
Root = null;
The initial call will be createTree(Root, list, null);
We are doing the recursive building of the tree, but without using an intermediate array.
To get to the middle element every time, we advance two pointers, one by one element and the other by two elements. By the time the second pointer is at the end, the first pointer will be at the middle.
The running time will be O(n log n). The extra space will be O(log n). Not an efficient solution for a real situation, where you could have a red-black tree which guarantees O(n log n) insertion, but good enough for an interview.
Similar to @Stuart Golodetz and @Jake Kurzer, the important thing is that the list is already sorted.
In @Stuart's answer, the array he presented is the backing data structure for the BST. The find operation, for example, would just need to perform index calculations to traverse the tree. Growing the array and removing elements would be the trickier part, so I'd prefer a vector or other constant-time lookup data structure.
@Jake's answer also uses this fact but unfortunately requires you to traverse the list each time to do a get(index) operation. But it requires no additional memory usage.
Unless it was specifically mentioned by the interviewer that they wanted an object structure representation of the tree, I would use @Stuart's answer.
In a question like this you'd be given extra points for discussing the tradeoffs and all the options that you have.
Hope the detailed explanation on this post helps:
http://preparefortechinterview.blogspot.com/2013/10/planting-trees_1.html
A slightly improved version of @1337c0d3r's implementation, from my blog.
// create a balanced BST using #len elements starting from #head & move #head forward by #len
TreeNode *sortedListToBSTHelper(ListNode *&head, int len) {
    if (0 == len) return NULL;
    auto left = sortedListToBSTHelper(head, len / 2);
    auto root = new TreeNode(head->val);
    root->left = left;
    head = head->next;
    root->right = sortedListToBSTHelper(head, (len - 1) / 2);
    return root;
}

TreeNode *sortedListToBST(ListNode *head) {
    int n = length(head);
    return sortedListToBSTHelper(head, n);
}
If you know how many nodes are in the linked list, you can do it like this:
// Gives path to subtree being built. If branch[N] is false, branch
// less from the node at depth N, if true branch greater.
bool branch[max depth];
// If rem[N] is true, then for the current subtree at depth N, its
// greater subtree has one more node than its less subtree.
bool rem[max depth];
// Depth of root node of current subtree.
unsigned depth = 0;
// Number of nodes in current subtree.
unsigned num_sub = Number of nodes in linked list;
// The algorithm relies on a stack of nodes whose less subtree has
// been built, but whose right subtree has not yet been built. The
// stack is implemented as linked list. The nodes are linked
// together by having the "greater" handle of a node set to the
// next node in the list. "less_parent" is the handle of the first
// node in the list.
Node *less_parent = nullptr;
// h is root of current subtree, child is one of its children.
Node *h, *child;
Node *p = head of the sorted linked list of nodes;
LOOP // loop unconditionally
LOOP WHILE (num_sub > 2)
// Subtract one for root of subtree.
num_sub = num_sub - 1;
rem[depth] = !!(num_sub & 1); // true if num_sub is an odd number
branch[depth] = false;
depth = depth + 1;
num_sub = num_sub / 2;
END LOOP
IF (num_sub == 2)
// Build a subtree with two nodes, slanting to greater.
// I arbitrarily chose to always have the extra node in the
// greater subtree when there is an odd number of nodes to
// split between the two subtrees.
h = p;
p = the node after p in the linked list;
child = p;
p = the node after p in the linked list;
make h and p into a two-element AVL tree;
ELSE // num_sub == 1
// Build a subtree with one node.
h = p;
p = the next node in the linked list;
make h into a leaf node;
END IF
LOOP WHILE (depth > 0)
depth = depth - 1;
IF (not branch[depth])
// We've completed a less subtree, exit while loop.
EXIT LOOP;
END IF
// We've completed a greater subtree, so attach it to
// its parent (that is less than it). We pop the parent
// off the stack of less parents.
child = h;
h = less_parent;
less_parent = h->greater_child;
h->greater_child = child;
num_sub = 2 * (num_sub - rem[depth]) + rem[depth] + 1;
IF (num_sub & (num_sub - 1))
// num_sub is not a power of 2
h->balance_factor = 0;
ELSE
// num_sub is a power of 2
h->balance_factor = 1;
END IF
END LOOP
IF (num_sub == number of nodes in original linked list)
// We've completed the full tree, exit outer unconditional loop
EXIT LOOP;
END IF
// The subtree we've completed is the less subtree of the
// next node in the sequence.
child = h;
h = p;
p = the next node in the linked list;
h->less_child = child;
// Put h onto the stack of less parents.
h->greater_child = less_parent;
less_parent = h;
// Proceed to creating greater than subtree of h.
branch[depth] = true;
num_sub = num_sub + rem[depth];
depth = depth + 1;
END LOOP
// h now points to the root of the completed AVL tree.
For an encoding of this in C++, see the build member function (currently at line 361) in https://github.com/wkaras/C-plus-plus-intrusive-container-templates/blob/master/avl_tree.h . It's actually more general, a template using any forward iterator rather than specifically a linked list.

Want to save binary tree to disk for "20 questions" game

In short, I'd like to learn/develop an elegant method to save a binary tree to disk (a general tree, not necessarily a BST). Here is the description of my problem:
I'm implementing a game of "20-questions". I've written a binary tree whose internal nodes are questions and leaves are answers. The left child of a node is the path you'd follow if somebody answered "yes" to your current question, while the right child is a "no" answer. Note this is not a binary search tree, just a binary tree whose left child is "yes" and right is "no".
The program adds a node to a tree if it encounters a leaf that is null by asking the user to distinguish her answer from the one the computer was thinking of.
This is neat, because the tree builds itself up as the user plays. What's not neat is that I don't have a good way of saving the tree to disk.
I've thought about saving the tree as an array representation (for node i, the left child is at 2i+1, the right child at 2i+2, and the parent at (i-1)/2), but it's not clean and I end up with a lot of wasted space.
Any ideas for an elegant solution to saving a sparse binary tree to disk?
You can store it recursively:
void encodeState(OutputStream out, Node n) {
    if (n == null) {
        out.write("[null]");
    } else {
        out.write("{");
        out.write(n.nodeDetails());
        encodeState(out, n.yesNode());
        encodeState(out, n.noNode());
        out.write("}");
    }
}
Devise your own less texty output format. I'm sure I don't need to describe the method to read the resulting output.
This is depth-first traversal. Breadth-first works too.
I would do a level-order traversal. That is to say, you are basically doing a breadth-first search.
You have:
1. Create a queue with the root element inserted into it.
2. Dequeue an element from the queue; call it E.
3. Add the left and right children of E into the queue. If there is no left or right child, just put in a null-node representation.
4. Write node E to disk.
5. Repeat from step 2.
For example, for the tree
        F
      /   \
     B     G
    / \     \
   A   D     I
      / \   /
     C   E H
Level-order traversal sequence: F, B, G, A, D, I, C, E, H
What you will store on disk: F, B, G, A, D, NullNode, I, NullNode, NullNode, C, E, H, NullNode
Loading it back from disk is even easier. Simply read from left to right the nodes you stored to disk. This will give you each level's left and right nodes. I.e. the tree will fill in from top to bottom left to right.
Step 1 reading in:
  F
Step 2 reading in:
    F
   /
  B
Step 3 reading in:
    F
   / \
  B   G
Step 4 reading in:
      F
     / \
    B   G
   /
  A
And so on ...
Note: Once you have a NULL node representation, you no longer need to list its children to disk. When loading back you will know to skip to the next node. So for very deep trees, this solution will still be efficient.
A simple way to accomplish this is to traverse the tree outputting each element as you do so. Then to load the tree back, simply iterate through your list, inserting each element back into the tree. If your tree isn't self balancing, you may want to reorder the list in such a way that the final tree is reasonably balanced.
Not sure it's elegant, but it's simple and explainable:
Assign a unique ID to each node, whether stem or leaf. A simple counting integer will do.
When saving to disk, traverse the tree, storing each node ID, "yes" link ID, "no" link ID, and the text of the question or answer. For null links, use zero as the null value. You could either add a flag to indicate whether question or answer, or more simply, check whether both links are null. You should get something like this:
1,2,3,"Does it have wings?"
2,0,0,"a bird"
3,4,0,"Does it purr?"
4,0,0,"a cat"
Note that if you use the sequential integers approach, saving the node's ID may be redundant, as shown here. You could just put them in order by ID.
To restore from disk, read a line, then add it to the tree. You will probably need a table or array to hold forward-referenced nodes, e.g. when processing node 1, you'll need to keep track of 2 and 3 until you can fill in those values.
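A sketch of the writing half of this scheme. The QNode type, the two-pass structure and the helper names here are hypothetical; 0 stands for a null link, matching the sample above:
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical node type for the 20-questions tree.
struct QNode {
    std::string text;
    QNode *yes = nullptr;
    QNode *no = nullptr;
};

// First pass: give every node a unique positive ID (0 is reserved for null).
void assign_ids(QNode *n, std::unordered_map<QNode*, int> &ids, int &next) {
    if (!n) return;
    ids[n] = next++;
    assign_ids(n->yes, ids, next);
    assign_ids(n->no, ids, next);
}

// Second pass: one line per node: id,yes-id,no-id,"text".
void save(QNode *n, const std::unordered_map<QNode*, int> &ids, std::ostream &out) {
    if (!n) return;
    out << ids.at(n) << ','
        << (n->yes ? ids.at(n->yes) : 0) << ','
        << (n->no ? ids.at(n->no) : 0) << ',' << '"' << n->text << '"' << '\n';
    save(n->yes, ids, out);
    save(n->no, ids, out);
}
Calling assign_ids(root, ids, next) with next starting at 1 and then save(root, ids, std::cout) produces lines in the shape of the example above; loading would read each line into a table keyed by ID and patch the yes/no links in a second pass, as described.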
The simplest, most general way is just a basic format that can be used to represent any graph:
<parent>,<relation>,<child>
I.e.:
"Is it Red", "yes", "does it have wings"
"Is it Red", "no", "does it swim"
There isn't much redundancy here, and the format is mostly human readable; the only data duplication is that there must be a copy of the parent for every direct child it has.
The only thing you really have to watch is that you don't accidentally generate a cycle ;)
Unless that's what you want.
The problem here is rebuilding the tree afterwards. If I create the "does it have wings" object upon reading the first line, I have to somehow locate it when I later encounter the line reading "does it have wings","yes","Has it got a beak?"
This is why I traditionally just use graph structures in memory for such a thing with pointers going everywhere.
[0x1111111 "Is It Red" => [ 'yes' => 0xF752347 , 'no' => 0xFF6F664 ],
0xF752347 "does it have wings" => [ 'yes' => 0xFFFFFFF , 'no' => 0x2222222 ],
0xFF6F664 "does it swim" => [ 'yes' => "I Dont KNOW :( " , ... etc etc ]
Then the "child/parent" connectivity is merely metadata.
In Java, if you make a class serializable, you can just write the class object to disk and read it back using object input/output streams.
I would store the tree like this:
<node identifier>
node data
[<yes child identifier>
yes child]
[<no child identifier>
no child]
<end of node identifier>
where the child nodes are just recursive instances of the above. The bits in [] are optional and the four identifiers are just constants/enum values.
Here is the C++ code using PreOrder DFS:
void SaveBinaryTreeToStream(TreeNode* root, ostringstream& oss)
{
    if (!root)
    {
        oss << '#';
        return;
    }
    oss << root->data;
    SaveBinaryTreeToStream(root->left, oss);
    SaveBinaryTreeToStream(root->right, oss);
}

TreeNode* LoadBinaryTreeFromStream(istringstream& iss)
{
    if (iss.eof())
        return NULL;
    char c;
    if ('#' == (c = iss.get()))
        return NULL;
    TreeNode* root = new TreeNode(c, NULL, NULL);
    root->left = LoadBinaryTreeFromStream(iss);
    root->right = LoadBinaryTreeFromStream(iss);
    return root;
}
In main(), you can do:
ostringstream oss;
root = MakeCharTree();
PrintVTree(root);
SaveBinaryTreeToStream(root, oss);
ClearTree(root);
cout << oss.str() << endl;
istringstream iss(oss.str());
cout << iss.str() << endl;
root = LoadBinaryTreeFromStream(iss);
PrintVTree(root);
ClearTree(root);
/* Output:
A
B C
D E F
G H I
ABD#G###CEH##I##F##
ABD#G###CEH##I##F##
A
B C
D E F
G H I
*/
The DFS is easier to understand.
*********************************************************************************
But we can also use a level-order scan (BFS) with a queue:
ostringstream SaveBinaryTreeToStream_BFS(TreeNode* root)
{
    ostringstream oss;
    if (!root)
        return oss;
    queue<TreeNode*> q;
    q.push(root);
    while (!q.empty())
    {
        TreeNode* tn = q.front(); q.pop();
        if (tn)
        {
            q.push(tn->left);
            q.push(tn->right);
            oss << tn->data;
        }
        else
        {
            oss << '#';
        }
    }
    return oss;
}

TreeNode* LoadBinaryTreeFromStream_BFS(istringstream& iss)
{
    if (iss.eof())
        return NULL;
    TreeNode* root = new TreeNode(iss.get(), NULL, NULL);
    queue<TreeNode*> q; q.push(root); // The parents from the upper level
    while (!iss.eof() && !q.empty())
    {
        TreeNode* tn = q.front(); q.pop();
        char c = iss.get();
        if ('#' == c)
            tn->left = NULL;
        else
            q.push(tn->left = new TreeNode(c, NULL, NULL));
        c = iss.get();
        if ('#' == c)
            tn->right = NULL;
        else
            q.push(tn->right = new TreeNode(c, NULL, NULL));
    }
    return root;
}
In main(), you can do:
root = MakeCharTree();
PrintVTree(root);
ostringstream oss = SaveBinaryTreeToStream_BFS(root);
ClearTree(root);
cout << oss.str() << endl;
istringstream iss(oss.str());
cout << iss.str() << endl;
root = LoadBinaryTreeFromStream_BFS(iss);
PrintVTree(root);
ClearTree(root);
/* Output:
A
B C
D E F
G H I
ABCD#EF#GHI########
ABCD#EF#GHI########
A
B C
D E F
G H I
*/

Resources