A friend of mine was asked this question in an interview.
Given two binary trees, explain how you would create a diff such that, given that diff and either of the trees, you can generate the other binary tree. Implement a function createDiff(Node tree1, Node tree2) that returns that diff.
Tree 1
      4
    /   \
   3     2
  / \   / \
 5   8 10  22
Tree 2
 1
  \
   4
  / \
11   12
If you are given Tree 2 and the diff you should be able to generate Tree 1.
My solution:
Convert both binary trees into arrays, where the left child of the node at index n is at 2n+1 and the right child is at 2n+2, and represent an empty node by -1. Then do an element-wise subtraction of the arrays to create the diff. This solution fails if a tree has -1 as a node value. I think there has to be a better, neater solution, but I'm not able to figure it out.
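For reference, here is a minimal sketch of the array idea described above. The helper names (toArray, arrayDiff, applyArrayDiff) and the plain-object node shape { value, left, right } are assumptions made for illustration, and the -1 sentinel keeps the limitation already mentioned. Note also that this layout can grow exponentially for skewed trees.

```javascript
// Hypothetical helpers; nodes are plain objects { value, left, right }.
const EMPTY = -1; // sentinel for a missing node, as in the description above

// Store the tree in heap layout: children of slot n live at 2n+1 and 2n+2.
function toArray(root) {
  const out = [];
  const stack = [[root, 0]];
  while (stack.length > 0) {
    const [node, index] = stack.pop();
    if (!node) continue;
    while (out.length <= index) out.push(EMPTY);
    out[index] = node.value;
    stack.push([node.left, 2 * index + 1], [node.right, 2 * index + 2]);
  }
  return out;
}

// Element-wise difference a - b, padding the shorter array with EMPTY.
function arrayDiff(a, b) {
  const n = Math.max(a.length, b.length);
  const diff = [];
  for (let i = 0; i < n; i++) {
    diff.push((i < a.length ? a[i] : EMPTY) - (i < b.length ? b[i] : EMPTY));
  }
  return diff;
}

// Recover a from b and the diff (a = b + diff), trimming trailing EMPTY slots.
function applyArrayDiff(b, diff) {
  const a = diff.map((d, i) => (i < b.length ? b[i] : EMPTY) + d);
  while (a.length > 0 && a[a.length - 1] === EMPTY) a.pop();
  return a;
}

// The two trees from the question.
const array1 = toArray({
  value: 4,
  left: { value: 3, left: { value: 5 }, right: { value: 8 } },
  right: { value: 2, left: { value: 10 }, right: { value: 22 } },
});
const array2 = toArray({
  value: 1,
  right: { value: 4, left: { value: 11 }, right: { value: 12 } },
});
const diff = arrayDiff(array1, array2);
console.log(diff);                         // [ 3, 4, -2, 6, 9, -1, 10 ]
console.log(applyArrayDiff(array2, diff)); // [ 4, 3, 2, 5, 8, 10, 22 ]
```

To go the other way (from Tree 1 to Tree 2), the same function can be applied with the negated diff.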
Think of them as directory trees and print a sorted list of the paths to every leaf item.
Tree 1 becomes:
4/2/10
4/2/22
4/3/5
4/3/8
These list formats can be diff'ed and the tree recreated from such a list.
There are many ways to do this.
I would suggest that you turn the tree into a sorted array of triples of (parent, child, direction). So start with tree1:
      4
    /   \
   3     2
  / \   / \
 5   8 10  22
This quickly becomes:
(None, 4, None) # top
(4, 3, L)
(3, 5, L)
(3, 8, R)
(4, 2, R)
(2, 10, L)
(2, 22, R)
Which you sort to get
(None, 4, None) # top
(2, 10, L)
(2, 22, R)
(3, 5, L)
(3, 8, R)
(4, 2, R)
(4, 3, L)
Do the same with the other, and then diff them.
Given a tree and the diff, you can first turn the tree into this form, look at the diff to work out in which direction it has to be applied, and get the desired representation with patch. You can then reconstruct the other tree recursively.
The reason why I would do it with this representation is that if the two trees share any subtrees in common - even if they are placed differently in the main tree - those will show up in common. And therefore you are likely to get relatively small diffs if the trees do, in fact, match in some interesting way.
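A rough sketch of the triple representation, under the assumption that values are unique within a tree. The function names and the { onlyInFirst, onlyInSecond } diff shape are made up for this illustration; a real answer might use a line-based diff/patch tool on the sorted list instead.

```javascript
// Nodes are plain objects { value, left, right }; values are assumed unique.
function toTriples(root) {
  const triples = [];
  function walk(node, parent, direction) {
    if (!node) return;
    triples.push([parent, node.value, direction]);
    walk(node.left, node.value, "L");
    walk(node.right, node.value, "R");
  }
  walk(root, null, null);
  // Serialise each triple so that sorting and set lookups are straightforward.
  return triples.map((t) => JSON.stringify(t)).sort();
}

// The diff keeps the triples that appear in only one of the two trees.
function diffTriples(tree1, tree2) {
  const a = toTriples(tree1);
  const b = toTriples(tree2);
  const inA = new Set(a);
  const inB = new Set(b);
  return {
    onlyInFirst: a.filter((t) => !inB.has(t)),
    onlyInSecond: b.filter((t) => !inA.has(t)),
  };
}

// Given either tree and the diff, produce the sorted triples of the other tree.
function patchTriples(tree, diff) {
  const mine = new Set(toTriples(tree));
  // Work out the direction: is the given tree the first or the second one?
  const isFirst =
    diff.onlyInFirst.every((t) => mine.has(t)) &&
    diff.onlyInSecond.every((t) => !mine.has(t));
  const [remove, add] = isFirst
    ? [diff.onlyInFirst, diff.onlyInSecond]
    : [diff.onlyInSecond, diff.onlyInFirst];
  remove.forEach((t) => mine.delete(t));
  add.forEach((t) => mine.add(t));
  return [...mine].sort();
}

// Rebuild a tree from serialised triples.
function fromTriples(serialised) {
  const nodes = new Map(); // value -> node
  const get = (v) => {
    if (!nodes.has(v)) nodes.set(v, { value: v });
    return nodes.get(v);
  };
  let root;
  for (const [parent, child, direction] of serialised.map((t) => JSON.parse(t))) {
    const node = get(child);
    if (parent === null) root = node;
    else get(parent)[direction === "L" ? "left" : "right"] = node;
  }
  return root;
}

const tree1 = {
  value: 4,
  left: { value: 3, left: { value: 5 }, right: { value: 8 } },
  right: { value: 2, left: { value: 10 }, right: { value: 22 } },
};
const tree2 = {
  value: 1,
  right: { value: 4, left: { value: 11 }, right: { value: 12 } },
};
const tripleDiff = diffTriples(tree1, tree2);
const rebuilt1 = fromTriples(patchTriples(tree2, tripleDiff));
console.log(JSON.stringify(rebuilt1) === JSON.stringify(tree1)); // true
```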
Edit
Per a point from @ruakh, this does assume that values do not repeat in a tree. If they do, then you could use a representation like this:
      4
    /   \
   3     2
  / \   / \
 5   8 10  22
becomes
(, 4)
(0, 3)
(00, 5)
(01, 8)
(1, 2)
(10, 10)
(11, 22)
And now if you move subtrees, they will show up as large diffs. But if you just change one node, it will still be a small diff.
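As a sketch of that second representation: each node is keyed by its path from the root ("0" for left, "1" for right, the empty string for the root), so repeated values are no problem. The names toPathMap and diffPaths, and the [path, valueInFirst, valueInSecond] entry shape, are assumptions for this example.

```javascript
// Map each node to the string of turns from the root ("0" = left, "1" = right).
function toPathMap(root) {
  const map = new Map();
  function walk(node, path) {
    if (!node) return;
    map.set(path, node.value);
    walk(node.left, path + "0");
    walk(node.right, path + "1");
  }
  walk(root, "");
  return map;
}

// For every path where the trees disagree, record both values
// (undefined means "no node at this path" in that tree).
function diffPaths(tree1, tree2) {
  const a = toPathMap(tree1);
  const b = toPathMap(tree2);
  const diff = [];
  for (const path of new Set([...a.keys(), ...b.keys()])) {
    if (a.get(path) !== b.get(path)) diff.push([path, a.get(path), b.get(path)]);
  }
  // Sort by path so the diff has a stable order.
  return diff.sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0));
}

// The tree from the answer, with one configurable value at path "01".
const sample = (x) => ({
  value: 4,
  left: { value: 3, left: { value: 5 }, right: { value: x } },
  right: { value: 2, left: { value: 10 }, right: { value: 22 } },
});

// Changing a single node gives a one-entry diff: path "01", 8 versus 9.
console.log(diffPaths(sample(8), sample(9)));
```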
(The example from the question/interview is not very helpful, as it shows no shared sub-structure of non-trivial size. Or the interview question is outstanding for initiating a dialogue between customer and developer.)
Re-use of subtrees needs a representation that allows identifying them. It seems useful to be able to reconstruct the smaller tree without walking most of the difference. Denoting the "definition" of identifiable sub-trees with capital letters and re-use by a tacked-on ':
d e d--------e
c b "-" c b => C B' C' b
b a a b a a B a a
a a a
(The problem statement does not say diff is linear.)
Things to note:
there's a sub-tree B occurring in two places of T1
in T2, there's another b with one leaf-child a that is not another occurrence of B
no attempt to share leaves
What if now I imagine (or the interviewer suggests) two huge trees, identical but for one node somewhere in the middle which has a different value?
Well, at least its sub-trees will be shared, and "the other sub-trees" all the way up to the root. Too bad if the trees are degenerated and almost all nodes are part of that path.
Huge trees with children of the root exchanged?
(Detecting trees occurring more than once has a chance to shine here.)
The bigger problem would seem to be the whole trees represented in "the diff", while the requirement may be
Given one tree, the diff shall support reconstruction of the other using little space and processing.
(It might include that setting up the diff shall be cheap, too - which I'd immediately challenge: a small diff looks related to edit distance.)
A way to identify "crucial nodes" in each tree is needed - btilly's suggestion of "left-right-string" is good as gold.
Then, one would need a way to keep differences in children & value.
That's the far end I'd expect an exchange in an interview to reach.
To detect re-used trees, I'd add the height to each internal node. For a proof of principle, I'd probably use an existing implementation of find repeated strings on a suitable serialisation.
There are many ways to think of a workable diff-structure.
Naive solution
One naive way is to store the two trees in a tuple. Then, when you need to regenerate a tree, given the other and the diff, you just look for a node that is different when comparing the given tree with the tree in the first tuple entry of the diff. If found you return that tree from the first tuple entry. If not found, you return the second one from the diff tuple.
Small diffs for small differences
An interviewer would probably ask for a less memory consuming alternative. One could try to think of a structure that will be small in size when there are only a few values or nodes different. In the extreme case where both trees are equal, such diff would be (near-)empty as well.
Definitions
I define these terms before defining the diff's structure:
Imagine the trees get extra NIL leaf nodes, i.e. an empty tree would consist of 1 NIL node. A tree with only a root node, would have two NIL nodes as its direct children, ...etc.
A node is common to both trees when it can be reached via the same path from the root (e.g. left-left-right), irrespective of whether they contain the same value or have the same children. A node can even be common when it is a NIL node in one or both of the trees (as defined above).
Common nodes (including NIL nodes when they are common) get a preorder sequence number (0, 1, 2, ...). Nodes that are not common are discarded during this numbering.
Diff structure
The difference could be a list of tuples, where each tuple has this information:
The above mentioned preorder sequence number, identifying a common node
A value: when neither node is a NIL node, this is the diff of the values (e.g. XOR). When one of the nodes is a NIL node, the value is the other node object (so effectively including the whole subtree below it). In dynamically typed languages, either kind of information can fit in the same tuple position. In statically typed languages, you would use an extra entry in the tuple (e.g. atomicValue, subtree), where only one of the two would have a significant value.
A tuple is added only for a common node, and only when the two nodes differ: either both are non-NIL nodes with different values, or exactly one of them is a NIL node.
Algorithm
The diff can be created via a preorder walk through the common nodes of the trees.
Here is an implementation in JavaScript:
class Node {
    constructor(value, left, right) {
        this.value = value;
        if (left) this.left = left;
        if (right) this.right = right;
    }
    clone() {
        return new Node(this.value, this.left ? this.left.clone() : undefined,
                        this.right ? this.right.clone() : undefined);
    }
}
// Main functions:
function createDiff(tree1, tree2) {
    let i = -1; // preorder sequence number
    function recur(node1, node2) {
        i++;
        if (!node1 !== !node2) return [[i, (node1 || node2).clone()]];
        if (!node1) return [];
        const result = [];
        if (node1.value !== node2.value) result.push([i, node1.value ^ node2.value]);
        return result.concat(recur(node1.left, node2.left), recur(node1.right, node2.right));
    }
    return recur(tree1, tree2);
}
function applyDiff(tree, diff) {
    let i = -1; // preorder sequence number
    let j = 0; // index in diff array
    function recur(node) {
        i++;
        let diffData = j >= diff.length || diff[j][0] !== i ? 0 : diff[j++][1];
        if (diffData instanceof Node) return node ? undefined : diffData.clone();
        return node && new Node(node.value ^ diffData, recur(node.left), recur(node.right));
    }
    return recur(tree);
}
// Create sample data:
let tree1 =
    new Node(4,
        new Node(3,
            new Node(5), new Node(8)
        ),
        new Node(2,
            new Node(10), new Node(22)
        )
    );
let tree2 =
    new Node(2,
        undefined,
        new Node(4,
            new Node(11), new Node(12)
        )
    );
// Demo:
let diff = createDiff(tree1, tree2);
console.log("Diff:");
console.log(diff);
const restoreTree2 = applyDiff(tree1, diff);
console.log("Is restored second tree equal to original?");
console.log(JSON.stringify(tree2)===JSON.stringify(restoreTree2));
const restoreTree1 = applyDiff(tree2, diff);
console.log("Is restored first tree equal to original?");
console.log(JSON.stringify(tree1)===JSON.stringify(restoreTree1));
const noDiff = createDiff(tree1, tree1);
console.log("Diff for two equal trees:");
console.log(noDiff);
I have a line. It starts with two indexes, call them 0 and 1, at the outermost points. At any point I can create a new point which bisects two other ones (there must not already be a point between them). However, when this happens the indexes need to increment. For example, here's a potential series of steps to achieve N=5, since there are 5 indexes in the result.
(graph)                          (split between)  (iteration #)
<============================>
0                            1        0,1              0
0             1              2        1,2              1
0             1       2      3        0,1              2
0      1      2       3      4
I have two questions:
What pseudocode could be used to find the "split between" values given the iteration number?
How could I prevent the shape from being unbalanced? Are there certain restrictions I should place on the value of N? I don't particularly care what order the splits happen in, but I do want to make sure the result is balanced.
This is an issue I've encountered when developing a video game.
I'm not sure if this is the kind of answer you are looking for, but I see this as a binary tree structure. Every tree node contains its own label and its left and right labels. The root of the tree (level 0) would be (2, 0, 1) (split 2 with 0 on the left and 1 on the right). Every node would be split into two children. The algorithm would go something like this:
At step N, pick the leftmost node without two children in level floor(log2(N - 1)) - 1, so that the new node lands in level floor(log2(N - 1)).
Take the node label T and the left and right labels L and R from that node.
If the node does not have a left child, add a left child node (N, L, T).
If the node already has a left child, add a right child node (N, T, R).
N <- N + 1
For example, at iteration 5 you would have something like this:
Level 0:            (2, 0, 1)
                   /         \
                  /           \
                 /             \
Level 1:    (3, 0, 2)       (4, 2, 1)
             /
            /
Level 2: (5, 0, 3)
Now, to reconstruct the current split, you would do the following:
Initialize a list S <- [0].
For every node (T, L, R) in the tree traversed in postorder:
If the node does not have a left child, append T to S.
If the node does not have a right child, append R to S.
For the previous case, you would have:
S = [0]
(5, 0, 3) -> S = [0, 5, 3]
(3, 0, 2) -> S = [0, 5, 3, 2]
(4, 2, 1) -> S = [0, 5, 3, 2, 4, 1]
(2, 0, 1) -> S = [0, 5, 3, 2, 4, 1]
So the complete split would be [0, 5, 3, 2, 4, 1]. The split would be perfectly balanced only when N = 2^k for some positive integer k. Of course, you can annotate the tree nodes with additional "distance" information if you need to keep track of something like that.
I agree with jdehesa in that what you are doing does have its similarities with a binary tree. I would recommend looking into using that data structure if you can, since it is highly structured, well-defined, and many great algorithms exist for working with it.
Additionally, as mentioned in the comment section above, a linked list would also be a nice option, since you are adding in a lot of elements. A normal array (which is contiguous in memory) will require you to move many elements over and over again as you insert additional elements, which is slow. A linked list would allow you to add your element anywhere in memory, and then just update a few pointers in the linked list on both sides of where you want to insert it, and be done. No moving things around.
However, if you really just want to put together a working solution using array and aren't concerned with using other data structures, here is the math for the indexing you requested:
Each pair can be listed as (a, b), and we can quickly see b = a + 1. Thus, if you find a, you know b. To get these, you'll need two loops:
iteration := 0
i := 0
while iteration < desired_iterations
    for j = (2 ^ i) - 1; j >= 0 && iteration < desired_iterations; j--
        print j, j + 1
        iteration++
    i++
Where ^ is the exponentiation operator. What we do is find the second to last element in the list, (2^i)-1, and count backwards, listing off the indices. We then increment "i" to signify that we've now doubled our array size, and then repeat again. If at any point we reach our desired number of iterations, we break out of both loops because we're finished.
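The two loops above could be written roughly like this. The function names are made up, and simulate (which tracks point positions on a line from 0 to 1) is only there to check that the pairs reproduce the table in the question.

```javascript
// Returns the "split between" pair for each iteration 0 .. count-1.
function splitPairs(count) {
  const pairs = [];
  for (let i = 0; pairs.length < count; i++) {
    // Walk one whole pass from the right-most gap to the left-most gap.
    for (let j = 2 ** i - 1; j >= 0 && pairs.length < count; j--) {
      pairs.push([j, j + 1]);
    }
  }
  return pairs;
}

// Replays the splits on a line from 0 to 1 and returns the point positions.
function simulate(count) {
  const points = [0, 1];
  for (const [a, b] of splitPairs(count)) {
    // The new point takes index b; everything to its right shifts by one.
    points.splice(b, 0, (points[a] + points[b]) / 2);
  }
  return points;
}

console.log(splitPairs(3)); // [ [ 0, 1 ], [ 1, 2 ], [ 0, 1 ] ]
console.log(simulate(3));   // [ 0, 0.25, 0.5, 0.75, 1 ]
```

Because each pass works from right to left, inserting a point never changes the indexes of the gaps still to be split in that pass.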
A BST is generated (by successive insertion of nodes) from each permutation of keys from the set
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}.
How many permutations determine trees of height three?
The number of permutations of nodes you have to check is 11! = 39,916,800, so you could just write a program to brute-force this. Here's a skeleton of one, written in C++:
#include <algorithm>
#include <vector>
using namespace std;

vector<int> values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
unsigned numSuccesses = 0;
do {
    if (bstHeightOf(values) == 3) numSuccesses++;
} while (next_permutation(values.begin(), values.end()));
Here, you just need to write the bstHeightOf function, which computes the height of a BST formed by inserting the given nodes in the specified order. I'll leave this as an exercise.
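For anyone who wants to check their version of that exercise, here is one possible sketch of such a height function, written in JavaScript rather than C++. It assumes distinct keys and measures height in edges (a single node has height 0), which matches the "height 3 holds at most 15 nodes" convention used in this thread.

```javascript
// Height (in edges) of the BST built by inserting keys in the given order.
// Assumes the keys are distinct; an empty tree is reported as height 0.
function bstHeightOf(keys) {
  let root = null;
  let height = 0;
  for (const key of keys) {
    const node = { key, left: null, right: null };
    if (root === null) {
      root = node;
      continue;
    }
    let current = root;
    let depth = 1; // depth the new node would have as a child of `current`
    for (;;) {
      const side = key < current.key ? "left" : "right";
      if (current[side] === null) {
        current[side] = node;
        break;
      }
      current = current[side];
      depth++;
    }
    height = Math.max(height, depth);
  }
  return height;
}

console.log(bstHeightOf([6, 3, 8, 2, 5, 7, 10, 1, 4, 9, 11])); // 3
console.log(bstHeightOf([1, 2, 3]));                           // 2
```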
You can prune down the search space a bunch by using these observations:
The maximum number of nodes in a BST of height 2 is 7.
The root can't be 1, 2, 3, 9, 10, or 11, because if it were, one subtree would have more than 7 nodes in it and therefore the overall tree would have height greater than three.
Given that you know the possible roots, one option would be to generate all BSTs with the keys {1, 2, 3, ..., 11} (not by listing off all orderings, but by listing off all trees), filter it down just to the set of trees with height 3, and then use this recursive algorithm to count the number of ways each tree can be built by inserting values. This would probably be significantly faster than the above approach, since the number of trees to check is much lower than the number of orderings and each tree can be checked in linear time.
Hope this helps!
An alternative to templatetypedef's answer that might be more tricky but can be done completely by hand.
Consider the complete binary tree of height 3: it has 15 nodes. You're looking for trees with 11 nodes; that means that four of those 15 nodes are missing. The patterns in which these missing nodes can occur can be enumerated with fairly little effort. (Hint: I did this by dividing the patterns into two groups.) This will give you all the shapes of trees of height 3 with 11 nodes.
Once you've done this, you just need to reason about the relationship between these tree shapes and the actual trees you're looking for. (Hint: this relationship is extremely simple - don't overthink it.)
This allows you to enumerate the resulting trees that satisfy the requirements. If you get to 96, you have the same result as I do. For each of these trees, we now need to find how many permutations give rise to that tree.
This part is the tricky part; you might now need to split these trees up into smaller groups for which you know, by symmetry, that the number of permutations that gives rise to that tree is the same for all trees in a group. For example,
         6
       /   \
      /     \
     3       8
    / \     / \
   2   5   7   10
  /   /       /  \
 1   4       9    11
is going to have the same number of permutations that give rise to it as
         6
       /   \
      /     \
     4       9
    / \     / \
   2   5   7   11
  / \       \  /
 1   3       8 10
You'll also need to find out how many trees occur in each group; the class of this example contains 16 trees. (Hint: I split them up into 7 groups of between 2 and 32 trees.) Now you'll need to find the number of permutations that give rise to such a tree, for each group. You can determine this "recursively", still on paper; for the class containing the two example trees above, I get 12096 permutations. Since that class contains 16 trees, the total number of permutations leading to such a tree is 16*12096 = 193536. Do the same for the six other classes and add the numbers up to get the total.
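If you want to check the per-tree count by machine, the usual recurrence is: the root must be inserted first, and the insertions for the left and right subtrees can then be interleaved freely, giving C(l + r, l) * orders(left) * orders(right) for subtrees of sizes l and r. The sketch below applies it to the shape of the first example tree; the names choose and countOrders and the nested-object encoding of the shape are assumptions, and plain numbers are only exact while the counts stay below 2^53.

```javascript
// Binomial coefficient C(n, k); every intermediate value is an exact integer.
function choose(n, k) {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Returns { size, orders } for a tree shape given as nested { left, right }
// objects, where `orders` is the number of insertion orders that build it.
function countOrders(node) {
  if (!node) return { size: 0, orders: 1 };
  const l = countOrders(node.left);
  const r = countOrders(node.right);
  return {
    size: l.size + r.size + 1,
    orders: choose(l.size + r.size, l.size) * l.orders * r.orders,
  };
}

// Shape of the first example tree (root 6); only the shape matters here.
const leaf = {};
const exampleShape = {
  left: { left: { left: leaf }, right: { left: leaf } },     // 3: 2 (with 1), 5 (with 4)
  right: { left: leaf, right: { left: leaf, right: leaf } }, // 8: 7, 10 (with 9 and 11)
};

console.log(countOrders(exampleShape)); // { size: 11, orders: 12096 }
```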
If any particular part of this solution has you stumped or anything is unclear, don't hesitate to ask!
Since this site is about programming, I'll provide code to determine this. We can use a backtracking algorithm, that backtracks as soon as the height constraint is violated.
We can implement the BST as a flat array, where the children of a node at index k are stored at indices 2*k and 2*k + 1. The root is at index 1. Index 0 is not used. When an index is not occupied we can store a special value there, like -1.
The algorithm is quite brute force, and on my laptop it takes about 1.5 seconds to complete:
function insert(tree, value) {
    let k = 1;
    while (k < tree.length) {
        if (tree[k] == -1) {
            tree[k] = value;
            return k;
        }
        k = 2*k + (value > tree[k] ? 1 : 0);
    }
    return -1;
}
function populate(tree, values) {
    if (values.length == 0) return 1; // All values were inserted! Count this permutation
    let count = 0;
    for (let i = 0; i < values.length; i++) {
        let value = values[i];
        let node = insert(tree, value);
        if (node >= 0) { // Height is OK
            values.splice(i, 1); // Remove this value from remaining values
            count += populate(tree, values);
            values.splice(i, 0, value); // Backtrack
            tree[node] = -1; // Free the node
        }
    }
    return count;
}
function countTrees(n) {
    // Create an empty tree as flat array of height 3,
    // and provide n unique values to insert
    return populate(Array(16).fill(-1), [...Array(n).keys()]);
}
console.log(countTrees(11));
Output: 1056000
Given a fixed number of keys or values (stored either in an array or in some other data structure) and the order of a b-tree, can we determine the sequence of key insertions that would generate a space-efficient b-tree?
To illustrate, consider b-tree of order 3. Let the keys be {1,2,3,4,5,6,7}. Inserting elements into tree in the following order
for(int i=1 ;i<8; ++i)
{
    tree.push(i);
}
would create a tree like this
      4
  2       6
1   3   5   7
see http://en.wikipedia.org/wiki/B-tree
But inserting elements in this way
flag = true;
for(int i=1,j=7; i<8; ++i,--j)
{
    if(flag)
    {
        tree.push(i);
        flag = false;
    }
    else
    {
        tree.push(j);
        flag = true;
    }
}
creates a tree like this
    3   5
1 2   4   6 7
where we can see there is a decrease in the number of levels.
So is there a particular way to determine sequence of insertion which would reduce space consumption?
The following trick should work for most ordered search trees, assuming the data to insert are the integers 1..n.
Consider the binary representation of your integer keys - for 1..7 (with dots for zeros) that's...
Bit : 210
1 : ..1
2 : .1.
3 : .11
4 : 1..
5 : 1.1
6 : 11.
7 : 111
Bit 2 changes least often, Bit 0 changes most often. That's the opposite of what we want, so what if we reverse the order of those bits, then sort our keys in order of this bit-reversed value...
Bit : 210 Rev
4 : 1.. -> ..1 : 1
------------------
2 : .1. -> .1. : 2
6 : 11. -> .11 : 3
------------------
1 : ..1 -> 1.. : 4
5 : 1.1 -> 1.1 : 5
3 : .11 -> 11. : 6
7 : 111 -> 111 : 7
It's easiest to explain this in terms of an unbalanced binary search tree, growing by adding leaves. The first item is dead centre - it's exactly the item we want for the root. Then we add the keys for the next layer down. Finally, we add the leaf layer. At every step, the tree is as balanced as it can be, so even if you happen to be building an AVL or red-black balanced tree, the rebalancing logic should never be invoked.
[EDIT I just realised you don't need to sort the data based on those bit-reversed values in order to access the keys in that order. The trick to that is to notice that bit-reversing is its own inverse. As well as mapping keys to positions, it maps positions to keys. So if you loop through from 1..n, you can use this bit-reversed value to decide which item to insert next - for the first insert use the 4th item, for the second insert use the second item and so on. One complication - you have to round n upwards to one less than a power of two (7 is OK, but use 15 instead of 8) and you have to bounds-check the bit-reversed values. The reason is that bit-reversing can move some in-bounds positions out-of-bounds and vice versa.]
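A small sketch of that loop, assuming the keys are simply 1..n (for other keys you would use these numbers as indexes into the sorted key list). The function names are made up for the example.

```javascript
// Reverse the lowest `bits` bits of x.
function reverseBits(x, bits) {
  let result = 0;
  for (let i = 0; i < bits; i++) {
    result = (result << 1) | ((x >> i) & 1);
  }
  return result;
}

// Order in which to insert the keys 1..n.
function insertionOrder(n) {
  let bits = 0;
  while ((1 << bits) - 1 < n) bits++; // round n up to 2^bits - 1
  const order = [];
  for (let position = 1; position < (1 << bits); position++) {
    const key = reverseBits(position, bits);
    if (key <= n) order.push(key); // bounds check: skip out-of-range values
  }
  return order;
}

console.log(insertionOrder(7)); // [ 4, 2, 6, 1, 5, 3, 7 ]
```

For n = 7 this reproduces the order in the table above. When n is not one less than a power of two, the bounds check still yields every key exactly once, but this sketch makes no claim about how well balanced the resulting tree is.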
Actually, for a red-black tree some rebalancing logic will be invoked, but it should just be re-colouring nodes - not rearranging them. However, I haven't double checked, so don't rely on this claim.
For a B tree, the height of the tree grows by adding a new root. Proving this works is, therefore, a little awkward (and it may require a more careful node-splitting than a B tree normally requires) but the basic idea is the same. Although rebalancing occurs, it occurs in a balanced way because of the order of inserts.
This can be generalised for any set of known-in-advance keys because, once the keys are sorted, you can assign suitable indexes based on that sorted order.
WARNING - This isn't an efficient way to construct a perfectly balanced tree from known already-sorted data.
If you have your data already sorted, and know its size, you can build a perfectly balanced tree in O(n) time. Here's some pseudocode...
if size is zero, return null
from the size, decide which index should be the (subtree) root
recurse for the left subtree, giving that index as the size (assuming 0 is a valid index)
take the next item to build the (subtree) root
recurse for the right subtree, giving (size - (index + 1)) as the size
add the left and right subtree results as the child pointers
return the new (subtree) root
Basically, this decides the structure of the tree based on the size and traverses that structure, building the actual nodes along the way. It shouldn't be too hard to adapt it for B Trees.
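Here is how the pseudocode above might look for a plain binary search tree. Choosing floor(size / 2) as the subtree root index is one reasonable choice, not the only one, and the node shape { key, left, right } is an assumption.

```javascript
// Build a balanced BST from already-sorted keys in O(n).
function buildBalanced(sorted) {
  let next = 0; // index of the next unused item, consumed in sorted order
  function build(size) {
    if (size === 0) return null;
    const rootIndex = Math.floor(size / 2); // number of items in the left subtree
    const left = build(rootIndex);
    const node = { key: sorted[next++], left, right: null };
    node.right = build(size - (rootIndex + 1));
    return node;
  }
  return build(sorted.length);
}

// Helpers used only to inspect the result.
function heightOf(node) {
  return node === null ? -1 : 1 + Math.max(heightOf(node.left), heightOf(node.right));
}
function inOrder(node, out = []) {
  if (node !== null) {
    inOrder(node.left, out);
    out.push(node.key);
    inOrder(node.right, out);
  }
  return out;
}

const balancedRoot = buildBalanced([1, 2, 3, 4, 5, 6, 7]);
console.log(balancedRoot.key);       // 4
console.log(heightOf(balancedRoot)); // 2
```

The structure is decided purely from the sizes; the keys are then dropped in during what is effectively an in-order traversal, which is why the input only needs to be read once from left to right.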
This is how I would add elements to b-tree.
Thanks to Steve314, for giving me the start with binary representation,
Given are n elements to add, in order, to a B-tree of order m. Take their indexes (1...n) and convert them to radix m. The main idea of this insertion is to insert the numbers with the highest radix-m digit first and to keep them above the numbers with lower radix-m digits in the tree, despite the splitting of nodes.
1,2,3.. are indexes so you actually insert the numbers they point to.
For example, order-4 tree
         4         8          12            highest radix bit numbers
1,2,3      5,6,7     9,10,11     13,14,15
Now depending on order median can be:
order is even -> number of keys is odd -> median is middle (mid median)
order is odd -> number of keys is even -> left median or right median
The choice of median (left/right) to be promoted will decide the order in which I should insert elements. This has to be fixed for the b-tree.
I add elements to trees in buckets. First I add bucket elements then on completion next bucket in order. Buckets can be easily created if median is known, bucket size is order m.
I take left median for promotion. Choosing bucket for insertion.
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
3 2 1 Order to insert buckets.
For left-median choice I insert buckets to the tree starting from right side, for right median choice I insert buckets from left side. Choosing left-median we insert median first, then elements to left of it first then rest of the numbers in the bucket.
Example
Bucket median first
12,
Add elements to left
11,12,
Then after all elements inserted it looks like,
| 12 |
|11 13,14,|
Then I choose the bucket left to it. And repeat the same process.
Median
12
8,11 13,14,
Add elements to left first
12
7,8,11 13,14,
Adding rest
8 | 12
7 9,10,|11 13,14,
Similarly keep adding all the numbers,
4 | 8 | 12
3 5,6,|7 9,10,|11 13,14,
At the end add numbers left out from buckets.
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
For mid-median (even order b-trees) you simply insert the median and then all the numbers in the bucket.
For right-median I add buckets from the left. For elements within the bucket I first insert median then right elements and then left elements.
Here we are adding the highest m-radix numbers, and in the process I added numbers with immediate lesser m-radix bit, making sure the highest m-radix numbers stay at top. Here I have only two levels, for more levels I repeat the same process in descending order of radix bits.
Last case is when remaining elements are of same radix-bit and there is no numbers with lesser radix-bit, then simply insert them and finish the procedure.
I would give an example for 3 levels, but it is too long to show. So please try with other parameters and tell if it works.
Unfortunately, many trees exhibit their worst-case running times, or require rigid balancing techniques, when data is entered in increasing order like that. Unbalanced binary trees quickly turn into linked lists, etc.
For typical B-Tree use cases (databases, filesystems, etc), you can typically count on your data naturally being more distributed, producing a tree more like your second example.
Though if it is really a concern, you could hash each key, guaranteeing a wider distribution of values.
for( i=1; i<8; ++i )
    tree.push(hash(i));
To build a particular B-tree using Insert() as a black box, work backward. Given a nonempty B-tree, find a node with more than the minimum number of children that's as close to the leaves as possible. The root is considered to have minimum 0, so such a node always exists. Delete a value from this node to be prepended to the list of Insert() calls. Work toward the leaves, merging subtrees.
For example, given the 2-3 tree
8
4 c
2 6 a e
1 3 5 7 9 b d f,
we choose 8 and do merges to obtain the predecessor
4 c
2 6 a e
1 3 5 79 b d f.
Then we choose 9.
4 c
2 6 a e
1 3 5 7 b d f
Then a.
4 c
2 6 e
1 3 5 7b d f
Then b.
4 c
2 6 e
1 3 5 7 d f
Then c.
4
2 6 e
1 3 5 7d f
Et cetera.
So is there a particular way to determine sequence of insertion which would reduce space consumption?
Edit note: since the question was quite interesting, I try to improve my answer with a bit of Haskell.
Let k be the Knuth order of the B-Tree and list a list of keys
The minimization of space consumption has a trivial solution:
-- won't use point free notation to ease haskell newbies
trivial k list = concat $ reverse $ chunksOf (k-1) $ sort list
Such an algorithm will efficiently produce a time-inefficient B-Tree, unbalanced on the left but with minimal space consumption.
A lot of non trivial solutions exist that are less efficient to produce but show better lookup performance (lower height/depth). As you know, it's all about trade-offs!
A simple algorithm that minimizes both the B-Tree depth and the space consumption (but does not optimize lookup performance!) is the following:
-- Sort the list in increasing order and call sortByBTreeSpaceConsumption
-- with the result
smart k list = sortByBTreeSpaceConsumption k $ sort list

-- Sort list so that inserting in a B-Tree with Knuth order = k
-- will produce a B-Tree with minimal space consumption and minimal depth
-- (but not best performance)
sortByBTreeSpaceConsumption :: Ord a => Int -> [a] -> [a]
sortByBTreeSpaceConsumption _ [] = []
sortByBTreeSpaceConsumption k list
    | k - 1 >= numOfItems = list -- this will be a leaf
    | otherwise = heads ++ tails ++ sortByBTreeSpaceConsumption k remainder
    where requiredLayers = minNumberOfLayersToArrange k list
          numOfItems = length list
          capacityOfInnerLayers = capacityOfBTree k $ requiredLayers - 1
          blockSize = capacityOfInnerLayers + 1
          blocks = chunksOf blockSize balanced
          heads = map last blocks
          tails = concat $ map (sortByBTreeSpaceConsumption k . init) blocks
          balanced = take (numOfItems - (mod numOfItems blockSize)) list
          remainder = drop (numOfItems - (mod numOfItems blockSize)) list

-- Capacity of a layer n in a B-Tree with Knuth order = k
layerCapacity k 0 = k - 1
layerCapacity k n = k * layerCapacity k (n - 1)

-- Infinite list of capacities of layers in a B-Tree with Knuth order = k
capacitiesOfLayers k = map (layerCapacity k) [0..]

-- Capacity of a B-Tree with Knuth order = k and l layers
capacityOfBTree k l = sum $ take l $ capacitiesOfLayers k

-- Infinite list of capacities of B-Trees with Knuth order = k
-- as the number of layers increases
capacitiesOfBTree k = map (capacityOfBTree k) [1..]

-- compute the minimum number of layers in a B-Tree of Knuth order k
-- required to store the items in list
minNumberOfLayersToArrange k list = 1 + f k
    where numOfItems = length list
          f = length . takeWhile (< numOfItems) . capacitiesOfBTree
With this smart function given a list = [21, 18, 16, 9, 12, 7, 6, 5, 1, 2] and a B-Tree with knuth order = 3 we should obtain [18, 5, 9, 1, 2, 6, 7, 12, 16, 21] with a resulting B-Tree like
        [18, 21]
       /
    [5, 9]
   /   |   \
[1,2] [6,7] [12, 16]
Obviously this is suboptimal from a performance point of view, but should be acceptable, since obtaining a better one (like the following) would be far more expensive (computationally and economically):
        [7, 16]
      /    |    \
 [5,6]  [9,12]  [18, 21]
  /
[1,2]
If you want to run it, put the previous code in a Main.hs file and compile it with ghc after prepending
import Data.List (sort)
import Data.List.Split (chunksOf) -- from the "split" package
import System.Environment (getArgs)

main = do
    args <- getArgs
    let knuthOrder = read $ head args
    let keys = (map read $ tail args) :: [Int]
    putStr "smart: "
    putStrLn $ show $ smart knuthOrder keys
    putStr "trivial: "
    putStrLn $ show $ trivial knuthOrder keys
I have multiple binary trees stored as an array. In each slot is either nil (or null; pick your language) or a fixed tuple storing two numbers: the indices of the two "children". No node will have only one child -- it's either none or two.
Think of each slot as a binary node that only stores pointers to its children, and no inherent value.
Take this system of binary trees:
    0            1
   / \          / \
  2   3        4   5
 / \              / \
6   7            8   9
   / \
  10  11
The associated array would be:
    0       1       2      3     4      5      6       7       8     9    10    11
[ [2,3] , [4,5] , [6,7] , nil , nil , [8,9] , nil , [10,11] , nil , nil , nil , nil ]
I've already written simple functions to find direct parents of nodes (simply by searching from the front until there is a node that contains the child)
Furthermore, let us say that at relevant times, all trees are anywhere between a few and a few thousand levels deep.
I'd like to find a function
P(m,n)
to find the lowest common ancestor of m and n. To put it more formally, the LCA is defined as the "lowest", or deepest, node that has both m and n as descendants (children, or children of children, etc.), where a node counts as its own descendant. If there is none, nil would be a valid return.
Some examples, given our tree:
P( 6,11) # => 2
P( 3,10) # => 0
P( 8, 6) # => nil
P( 2,11) # => 2
The main method I've been able to find is one that uses an Euler trace, which turns the given tree (Adding node A as the invisible parent of 0 and 1, with a "value" of -1), into:
A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
And from that, simply find the node between your given m and n that has the lowest number; For example, to find P(6,11), look for a 6 and an 11 on the trace. The number between them that is the lowest is 2, and that's your answer. If A (-1) is in between them, return nil.
-- Calculating P(6,11) --
A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
      ^ ^        ^
      | |        |
      m lowest   n
Unfortunately, I believe that finding the Euler trace of a tree that can be several thousand levels deep is a bit machine-taxing... and because my tree is constantly being changed throughout the course of the program, every time I wanted to find the LCA I'd have to re-calculate the Euler trace and hold it in memory.
Is there a more memory-efficient way, given the framework I'm using? One that maybe iterates upwards? One way I could think of would be to "count" the generation/depth of both nodes, climb from the deeper node until it matches the depth of the shallower one, and then climb from both in step until they meet.
But that'd involve climbing up from level, say, 3025, back to 0, twice, to count the generation, and using a terribly inefficient climbing-up algorithm in the first place, and then re-climbing back up.
Are there any other better ways?
Clarifications
In the way this system is built, every child will have a number greater than their parents.
This does not guarantee that if n is in generation X, there are no nodes in generation (X-1) that are greater than n. For example:
0
/ \
/ \
/ \
1 2 6
/ \ / \ / \
2 3 9 10 7 8
/ \ / \
4 5 11 12
is a valid tree system.
Also, an artifact of the way the trees are built is that the two immediate children of the same parent will always be consecutively numbered.
Are the nodes in order like in your example, where the children have a larger id than the parent? If so, you might be able to do something similar to a merge sort to find them. For your example, the ancestor chains of 6 and 11 are:
6 -> 2 -> 0
11 -> 7 -> 2 -> 0
So perhaps the algorithm would be:
left = left_start
right = right_start
while left != nil and right != nil
    if left = right
        return left
    else if left > right
        left = parent(left)
    else
        right = parent(right)
return nil
Which would run as:
left  right
----  -----
6     11     (right -> 7)
6     7      (right -> 2)
6     2      (left -> 2)
2     2      (return 2)
Is this correct?
Maybe this will help: Dynamic LCA Queries on Trees, by Richard Cole and Ramesh Hariharan.
Abstract:
We show how to maintain a data structure on trees which allows for the following operations, all in worst-case constant time. 1. Insertion of leaves and internal nodes. 2. Deletion of leaves. 3. Deletion of internal nodes with only one child. 4. Determining the Least Common Ancestor of any two nodes.
Conference: Symposium on Discrete Algorithms - SODA 1999
I've solved your problem in Haskell. Assuming you know the roots of the forest, the solution takes time linear in the size of the forest and constant additional memory. You can find the full code at http://pastebin.com/ha4gqU0n.
The solution is recursive, and the main idea is that you can call a function on a subtree which returns one of four results:
The subtree contains neither m nor n.
The subtree contains m but not n.
The subtree contains n but not m.
The subtree contains both m and n, and the index of their least common ancestor is k.
A node without children may contain m, n, or neither, and you simply return the appropriate result.
If a node with index k has two children, you combine the results as follows:
join :: Int -> Result -> Result -> Result
join _ (HasBoth k) _ = HasBoth k
join _ _ (HasBoth k) = HasBoth k
join _ HasNeither r = r
join _ r HasNeither = r
join k HasLeft HasRight = HasBoth k
join k HasRight HasLeft = HasBoth k
After computing this result you have to check the index k of the node itself; if k is equal to m or n, you will "extend" the result of the join operation.
My code uses algebraic data types, but I've been careful to assume you need only the following operations:
Get the index of a node
Find out if a node is empty, and if not, find its two children
Since your question is language-agnostic I hope you'll be able to adapt my solution.
There are various performance tweaks you could put in. For example, if you find a root that has exactly one of the two nodes m and n, you can quit right away, because you know there's no common ancestor. Also, if you look at one subtree and it has the common ancestor, you can ignore the other subtree (that one I get for free using lazy evaluation).
Your question was primarily about how to save memory. If a linear-time solution is too slow, you'll probably need an auxiliary data structure. Space-for-time tradeoffs are the bane of our existence.
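Since the pastebin link may not stay around, here is a rough JavaScript rendering of the four-result idea, written against the array format from the question. It is not the original Haskell code: the names (NEITHER, HAS_M, HAS_N, both, search, lca) are invented for this sketch, it does not include the early-exit tweaks, and the recursion depth equals the tree depth, so very deep trees might need an explicit stack.

```javascript
// Forest from the question: slot i is null (no children) or [leftIndex, rightIndex].
const forest = [[2, 3], [4, 5], [6, 7], null, null, [8, 9], null, [10, 11], null, null, null, null];

// The four possible results for a subtree.
const NEITHER = { kind: "neither" };
const HAS_M = { kind: "m" };
const HAS_N = { kind: "n" };
const both = (k) => ({ kind: "both", lca: k });

// Combine two results at node k.
function join(k, a, b) {
  if (a.kind === "both") return a;
  if (b.kind === "both") return b;
  if (a.kind === "neither") return b;
  if (b.kind === "neither") return a;
  return a.kind === b.kind ? a : both(k); // one side has m, the other has n
}

// Result for the subtree rooted at index k.
function search(forest, k, m, n) {
  const children = forest[k];
  let result = children
    ? join(k, search(forest, children[0], m, n), search(forest, children[1], m, n))
    : NEITHER;
  // "Extend" the result with the node itself.
  if (k === m) result = join(k, result, HAS_M);
  if (k === n) result = join(k, result, HAS_N);
  return result;
}

// Try every root; return the LCA index, or null if m and n share no ancestor.
function lca(forest, roots, m, n) {
  for (const root of roots) {
    const result = search(forest, root, m, n);
    if (result.kind === "both") return result.lca;
  }
  return null;
}

console.log(lca(forest, [0, 1], 6, 11)); // 2
console.log(lca(forest, [0, 1], 3, 10)); // 0
console.log(lca(forest, [0, 1], 8, 6));  // null
console.log(lca(forest, [0, 1], 2, 11)); // 2
```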
I think that you can simply loop backwards through the array, always replacing the higher of the two indices by its parent, until they are either equal or no further parent is found:
(defun lowest-common-ancestor (array node-index-1 node-index-2)
  (cond ((or (null node-index-1)
             (null node-index-2))
         nil)
        ((= node-index-1 node-index-2)
         node-index-1)
        ((< node-index-1 node-index-2)
         (lowest-common-ancestor array
                                 node-index-1
                                 (find-parent array node-index-2)))
        (t
         (lowest-common-ancestor array
                                 (find-parent array node-index-1)
                                 node-index-2))))
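The same climb can be sketched iteratively in JavaScript, which avoids deep recursion. The findParent shown here is a guess at the linear scan the question describes (the Lisp find-parent is not shown above), and it relies on the stated property that a child always has a larger index than its parent.

```javascript
// Forest from the question: slot i is null (no children) or [leftIndex, rightIndex].
const forest = [[2, 3], [4, 5], [6, 7], null, null, [8, 9], null, [10, 11], null, null, null, null];

// Linear scan for the slot that lists `child`; returns null for a root.
function findParent(forest, child) {
  for (let i = 0; i < child; i++) { // a parent always has a smaller index
    if (forest[i] && forest[i].includes(child)) return i;
  }
  return null;
}

function lowestCommonAncestor(forest, m, n) {
  while (m !== null && n !== null) {
    if (m === n) return m;
    // The larger index cannot be an ancestor of the smaller one, so climb it.
    if (m > n) m = findParent(forest, m);
    else n = findParent(forest, n);
  }
  return null; // ran off the top of a tree: no common ancestor
}

console.log(lowestCommonAncestor(forest, 6, 11)); // 2
console.log(lowestCommonAncestor(forest, 3, 10)); // 0
console.log(lowestCommonAncestor(forest, 8, 6));  // null
console.log(lowestCommonAncestor(forest, 2, 11)); // 2
```

Each findParent call is a scan of the array, so if the queries are frequent it may be worth keeping a separate parent array up to date as the trees change; that is a space-for-time trade-off rather than something the question requires.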