Find number of leaves under each node of a tree - algorithm

I have a tree which is represented in the following format:
nodes is a list of the tree's nodes in level order (by depth from the top): the node at depth 0 is the first element of nodes, the nodes at depth 1 (read from left to right) come next, and so on.
n_children is a list of integers such that n_children[i] = num children of nodes[i]
For example given a tree like {1: {2, 3:{4,5,2}}}, nodes=[1,2,3,4,5,2], n_children = [2,0,3,0,0,0].
Given a Tree, is it possible to generate nodes and n_children and the number of leaves corresponding to each node in nodes by traversing the tree only once?
Is such a representation unique? Or is it possible for two different trees to have the same representation?

For the first question - creating the representation given a tree:
I am assuming by "a given tree" we mean a tree that is given in the form of node-objects, each holding its value and a list of references to its children-node-objects.
I propose this algorithm:
1. Start at node = root.
2. If node.children is empty, return {values_list: [[node.value]], children_list: [[0]]}.
3. Otherwise:
3.1. Construct two lists. One will be called values_list and each of its elements will be a list of values. The other will be called children_list and each of its elements will be a list of integers. Each element in these two lists will represent a level in the sub-tree beginning with node, including node itself (added at step 3.3).
So values_list[1] will become the list of values of the children-nodes of node, and values_list[2] will become the list of values of the grandchildren-nodes of node. values_list[1][0] will be the value of the leftmost child-node of node. And values_list[0] will be a list with one element alone, values_list[0][0], which will be the value of node.
3.2. For each child-node of node (for which we have references through node.children):
3.2.1. start over at (2.) with the child-node set as node, and assign the returned results (when the function returns) to child_values_list and child_children_list accordingly.
3.2.2. for each index i in the lists (they are of the same length): if there is already a list at values_list[i], concatenate child_values_list[i] to values_list[i] and concatenate child_children_list[i] to children_list[i]. Otherwise assign values_list[i] = child_values_list[i] and children_list[i] = child_children_list[i] (that is a push - adding to the end of the list).
3.3. Make node.value the sole element of a new list and add that list to the beginning of values_list. Make node.children.length the sole element of a new list and add that list to the beginning of children_list.
3.4. Return values_list and children_list.
When the above returns with values_list and children_list for node = root (from step (1)), all we need to do is concatenate the elements of the lists (because they are lists of per-level lists, one for each level of the tree). After concatenating the list-elements, the resulting values_list_concatenated and children_list_concatenated will be the wanted representation.
In the algorithm above we visit a node only by starting step (2) with it set as node, and we do that only once for each child of a node we visit. We start at the root-node and each node has only one parent => every node is visited exactly once.
For the number of leaves associated with each node (if I understand correctly, that means the number of leaves in the sub-tree of which the node is the root), we can add another list that will be generated and returned: leaves_list.
In the stop-case (node has no children - step (2)) we will return leaves_list: [[1]]. In step (3.2.2) we will concatenate its list-elements like the other two lists' list-elements. And in step (3.3) we will sum the first list-element, leaves_list[0], and make that sum the sole element of a new list that we add to the beginning of leaves_list (something like leaves_list.add_to_beginning([leaves_list[0].sum()])). A sketch of the whole algorithm in code follows.
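To make this concrete, here is a minimal Python sketch of the algorithm above. It assumes each node is an object with value and children attributes (those names are my assumption, not part of the question):

def represent(node):
    # Returns three lists of per-level lists for the subtree rooted at node:
    # values per level, child counts per level and leaf counts per level.
    if not node.children:                       # stop-case (step 2): a leaf
        return [[node.value]], [[0]], [[1]]
    values_list, children_list, leaves_list = [], [], []
    for child in node.children:                 # step 3.2
        cv, cc, cl = represent(child)
        for i in range(len(cv)):                # step 3.2.2: merge level by level
            if i < len(values_list):
                values_list[i] += cv[i]
                children_list[i] += cc[i]
                leaves_list[i] += cl[i]
            else:
                values_list.append(cv[i])
                children_list.append(cc[i])
                leaves_list.append(cl[i])
    leaves_here = sum(leaves_list[0])           # leaves under node = sum over its children
    values_list.insert(0, [node.value])         # step 3.3: prepend node's own level
    children_list.insert(0, [len(node.children)])
    leaves_list.insert(0, [leaves_here])
    return values_list, children_list, leaves_list

def flatten(levels):                            # the final concatenation step
    return [x for level in levels for x in level]

For the {1: {2, 3: {4, 5, 2}}} example, flattening the three returned lists gives nodes = [1, 2, 3, 4, 5, 2], n_children = [2, 0, 3, 0, 0, 0] and leaf counts [4, 1, 3, 1, 1, 1].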
For the second question - is this representation unique:
To prove uniqueness we actually want to show that the function (let's call it rep, for "representation") preserves distinctness over the space of trees, i.e. that it is an injection. As you can see in the linked wiki, for that it suffices to show that there exists a function (let's call it tre, for "tree") that, given a representation, gives a tree back, and that for every tree t it holds that tre(rep(t)) = t. In simple words: we can make a method that takes a representation and builds a tree out of it, and for every tree, if we make its representation and pass that representation through that method, we'll get the exact same tree back.
So let's get cracking!
Actually the first job - creating that method (the function tre) is already done by you - by the way you explained what the representation is. But let's make it explicit:
1. If the lists are empty, return the empty tree. Otherwise continue.
2. Make the root node with values[0] as its value and n_children[0] as its number of children (without making the children nodes yet).
3. Initialize a list-index i=1, a level index li=1, a level-elements counter lei=root.children.length and a next-level-elements accumulator nle_acc=0.
4. while lei>0:
4.1. repeat lei times:
4.1.1. make a node with values[i] as its value and n_children[i] as its number of children.
4.1.2. add the new node as the leftmost child in level li that has not been filled yet (walk the nodes of level li-1 from left to right and assign the new node to the first child reference that has not been assigned yet; we know the previous level is done, so each node in level li-1 has a children.length property we can check to see whether it has already received the number of children it should have).
4.1.3. accumulate nle_acc += n_children[i].
4.1.4. increment ++i.
4.2. assign lei = nle_acc (the level-elements counter takes what the accumulator gathered for it).
4.3. clear nle_acc = 0 (the next-level-elements accumulator needs to accumulate from the start for the next round).
4.4. increment the level index ++li.
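Here is a Python sketch of this rebuilding algorithm, under the same assumption about node objects as before. Instead of re-scanning level li-1 for the leftmost unfilled child reference, it keeps a queue of parents whose child slots still need filling, which is equivalent but simpler to write:

from collections import deque

class Node:
    def __init__(self, value, n_children):
        self.value = value
        self.children = [None] * n_children   # slots filled level by level

def rebuild(nodes, n_children):
    if not nodes:
        return None                           # empty representation -> empty tree
    root = Node(nodes[0], n_children[0])
    i = 1
    pending = deque([root])                   # parents with unfilled child slots
    while pending:
        parent = pending.popleft()
        for slot in range(len(parent.children)):
            child = Node(nodes[i], n_children[i])
            parent.children[slot] = child
            pending.append(child)
            i += 1
    return root

Calling rebuild on the flattened lists produced by the first algorithm reproduces the original tree, which is exactly the tre(rep(t)) = t property argued below.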
Now we need to prove that an arbitrary tree that is passed through the first algorithm and then through the second algorithm (this one here) will get out of all of that the same as it was originally.
As I'm not trying to prove the correctness of the algorithms (although I should), let's assume they do what I intended them to do, i.e. the first one writes the representation as you described it, and the second one builds a tree level by level, left to right, assigning a value and the number of children from the representation and filling the children references according to those numbers when it comes to the next level.
So each node has the right amount of children according to the representation (that's how the children were filled), and that number was written from the tree (when generating the representation). And the same is true for the values and thus it is the same tree as the original.
The proof actually should be much more elaborate and detailed, but I think I'll leave it at that for now. If there is demand for elaboration, maybe I'll turn it into an actual proof.

Related

Binary tree without pointers

Below is a representation of a binary tree that I use in my project. In the bottom are the leaf nodes (orange boxes), and every level is the sum of the children below.
So, 3 on the leftmost node is the sum of 1 and 2 (its left and right children), and 10 is the sum of 3 and 7 (again left and right children).
What I am trying to do is, store this tree in a flat array without using any pointers. So this array is basically an integer array, holding 2n-1 nodes (n is the number of the leaf nodes).
So the index of the root element is 0 (let's call it p), the index of its left child is 2p+1, and the index of the right child is 2p+2. Please see Binary Tree (Array implementation).
Everything works nicely if I know the number of leaf values beforehand but I can't seem to find a way to store this tree in a dynamically expanding array.
If I need to add 9, for example, as the 9th element to the array, the structure needs to change and I need to recalculate all the indices again, which I want to avoid because there may be hundreds of thousands of elements in the array at any time.
Does anyone know of an implementation that handles dynamic arrays with this implementation?
EDIT:
Below is the demonstration of what happens when I add new elements to the array. 36 was the root before, now it's a second level element and the new root array[0] is 114, which triggers a new layout.
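For reference, the index arithmetic behind the layout described in the question is the usual implicit binary-heap layout. A minimal sketch (not the asker's code), assuming a full tree where every internal node has both children, which is the fixed-size case that already works:

def left(p):
    return 2 * p + 1

def right(p):
    return 2 * p + 2

def parent(p):
    return (p - 1) // 2

def update_leaf(tree, i, value):
    # change one element, then repair the partial sums on the path to the root
    tree[i] = value
    while i > 0:
        i = parent(i)
        tree[i] = tree[left(i)] + tree[right(i)]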

Is there a heap or heap-like structure that works with pointers, in other words nodes not in an array?

I currently have a double-linked list of objects in descending sorted order. (The list is intrusive--pointers in the objects.) I have a very limited set of operations:
add a node with the highest possible key
remove a node with the highest possible key (doesn't matter which one)
remove a node with key 0 (doesn't matter which one)
increment key of a node with highest current key (doesn't matter which one)
decrement key of any given node whose key is above 0
Operations 1-4 will be constant time, but operation 5 is O(n), where n=number of nodes with same key value. This is because such nodes, when incremented, have to be moved past their siblings with the same key value, and placed after that range. And finding that re-insert place will be O(n).
I thought of the heap (heapsort heap, not malloc heap) as a solution where worst-case would be O(log n) (where n=number of nodes). However, based on my recollection and what Google is finding me, it seems invariably implemented in an array, as opposed to a binary tree. So:
Question: is there an implementation of a heap that uses pointers in the manner of a binary tree, as opposed to an array, and that maintains the big-O bounds of the typical array implementation?
One common way to do this is to use an array-based heap, but:
In the heap you store pointers to nodes;
In each node you store its index in the heap; and
Whenever you swap elements in the heap, you update the indexes in the corresponding nodes;
This preserves the complexity of all the heap operations, and costs around 1.5 pointers and 1 integer per node. (the extra .5 is because of the way growable arrays are implemented).
Alternatively, you can just link the nodes together into a tree with pointers. To support the operations you want, though, this requires 3 pointers per node (parent, left, right)
Both ways work fine, but the array implementation is simpler, faster, and uses a bit less memory.
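Below is a minimal Python sketch of the first suggestion: an array-based max-heap that stores node references and keeps each node's index in sync on every swap, so a key change can start sifting at node.index without any search. The class and method names are my own:

class HeapNode:
    def __init__(self, key):
        self.key = key
        self.index = -1              # this node's current position in the heap array

class NodeHeap:
    # max-heap over HeapNode.key; every swap also updates node.index
    def __init__(self):
        self.a = []

    def _swap(self, i, j):
        self.a[i], self.a[j] = self.a[j], self.a[i]
        self.a[i].index, self.a[j].index = i, j

    def _sift_up(self, i):
        while i > 0 and self.a[i].key > self.a[(i - 1) // 2].key:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.a)
        while True:
            largest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < n and self.a[left].key > self.a[largest].key:
                largest = left
            if right < n and self.a[right].key > self.a[largest].key:
                largest = right
            if largest == i:
                return
            self._swap(i, largest)
            i = largest

    def push(self, node):
        node.index = len(self.a)
        self.a.append(node)
        self._sift_up(node.index)

    def change_key(self, node, new_key):
        # O(log n): sift from node.index, no search through the heap needed
        old_key, node.key = node.key, new_key
        if new_key > old_key:
            self._sift_up(node.index)
        else:
            self._sift_down(node.index)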
ETA:
I should point out, though, that if you use pointers then you can use different kinds of heaps. A Fibonacci heap will let you decrement the value of a node in amortized constant time. It's kinda complicated, though, and slow in practice: https://en.wikipedia.org/wiki/Fibonacci_heap
Unfortunately, the answer to the problem as written isn't an answer to the headline title of the question.
Solution 1: amortized O(1) data structure
A solution was found with amortized O(1) implementations of all required operations.
It is simply a double-linked list of double-linked lists. The "main" double-linked list nodes are called parents, and we have at most one parent per key value. The parent nodes keep a double-linked list of child nodes with the same key value. Each child additionally points to its parent.
add a node with the highest possible value: If there is no list head or its value is not max, add the new node to the head of the main linked list. Otherwise, add it to the tail of the head node's child list.
remove a (any) node with the highest possible value: In the case of multiple items with highest value, it doesn't matter which we remove. So, if head parent has children, remove the tail child from the child list. Otherwise, remove the parent from the main list.
remove a (any) node with value 0: Same operations.
increment value of a (any) node with the highest current value: In case of multiple nodes with same key value, we can choose any, so choose the head parent's tail child. Remove it from the child list. If incrementing its value exceeds max value then you're done. Otherwise it's a new head node. If instead there are no children, then increment the head parent in place, and if it exceeds maximum value remove it.
decrement value of any node above 0: If the node is a child, remove from child list, then either add to parent's successor's child list or as a new node after the parent. A parent with no children: if the successor in the main list still has a smaller key, you're done. Otherwise remove it and add as successor's tail child. A parent with children: same but promote the head child to take its place. This is O(n), where n=number of nodes of given size, because you must change the parent pointer for all children. However, if the odds of the node selected for decrement being the parent node of all nodes of given size are 1/n, this amortizes to O(1).
The main downside is that we logically have 7 different pointers for each node. In the parent role we need previous and next parent, and head and tail child. In the child role we need previous and next child, and parent. These can be combined in a union into two alternate substructures of 4 and 3 pointers, which saves storage but not CPU time (except perhaps the need to zero out unused pointers for cleanliness). Updating them all won't be fast.
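For illustration only, a sketch of the node layout Solution 1 implies. The field names are mine, and in a language with unions the two roles would share storage as described:

class ParentNode:
    # one per distinct key value; lives in the main doubly linked list
    def __init__(self, key):
        self.key = key
        self.prev_parent = None
        self.next_parent = None
        self.head_child = None   # doubly linked list of nodes sharing this key
        self.tail_child = None

class ChildNode:
    # an extra node that shares its parent's key value
    def __init__(self, parent):
        self.parent = parent
        self.prev_child = None
        self.next_child = None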
Solution 2: Sloppy is Good Enough
Another approach is simply to be sloppy. The application benefits from finding nodes with higher scores but it's not critical that they be absolutely in perfect order. So rather than an O(n) operation to move nodes potentially from one end of the chain to the other, we could accept a solution that does an O(1) albeit at times imperfect job.
This could be the current implementation of a doubly linked list. It can support all operations except decrement in O(1). It can handle decrement of a unique key value in O(1). Only decrement of a non-unique key value would be O(n), as we need to skip the remaining nodes with the previous key value to find the first with the same or higher key. To bound the worst case, we could simply cap that search at, say, 5 or 10 links. This too would provide a nominally O(1) solution. However, some pernicious usage patterns may slowly cause the entire list to become quite unordered.

Uniqueness of B-tree

Say that I have a sequence of key values to be inserted into a B-tree of any given order. After insertion of all the elements, I am performing a deletion operation on some of those elements. Does it always give a unique result (in the form of a B-tree), or can it differ according to the deletion operation?
Quoted from the wiki: https://en.wikipedia.org/wiki/B-tree
Deletion from an internal node
Each element in an internal node acts as a separation value for two subtrees, therefore we need to find a replacement for separation. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is still greater than the separator. Both of those elements are in leaf nodes, and either one can be the new separator for the two subtrees. Algorithmically described below:
Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
The previous step deleted an element (the new separator) from a leaf node. If that leaf node is now deficient (has fewer than the required number of nodes), then rebalance the tree starting from the leaf node.
I think the result may vary according to the deletion operation, because of the lines quoted above (either the largest element in the left subtree or the smallest in the right subtree can be chosen as the new separator). Am I right? help :)
If your question is whether two B-trees that contain the exact same collection of key values will always have identical nodes, then the answer is No. For example, in a 2-3-4 tree the keys {1, 2, 3} can be stored either as a single node [1 2 3] or as a root [2] with leaf children [1] and [3]; both are valid B-trees over the same keys.
Note that this is also true for e.g. simple binary trees.
However, in the case of B-trees this can be more pronounced because B-trees are optimized for minimizing page changes and thus the need to write back to slow secondary storage.

How to find nodes fast in an unordered tree

I have an unordered tree in the form of, for example:
Root
  A1
    A1_1
      A1_1_1
      A1_1_2
        A1_1_2_1
        A1_1_2_2
        A1_1_2_3
      A1_1_3
      A1_1_n
    A1_2
    A1_3
    A1_n
  A2
    A2_1
    A2_2
    A2_3
    A2_n
The tree is unordered
each child can have a random N count of children
each node stores a unique long value.
the value required can be at any position.
My problem: if I need the long value of A1_1_2_3, the first time I traverse the nodes I do a depth-first search to get it. However, on later calls for the same node I must get its value without a recursive search. Why? If this tree had hundreds of thousands of nodes before reaching my A1_1_2_3 node, it would take too much time.
What I thought of is to leave some pointers after the first traversal. E.g. for my case, when I give back the long value for A1_1_2_3, I also give back an array with information for future searches of the same node and say: to get to A1_1_2_3, I need:
first child of Root, which is A1
first child of A1, which is A1_1
second child of A1_1, which is A1_1_2
third child of A1_1_2, which is what I need: A1_1_2_3
So I figured I would store this information along with the value for A1_1_2_3 as an array of indexes: [0, 0, 1, 2]. By doing so, I could easily reach the node again on subsequent calls for A1_1_2_3 and avoid recursion each time.
However the nodes can change. On subsequent calls, I might have a new structure, so my indexes stored earlier would not match anymore. But if this happens, I thought that whenever I don't find the element anymore, I would recursively go back up a level and search for the item, and so on until I find it again, and store the indexes again for future references:
e.g. if my A1_1_2_3 is now situated in this new structure:
A1_1
  A1_1_0
  A1_1_1
  A1_1_2
    A1_1_2_1
    A1_1_2_2
    A1_1_21_22
    A1_1_2_3
... in this case the new element A1_1_0 ruined my stored structure, so I would go back up a level and search children again recursively until I find it again.
Does this even make sense, what I thought of here, or am I overcomplicating things? I'm talking about an unordered tree which can have at most about three hundred thousand nodes, and it is vital that I can jump to nodes as fast as possible. But the tree can also be very small, under 10 nodes.
Is there a more efficient way to search in such a situation?
Thank you for any idea.
edit:
I forgot to add: what I need on subsequent calls is not just the same value, but also its position is important, because I must get the next page of children after that child (since it's a tree structure, I'm calling paging on nodes after the initially selected one). Hope it makes more sense now.
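A sketch of the caching idea described in the question, in Python. Node objects with value and children are an assumed API, and the fallback here simply redoes a full depth-first search (rebuilding the cached path as it goes) rather than the incremental back-up-one-level variant described, just to keep the sketch short:

def follow_path(root, path):
    node = root
    for idx in path:
        if idx >= len(node.children):
            return None                    # structure changed under us
        node = node.children[idx]
    return node

def find_with_cache(root, target_value, cache):
    path = cache.get(target_value)
    if path is not None:
        node = follow_path(root, path)
        if node is not None and node.value == target_value:
            return node                    # fast path: cached indexes still valid
    stack = [(root, [])]                   # slow path: depth-first search
    while stack:
        node, path = stack.pop()
        if node.value == target_value:
            cache[target_value] = path     # remember the child-index path, e.g. [0, 0, 1, 2]
            return node
        for i, child in enumerate(node.children):
            stack.append((child, path + [i]))
    return None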

Difference between a LinkedList and a Binary Search Tree

What are the main differences between a Linked List and a BinarySearchTree? Is a BST just a way of maintaining a LinkedList? My instructor talked about LinkedList and then BST but didn't compare them or say when to prefer one over the other. This is probably a dumb question but I'm really confused. I would appreciate if someone can clarify this in a simple manner.
Linked List:
Item(1) -> Item(2) -> Item(3) -> Item(4) -> Item(5) -> Item(6) -> Item(7)
Binary tree:
           Node(1)
          /
   Node(2)
  /       \
 /         Node(3)
RootNode(4)
 \         Node(5)
  \       /
   Node(6)
          \
           Node(7)
In a linked list, the items are linked together through a single next pointer.
In a binary tree, each node can have 0, 1 or 2 subnodes, where (in the case of a binary search tree) the key of the left node is less than the key of the node and the key of the right node is greater than the key of the node. As long as the tree is balanced, the search path to each item is a lot shorter than in a linked list.
Searchpaths:
------   ------   ------
key      List     Tree
------   ------   ------
1        1        3
2        2        2
3        3        3
4        4        1
5        5        3
6        6        2
7        7        3
------   ------   ------
avg      4        2.43
------   ------   ------
With larger structures, the average search path becomes significantly smaller:
------   ------   ------
items    List     Tree
------   ------   ------
1        1        1
3        2        1.67
7        4        2.43
15       8        3.29
31       16       4.16
63       32       5.09
------   ------   ------
A Binary Search Tree is a binary tree in which each internal node x stores an element such that the elements stored in the left subtree of x are less than or equal to x and the elements stored in the right subtree of x are greater than or equal to x.
Now a Linked List consists of a sequence of nodes, each containing arbitrary values and one or two references pointing to the next and/or previous nodes.
In computer science, a binary search tree (BST) is a binary tree data structure which has the following properties:
each node (item in the tree) has a distinct value;
both the left and right subtrees must also be binary search trees;
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than or equal to the node's value.
In computer science, a linked list is one of the fundamental data structures, and can be used to implement other data structures.
So a Binary Search Tree is an abstract concept that may be implemented with a linked list or an array, while the linked list is a fundamental data structure.
I would say the MAIN difference is that a binary search tree is sorted. When you insert into a binary search tree, where those elements end up being stored in memory is a function of their value. With a linked list, elements are blindly added to the list regardless of their value.
Right away you can see some trade-offs:
Linked lists preserve insertion order and inserting is less expensive
Binary search trees are generally quicker to search
A linked list is a sequential number of "nodes" linked to each other, ie:
public class LinkedListNode
{
    Object Data;
    LinkedListNode NextNode;
}
A Binary Search Tree uses a similar node structure, but instead of linking to the next node, it links to two child nodes:
public class BSTNode
{
    Object Data;
    BSTNode LeftNode;
    BSTNode RightNode;
}
By following specific rules when adding new nodes to a BST, you can create a data structure that is very fast to traverse. Other answers here have detailed these rules, I just wanted to show at the code level the difference between node classes.
It is important to note that if you insert sorted data into a BST, you'll end up with a linked list, and you lose the advantage of using a tree.
Because of this, searching a linked list is O(N), while searching a BST is O(N) in the worst case and O(log N) when the tree is balanced.
They do have similarities, but the main difference is that a Binary Search Tree is designed to support efficient searching for an element, or "key".
A binary search tree, like a doubly-linked list, points to two other elements in the structure. However, when adding elements to the structure, rather than just appending them to the end of the list, the binary tree is reorganized so that elements linked to the "left" node are less than the current node and elements linked to the "right" node are greater than the current node.
In a simple implementation, the new element is compared to the first element of the structure (the root of the tree). If it's less, the "left" branch is taken, otherwise the "right" branch is examined. This continues with each node, until a branch is found to be empty; the new element fills that position.
With this simple approach, if elements are added in order, you end up with a linked list (with the same performance). Different algorithms exist for maintaining some measure of balance in the tree, by rearranging nodes. For example, AVL trees do the most work to keep the tree as balanced as possible, giving the best search times. Red-black trees don't keep the tree as balanced, resulting in slightly slower searches, but do less work on average as keys are inserted or removed.
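A sketch of the simple (unbalanced) insertion just described; the class and field names are illustrative only:

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # walk down from the root, going left for smaller keys and right otherwise,
    # and put the new node in the first empty branch found
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(key)
                return root
            node = node.right

Inserting already-sorted keys with this routine always takes the same branch, which is exactly the degenerate linked-list case mentioned above.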
Linked lists and BSTs don't really have much in common, except that they're both data structures that act as containers. Linked lists basically allow you to insert and remove elements efficiently at any location in the list, while maintaining the ordering of the list. This list is implemented using pointers from one element to the next (and often the previous).
A binary search tree on the other hand is a data structure of a higher abstraction (i.e. it's not specified how this is implemented internally) that allows for efficient searches (i.e. in order to find a specific element you don't have to look at all the elements).
Notice that a linked list can be thought of as a degenerated binary tree, i.e. a tree where all nodes only have one child.
It's actually pretty simple. A linked list is just a bunch of items chained together, in no particular order. You can think of it as a really skinny tree that never branches:
1 -> 2 -> 5 -> 3 -> 9 -> 12 -> |i. (that last is an ascii-art attempt at a terminating null)
A Binary Search Tree is different in 2 ways: the binary part means that each node has up to 2 children, not just one, and the search part means that those children are arranged to speed up searches: only smaller items to the left, and only larger ones to the right:
      5
     / \
    3   9
   / \   \
  1   2   12
9 has no left child, and 1, 2, and 12 are "leaves" - they have no branches.
Make sense?
For most "lookup" kinds of uses, a BST is better. But for just "keeping a list of things to deal with later First-In-First-Out or Last-In-First-Out" kinds of things, a linked list might work well.
The issue with a linked list is searching within it (whether for retrieval or insert).
For a single-linked list, you have to start at the head and search sequentially to find the desired element. To avoid the need to scan the whole list, you need additional references to nodes within the list, in which case, it's no longer a simple linked list.
A binary tree allows for more rapid searching and insertion by being inherently sorted and navigable.
An alternative that I've used successfully in the past is a SkipList. This provides something akin to a linked list but with extra references to allow search performance comparable to a binary tree.
A linked list is just that... a list. It's linear; each node has a reference to the next node (and the previous, if you're talking of a doubly-linked list). A tree branches---each node has a reference to various child nodes. A binary tree is a special case in which each node has only two children. Thus, in a linked list, each node has a previous node and a next node, and in a binary tree, a node has a left child, right child, and parent.
These relationships may be bi-directional or uni-directional, depending on how you need to be able to traverse the structure.
Linked List is straight Linear data with adjacent nodes connected with each other e.g. A->B->C. You can consider it as a straight fence.
BST is a hierarchical structure just like a tree with the main trunk connected to branches and those branches in-turn connected to other branches and so on. The "Binary" word here means each branch is connected to a maximum of two branches.
You use a linked list to represent strictly linear data, with each item connected to at most one next item, whereas a BST can connect an item to two items. You could use a tree to represent data such as a family tree, but that becomes an n-ary tree, since each person can have more than two children.
A binary search tree can be implemented in any fashion, it doesn't need to use a linked list.
A linked list is simply a structure which contains nodes and pointers/references to other nodes inside a node. Given the head node of a list, you may browse to any other node in a linked list. Doubly-linked lists have two pointers/references: the normal reference to the next node, but also a reference to the previous node. If the last node in a doubly-linked list references the first node in the list as the next node, and the first node references the last node as its previous node, it is said to be a circular list.
A binary search tree is a tree that splits up its input into two roughly equal halves based on a binary search comparison algorithm. Thus, it only needs very few searches to find an element. For instance, if you had a tree with 1-10 and you needed to search for three, first the element at the top would be checked, probably a 5 or 6. Three would be less than that, so only the first half of the tree would then be checked. If the next value is 3, you have it; otherwise, a comparison is done, etc., until either it is not found or its data is returned. Thus the tree is fast for lookup, but not necessarily fast for insertion or deletion. These are very rough descriptions.
Linked List from wikipedia, and Binary Search Tree, also from wikipedia.
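A sketch of the lookup just described, assuming nodes with key, left and right attributes (illustrative only):

def bst_search(node, target):
    # discard roughly half of the remaining tree at each comparison
    while node is not None:
        if target == node.key:
            return node
        node = node.left if target < node.key else node.right
    return None            # not found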
They are totally different data structures.
A linked list is a sequence of element where each element is linked to the next one, and in the case of a doubly linked list, the previous one.
A binary search tree is something totally different. It has a root node, the root node has up to two child nodes, and each child node can have up to two child nodes, etc. It is a pretty clever data structure, but it would be somewhat tedious to explain it here. Check out the Wikipedia article on it.
