The rope data structure, redundancy on wikipedia or am I missing something? - algorithm

Why are there duplicate nodes like 9, 1 and 6 in the wikipedia article on rope?
Am I missing something, or those nodes are fully redundant?

they (the non-leaf nodes with single children) seem completely pointless. there doesn't seem to be anything in the linked to paper from boehm et al that is equivalent (there they use "normal" balanced trees).
they make no sense to me.

From the article:
Each node has a "weight" equal to the length of its string plus the sum of all the weights in its left subtree.
Those numbers appear to represent a weight of the node, based on the size of its children. So two nodes with value 6 do not have to have the same values. There is a Hello_ of weight 6 and a _Simon of weight 6.
Edit
For non-leaf values, the duplicates appear to be there to make the leaves at the same depth.

Those nodes could come about after a delete. Eventually you would want to rebalance so every node has two children (or a leaf) and the depth of each branch would be the same.

Related

What is the number of nodes at a particular level in a balanced binary search tree?

I was asked this question in a phone screen interview and I was not able to answer it. For example, in a BST, I know that the maximum number of nodes is given by 2^h (assuming the root node at height = 0)
I wanted to ask, is there a similar mathematical outcome for a balanced binary search tree as well (For AVL, Red Black trees?), i.e. the number of nodes at a particular level k.
Thanks!
A balanced binary tree starts with one node, which has two descendants. Each of those then has two descendants again. So there will be 1, 2, 4, 8 and so on nodes per level.
As a formula you can use 2^(level-1). The last row might not be completely full, so it can have less elements.
As the balancing step is costly, implementations usually do not rebalance after every mutation of the tree. They will rather apply a heuristic to find out when a rebalancing will make the most sense. So in practice, levels might have less nodes than if the tree were perfectly balanced and there might be additional levels from nodes being inserted in the wrong places.

Constructing a binary tree from a list of its edges (node pairs)

I'd like to construct a binary tree from a quite unusual input. The input contains:
Total number of nodes.
The integer label of the root.
A list of all edges (vertices/nodes that are connected to each
other). The edges in the list are UNSORTED, there is only one rule for
determining left/right children - the child in the edge that appears
first in list is always on the left. The order of child/parent in the vertices pair is also random.
I've come up with some straighforward solutions but they require multiple searches through the list of all edges (I'd basically find the 2 edges that have the labeled root in them and repeat this process for all the subtrees.)
I imagine this straightforward approach would be VERY inefficient for trees with a big amount of nodes, but I can't come up with anything else.
Any ideas for more efficient algorithms to solve this?
Here's an example for better visualization:
INPUT: 5 NODES, ROOT LABELED 2, LIST OF EDGES: [(1,0),(1,2),(2,3),(1,4)]
The tree would look like this:
2
1 3
0 4
It is important to clarify whether the given edge list is stated to be directed or not.
If edges are given in a directed fashion (i.e. it is stated that any given edge A-B also includes the information that A is a parent of B) storing the edges in an adjacency list while recording number of incoming edges for each vertex in an array should be sufficient. Once you go through the array for the incoming edges, the vertex with 0 incoming edges(i.e. parents) should be the root. Then you can run a DFS in linear time complexity to traverse the graph and put it in any data structure that is best for your needs.
If the edges given are stated to be undirected, the scheme changes a bit. In that case, you don't have the concept of incoming and outgoing edge. In that case, as no structure for the array is specified(e.g. BST, etc.) you can basically consider any node with less than 3 edges as root and run DFS as mentioned above. (all the leaves and intermediary nodes with single child nodes)
A simple solution is: "Link all the edges in the tree that it!"
Start preparing a dictionary. If nodes don't exist by the start and end point, create them nodes. As it is random in nature, you can set their left and right pointers to NULL initially.
You have rule - " the child in the edge that appears first in list is always on the left.". So create child accordingly.
Also, you already know the root of the tree so you can iterate across the nodes you have constructed so far.
Through this you can generate tree in one shot.
Hope this helps!

How to save the memory when storing color information in Red-Black Trees?

I've bumped into this question at one of Coursera algorithms course and realized that I have no idea how to do that. But still, I have some thoughts about it. The first thing that comes into my mind was using optimized bit set (like Java's BitSet) to get mapping node's key -> color. So, all we need is to allocate a one bit set for whole tree and use it as color information source. If there is no duplicatate elements in the tree - it should work.
Would be happy to see other's ideas about this task.
Just modify the BST. For black node, do nothing. And for red node, exchange its left child and right child. In this case, a node can be justified red or black according to if its right child is larger than its left child.
Use the least significant bit of one of the pointers in the node to store the color information. The node pointers should contain even addresses on most platforms. See details here.
There's 2 rules we can use:
since the root node is always black, then a red node will always have a parent node.
RB BST is always with the order that left_child < parent < right_child
Then we will do this:
keep the black node unchanged.
for the red node, we call it as R, we suppose it as the left child node for it's parent node, called P.
change the red node value from R to R', while R' = P + P - R
now that R' > P, but as it's the left child tree, we will find the order mismatch.
If we find an order mismatch, then we will know it's a red node.
and it's easy to go back to the original R = P + P - R'
One option is to use a tree that requires less bookkeeping, e.g. a splay tree. However, splay trees in particular aren't very good for iteration (they're much better at random lookup), so they may not be a good fit for the domain you're working in.
You can also use one BitSet for the entire red-black tree based on node position, e.g. the root is the 0th bit, the root's left branch is the 1st bit, the right branch is the 2nd bit, the left branch's left branch is the 3rd bit, etc; this way it shouldn't matter if there are duplicate elements. While traversing the tree make note of which bit position you're at.
It's much more efficient in terms of space to use one bitset for the tree instead of assigning a boolean to each node; each boolean will take up at least a byte and may take up a word depending on alignment, whereas the bitset will only take up one bit per node (plus 2x bits to account for a maximally unbalanced tree where the shortest branch is half the length of the longest branch).
Instead of using boolean property on a child we could define a red node as the one who has a child in the wrong place.
If we go this way all leaf nodes are guaranteed to to be black and we should swap parent with his sibling (making him red) when inserting a new node.

What defines left and right in graphs and trees?

I'm learning about tree traversals and I can't seem to find any clear rules for how DFS or BFS algorithms decide which path to take first. I've seen variations of left first or least first.
Is left taken as being first child in the list?
Does this mean that (for a given node) the depth of a vertex in a graph that is part of a cycle is taken using the leftward path?
Also doesn't using a 'least first' rule make for a slower algorithm?
Thanks
Left is only meaningful for trees where the child nodes are oldered. Otherwise usually the author refers to first in the list of child nodes. Depth of a vertex is also not well defined in a graph that is not a tree, but if you refer to depth with respect to a given node that would usually be the shortest distance from the starting node.
I am not sure what does least first mean but if it refers to the key values of nodes and there is no ordering in the child nodes, finding the least will take more time of course.
Hope this helps.

IOI 2003 : how to calculate the node that has the minimum balance in a tree?

here is the Balancing Act problem that demands to find the node that has the minimum balance in a tree. Balance is defined as :
Deleting any node
from the tree yields a forest : a collection of one or more trees. Define the balance of a node to be the size of the largest tree in the forest T created by deleting that node from T
For the sample tree like :
2 6 1 2 1 4 4 5 3 7 3 1
Explanation is :
Deleting node 4 yields two trees whose member nodes are {5} and {1,2,3,6,7}. The
larger of these two trees has five nodes, thus the balance of node 4 is five. Deleting node
1 yields a forest of three trees of equal size: {2,6}, {3,7}, and {4,5}. Each of these trees
has two nodes, so the balance of node 1 is two.
What kind of algorithm can you offer to this problem?
Thanks
I am going to assume that you have had a looong look at this problem: reading the solution does not help, you only get better at solving these problems by solving them yourself.
So one thing to observe is, the input is a tree. That means that each edge joins 2 smaller trees together. Removing an edge yields 2 disconnected trees (a forest of 2 trees).
So, if you calculate the size of the tree on one side of the edge, and then on the other, you should be able to look at a node's edges and ask "What is the size of the tree on the other side of this edge?"
You can calculate the sizes of trees using dynamic programming - your recurrence state is "What edge am I on? What side of the edge am I on?" and it calculates the size of the tree "hung" at that node. That is the crux of the problem.
Having that data, it is sufficient to iterate through all the nodes, look at their edges and ask "What is the size of the tree on the other side of this edge?" From there, you just pick the minimum.
Hope that helps.
You basically want to check 3 things for every node:
The size of its left subtree.
The size of its right subtree.
The size of the rest of the tree. (size of tree - left - right)
You can use this algorithm and expand it to any kind of tree (different number of subnodes).
Go over the tree in an in-order sequence.
Do this recursively:
Every time you just before you back up from a node to the "father" node, you need to add 1+size of node's total sub trees, to the "father" node.
Then store a value, let's call it maxTree, in the node that holds the maximum between all its subtrees, and the (sum of all subtrees)-(size of tree).
This way you can calculate all the subtree sizes in O(N).
While traversing the tree, you can hold a variable that hold the minimum value found so far.

Resources