Convert a tree into a heap using the minimum number of changes - algorithm

Given a k-ary tree, I want to convert it into a min-heap with the minimum number of changes, where a change is defined as relabelling a node.
One solution I have found: a DP over each node's choice of changing its value or not. But isn't that going to be exponential in time complexity?
Any ideas (preferably with optimality proofs)?
Example: Say the tree has edges 1-3, 3-2, 1-4, 4-5, where 1 is the root. Then I can relabel node 3 to 1 or 2; that is, in 1 change it becomes a min-heap.

If all you want to do is make sure that the tree satisfies the heap property (the key stored in each node is less than or equal to the keys stored in the node's children), then you should be able to use something like the build-heap algorithm, which operates in O(n).
Consider this tree:
          8
   _______|_______
   |      |      |
  15      6     19
  / \     |   / | \
 7   3    5 12  9  22
Now, working from the bottom up, you push each node down the tree as far as it needs to go. That is, if a node is larger than its smallest child, you swap the two values, and you repeat this until you reach the leaf level, if necessary.
For example, look at the node valued 15. It's larger than its smallest child, 3, so you swap them, making this subtree:
  3
 / \
7   15
Also, 6 swaps places with 5, and 19 swaps places with 9, giving you this tree:
          8
   _______|_______
   |      |      |
   3      5      9
  / \     |   / | \
 7  15    6 12 19  22
Note that at the next-to-leaf level, each node is now smaller than its smallest child.
Now, the root. Since the rule is to swap the node with its smallest child, you swap 8 with 3, giving:
          3
   _______|_______
   |      |      |
   8      5      9
  / \     |   / | \
 7  15    6 12 19  22
But you're not done because 8 is greater than 7. You swap 8 with 7, and you get this tree, which meets your conditions:
          3
   _______|_______
   |      |      |
   7      5      9
  / \     |   / | \
 8  15    6 12 19  22
If the tree is balanced, the entire procedure has complexity O(n). If the tree is severely unbalanced, the complexity is O(n^2). There is a way to guarantee O(n), regardless of the tree's initial order, but it requires changing the shape of the tree.
I won't claim that the algorithm guarantees the "minimal number of changes" for any given tree. I can prove, however, that with a balanced tree the algorithm is O(n). See https://stackoverflow.com/a/9755805/56778, which explains it for a binary heap. The explanation also applies to a d-ary heap.
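A sketch of that bottom-up pass in Python (the Node class and function names are my own, not from the question or answer):

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def heapify(node):
    """Bottom-up build-heap: fix every subtree first, then sift the root down."""
    for child in node.children:
        heapify(child)
    sift_down(node)

def sift_down(node):
    # While the node is larger than its smallest child, swap the two
    # values and continue from that child, down to the leaf level.
    while node.children:
        smallest = min(node.children, key=lambda c: c.value)
        if node.value <= smallest.value:
            break
        node.value, smallest.value = smallest.value, node.value
        node = smallest
```

Running it on the example tree above turns the root 8 into 3 and restores the heap property everywhere, matching the walkthrough.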


binary tree compaction of same subtree

Given a tree, find the common subtrees, replace each duplicate with a link to a single shared copy, and compact the tree.
e.g.
    1
   / \
  2   3
 /|   |\
4 5   4 5
should be converted to
    1
   / \
  2   3
 /|   |\
4 5   | |
^ ^   | |
|_|___| |
  |_____|
This was asked in my interview. The approach I shared was not optimal, O(n^2). I would be grateful if someone could help with a solution or redirect me to a similar problem; I couldn't find any. Thanks!
edit - a more complex example:
    1
   / \
  2   3
 /|   |\
4 5   2 7
     / \
    4   5
The whole duplicated subtree rooted at 2 should be replaced:
    1
   / \
  2 <--3
 /|     \
4 5      7
You can do this in a single DFS traversal using a hash map from (value, left_pointer, right_pointer) -> node to collapse repeated occurrences of the subtree.
As you leave each node in your DFS, you just look it up in the map. If a matching node already exists, then replace it with the pre-existing one. Otherwise, add it to the map.
This takes O(n) time, because you compare the actual pointers to the left and right subtrees instead of traversing the subtrees to compare them. The pointer comparison gives the same result because the left and right subtrees have already been canonicalized.
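A minimal sketch of that idea in Python (binary trees only; the Node class and names are my own assumptions):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def compact(node, seen=None):
    """Collapse duplicate subtrees in one post-order DFS.

    `seen` maps (value, id(left), id(right)) -> the canonical node for
    that shape. Children are canonicalized first, so comparing child
    pointers by identity decides equality of whole subtrees in O(1).
    """
    if seen is None:
        seen = {}
    if node is None:
        return None
    node.left = compact(node.left, seen)
    node.right = compact(node.right, seen)
    key = (node.val, id(node.left), id(node.right))
    return seen.setdefault(key, node)
```

Note the map keeps a reference to every canonical node, so the `id()` values stay valid for the duration of the traversal.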
Another approach: store a signature of each subtree in a hash table. Traverse the tree bottom-up; if a subtree's signature is already in the table, replace that branch with the stored node, otherwise add the signature to the table.

What is the point of choosing closest node in Dijkstra algorithm?

In all the articles I have read, the neighbor to process first is the "closest" neighbor. But in the end you need to visit all nodes to figure out all possible paths. So the question is: why do we do this? I believe the same result can be achieved if we simply traverse the graph in BFS order and perform the cost calculations. For example:
first step - 0, costs table:
2 - 6
3 - 2
second step - 2, costs table:
2 - 6
3 - 2
1 - 9
third step - 3, costs table:
2 - 6
3 - 2
1 - 9
4 - 12
fourth step - 1, costs table:
2 - 6
3 - 2
1 - 9
4 - 12
5 - 12
fifth step - 4, costs table:
2 - 6
3 - 2
1 - 9
4 - 12
5 - 12
With simple BFS traversal the cheapest way was found. What am I missing?
Suppose the path from A to B and B to C are both cost 1, and the direct route from A to C is cost 3. (In the real world, the first two are highways that bypass a mountain while the third is a tiny trail over a mountain pass.)
Dijkstra will route you A -> B -> C for a total cost of 2, while BFS (which minimizes the number of edges, not the total cost) will route you A -> C for a total cost of 3.
Therefore you have to process lowest cost first to get the right answer.
At each step, Dijkstra's algorithm extends the lowest-cost path known so far. Thus, when you finally encounter the goal state, you know that all other, unfinished paths have a greater cost. Therefore, the one you just found is the shortest path.
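A compact sketch of the priority-queue version, using the mountain-pass example above (names are mine):

```python
import heapq

def dijkstra(graph, start):
    """Always settle the cheapest frontier node first (the 'closest' rule)."""
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry for an already-settled node
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

# The mountain-pass example: two cheap highway edges vs one cost-3 trail.
graph = {"A": [("B", 1), ("C", 3)],
         "B": [("C", 1)],
         "C": []}
```

Here `dijkstra(graph, "A")` settles B (cost 1) before C, so C ends up with cost 2 via B, not 3 via the direct edge; a plain BFS would commit to the direct A -> C edge first.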

For Ternary Huffman problem, can we make a tree (or encoding scheme) for "4" characters?

Say I have 4 characters with these frequencies:
freq(a)=5 freq(b)=3 freq(c)=2 freq(d)=2
How will I encode them in the form of 0,1,2 such that no code word is a prefix of another code word?
The standard algorithm for generating the optimal ternary Huffman code (as alluded to by rici) involves first making sure there are an odd number of symbols -- by adding a dummy symbol (of frequency 0) if necessary.
In this case, we start with an even number of symbols, so we need to add the dummy symbol that I call Z:
freq(a)=5 freq(b)=3 freq(c)=2 freq(d)=2 freq(Z)=0.
Then as Photon described, we repeatedly combine the 3 nodes with the lowest frequencies into 1 combined symbol. Each time we replace 3 nodes with 1 node, we reduce the total number of nodes by 2, and so the total number of nodes remains odd at each step. In the last step (if we've added the correct number of dummy symbols) we will combine 3 final nodes into a single root node.
         abcdZ:12
        /    |    \
      2/    1|    0\
   cdZ:4    b:3    a:5
   /  |  \
 2/  1|  0\
Z:0  d:2  c:2
So in this case one optimal (Huffman) ternary coding is:
a: 0
b: 1
c: 20
d: 21
Z: 22 (should never occur).
See
https://en.wikipedia.org/wiki/Huffman_coding#n-ary_Huffman_coding
for more details.
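The dummy-symbol construction can be sketched roughly like this in Python (the function name and dummy labels are my own; tie-breaking may produce a different but equally optimal set of codes):

```python
import heapq
from itertools import count

def ternary_huffman(freqs, arity=3):
    """Optimal arity-ary Huffman codes via the dummy-symbol trick.

    Pads with zero-frequency dummies so every merge combines exactly
    `arity` nodes and the final merge yields a single root. Returns
    {symbol: code string over digits '0'..str(arity - 1)}.
    """
    items = list(freqs.items())
    # (number of leaves - 1) must be divisible by (arity - 1).
    pad = (-(len(items) - 1)) % (arity - 1)
    items += [("_dummy%d" % i, 0) for i in range(pad)]

    tick = count()  # tie-breaker so heapq never compares tree nodes
    heap = [(f, next(tick), sym) for sym, f in items]
    heapq.heapify(heap)
    while len(heap) > 1:
        kids = [heapq.heappop(heap) for _ in range(arity)]
        heapq.heappush(heap, (sum(k[0] for k in kids),
                              next(tick), [k[2] for k in kids]))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, list):          # internal node
            for digit, child in enumerate(node):
                walk(child, prefix + str(digit))
        else:                               # leaf symbol
            codes[node] = prefix
    walk(heap[0][2], "")
    return {s: c for s, c in codes.items() if s in freqs}
```

For the frequencies above this gives one 1-digit code each for a and b and 2-digit codes for c and d, for a total weighted length of 16 digits, the same cost as the tree shown.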
Well, for classical Huffman you just keep merging the 2 lowest-frequency nodes at a time to build a tree, then assign 1 to the left (or right) edge and 0 to the other edge; the DFS path to a node is that node's code.
So in this case the coding is:
a - 1
b - 01
c - 001
d - 000
For ternary Huffman you just join the 3 lowest-frequency nodes at a time (and fewer nodes if there are not enough for the last step).
So in this case the coding is:
a - 2
b - 12
c - 11
d - 10

Why does pairing heap need that special two passes when delete_min?

I am reading the Pairing heap.
It is quite simple; the only tricky part is the delete_min operation.
The only non-trivial fundamental operation is the deletion of the minimum element from the heap. The standard strategy first merges the subheaps in pairs (this is the step that gave this data structure its name) from left to right and then merges the resulting list of heaps from right to left:
I don't think I need copy/paste the code here, as it is in the wiki link.
My questions are:
Why do they do this two-pass merging?
Why do they first merge pairs instead of directly merging them all?
Also, why, after merging pairs, merge specifically from right to left?
With a pairing heap, adding an item to the heap is an O(1) operation because all it does is add the node either as the new root (if it's smaller than the current root) or as the first child of the current root. So if you created a pairing heap and added the numbers 0 through 9 to it, in order, you would end up with:
          0
          |
  -----------------
  | | | | | | | | |
  9 8 7 6 5 4 3 2 1
If you then do a delete-min, you have to look at each child to determine the minimum item and build the new heap. If you use the naive left-to-right combining method, you end up with this tree:
         1
         |
  ---------------
  | | | | | | | |
  9 8 7 6 5 4 3 2
And the next time you do a delete-min you have to look at the 8 remaining children, etc. Using this technique, creating and then removing all items from the heap would be an O(n^2) operation.
The two-pass method of combining in pairs and then combining the pairs results in a much more efficient structure. Consider the first case. After deleting the minimum item, we're left with the nine children. They're combined in pairs from left to right to produce:
  8   6   4   2   1
 /   /   /   /
9   7   5   3
Then we combine the pairs right to left. In steps:
  8   6   4   1
 /   /   /   /
9   7   5   2
           /
          3

  8   6   1
 /   /   / \
9   7   2   4
       /   /
      3   5

  8       1
 /        |
9     ---------
      6   4   2
     /   /   /
    7   5   3

        1
        |
   ----------
   8  6  4  2
  /  /  /  /
 9  7  5  3
Now, the next time we call delete-min, there are only four nodes to check, and the next time after that there will only be two. Using the two-pass combining method reduces the number of nodes at the child level by at least half. The arrangement I showed is the worst case. If the items were in ascending order, the first delete-min operation would result in a tree with only two child nodes below the root.
This is a particularly good example of the amortized complexity of pairing heap. insert is O(1), but the first delete-min after a bunch of insert operations is O(n), where n is the number of items that were inserted since the last delete-min. The beauty of the two-pass combining rule is that it quickly reorganizes the heap to reduce that O(n) complexity.
With this combining rule, the amortized complexity of delete-min is O(log n). With the strict left-to-right rule, it's O(n).
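A minimal sketch of the two-pass rule in Python (node layout and names are my own; real implementations usually keep sibling pointers rather than a Python list of children):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.children = []  # most recently attached child first

def merge(a, b):
    # Make the larger root the first child of the smaller root: O(1).
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)
    return a

def insert(root, key):
    return merge(root, Node(key))  # O(1): one comparison, one link

def delete_min(root):
    """Remove the minimum; recombine its children with the two-pass rule."""
    kids = root.children
    # Pass 1: merge the children in pairs, left to right.
    pairs = [merge(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
             for i in range(0, len(kids), 2)]
    # Pass 2: fold the resulting list back together, right to left.
    new_root = None
    for tree in reversed(pairs):
        new_root = merge(new_root, tree)
    return root.key, new_root
```

Inserting n items and then repeatedly calling delete_min yields them in sorted order; the first delete_min after the inserts does the O(n) reorganization described above.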

Balanced Binary Search Tree for numbers

I wanted to draw a balanced binary search tree for numbers from 1 to 20.
              _______10_______
             /                \
         ___5___               15
        /       \             /  \
       3         8          13    18
      / \       / \        /  \   / \
     2   4     7   9     12   14 17  19
    /         /         /       /
   1         6         11      16
Is the above tree correct and balanced?
In answer to your original question as to whether you need to first calculate the height: no, you don't. You just have to understand that a balanced tree is one where the height difference between the tallest and shortest leaf is zero or one, and the simplest way to achieve this is to always pick the midpoint of the candidate list when populating the top node of a sub-tree.
Your sample tree is balanced since all leaf nodes are either at the bottom or next-to-bottom level, hence the difference in heights between any two leaf nodes is at most one.
To create a balanced tree from the numbers 1 through 20 inclusive, you can just make the root entry 10 or 11 (the midpoint being 10.5 for those numbers), so that there's an equal quantity of numbers in either sub-tree.
Then just do that recursively for each sub-tree. On the lower side of 10, 5 is the midpoint:
        10
       /  \
      5    11-thru-19 sub-tree
     / \
1-thru-4    6-thru-9
sub-tree    sub-tree
Just expand on that and you'll end up with something like:
Just expand on that and you'll end up with something like:
              _______10_______
             /                \
         ___5___               15
        /       \             /  \
       2         7          13    17
      / \       / \        /     /  \
     1   3     6   8     11    16    18   <- depth of highest leaf node
          \         \      \          \
           4         9      12         19 <- depth of lowest leaf node
                                        ^
                                        |
                                Difference is 1
The midpoint is the number where the difference between the quantity of numbers above and below it is one or zero. For the whole list of numbers 1 through 20 inclusive, there are nine numbers less than 10 and ten greater than 10 (or, if you chose 11 as the midpoint, the quantities are ten and nine).
The difference between your sample and mine is probably to do with the fact that I preferred to pick the midpoint by rounding down where there was a choice (meaning my right sub-trees tend to be "heavier"). Because your left sub-trees are heavier, you appear to have rounded up.
After choosing 10 as the initial midpoint, there's no leeway on the left sub-tree, you have to choose 5 since it has four above and below it. Any other midpoint would result in a difference of at least two between the two halves (for example, choosing 4 as the midpoint would have the two halves of size three and five). This can still give you a balanced sub-tree depending on the data but it's "safer" to choose the midpoint.
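The recursive midpoint construction can be sketched like this (rounding down, so right sub-trees come out heavier; the tuple representation and names are my own):

```python
def build_balanced(lo, hi):
    """Build a balanced BST holding the integers lo..hi inclusive.

    Picks the midpoint (rounding down) as the root and recurses, so the
    two subtree sizes differ by at most one at every level.
    """
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return (mid, build_balanced(lo, mid - 1), build_balanced(mid + 1, hi))

def height(t):
    return 0 if t is None else 1 + max(height(t[1]), height(t[2]))
```

For 1 through 20 this roots the tree at 10 and produces a tree of height 5, the minimum possible for 20 nodes.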
