Why does a pairing heap need that special two-pass merge in delete_min?

I am reading about the pairing heap.
It is quite simple; the only tricky part is the delete_min operation.

The only non-trivial fundamental operation is the deletion of the
minimum element from the heap. The standard strategy first merges the
subheaps in pairs (this is the step that gave this data structure its
name) from left to right and then merges the resulting list of heaps
from right to left.

I don't think I need to copy/paste the code here, as it is in the wiki link.
My questions are:
Why do they do this two-pass merging?
Why do they first merge in pairs, instead of directly merging them all?
Also, after merging in pairs, why merge specifically from right to left?

With a pairing heap, adding an item is an O(1) operation, because all it does is add the node either as the new root (if it's smaller than the current root) or as the first child of the current root. So if you created a pairing heap and added the numbers 0 through 9 to it, in order, you would end up with:
0
|
-----------------
| | | | | | | | |
9 8 7 6 5 4 3 2 1
If you then do a delete-min, you have to look at each child to determine the minimum item and build the new heap. If you use the naive left-to-right combining method, you end up with this tree:
1
|
---------------
| | | | | | | |
9 8 7 6 5 4 3 2
And the next time you do a delete-min you have to look at the 8 remaining children, etc. Using this technique, creating and then removing all items from the heap would be an O(n^2) operation.
The two-pass method of combining in pairs and then combining the pairs results in a much more efficient structure. Consider the first case. After deleting the minimum item, we're left with the nine children. They're combined in pairs from left to right to produce:
 8   6   4   2   1
/   /   /   /
9   7   5   3
Then we combine the pairs from right to left. In steps:
 8   6   4   1
/   /   /   /
9   7   5   2
           /
          3

 8   6   1
/   /   | \
9   7   2   4
       /   /
      3   5

 8    1
/     |
9   ---------
    6   4   2
   /   /   /
  7   5   3

       1
       |
 -------------
 8   6   4   2
/   /   /   /
9   7   5   3
Now, the next time we call delete-min, there are only four nodes to check, and the next time after that there will only be two. Using the two-pass combining method reduces the number of nodes at the child level by at least half. The arrangement I showed is the worst case. If the items were in ascending order, the first delete-min operation would result in a tree with only two child nodes below the root.
This is a particularly good example of the amortized complexity of the pairing heap: insert is O(1), but the first delete-min after a bunch of insert operations is O(n), where n is the number of items that were inserted since the last delete-min. The beauty of the two-pass combining rule is that it quickly reorganizes the heap to reduce that O(n) complexity.
With this combining rule, the amortized complexity of delete-min is O(log n). With the strict left-to-right rule, it's O(n).
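
For concreteness, here is a minimal Python sketch of the two-pass rule, assuming a node representation with a key and a list of children (an illustration of the idea above, not any particular library's implementation):

class Node:
    def __init__(self, key):
        self.key = key
        self.children = []

def merge(a, b):
    # Link two heaps: the larger root becomes the first child of the smaller.
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)
    return a

def delete_min(root):
    # Remove the root, then recombine its children with the two-pass rule.
    kids = root.children
    # First pass: merge the children in pairs, left to right.
    pairs = [merge(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
             for i in range(0, len(kids), 2)]
    # Second pass: fold the paired heaps together, right to left.
    new_root = None
    for heap in reversed(pairs):
        new_root = merge(new_root, heap)
    return new_root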

Binary tree compaction of identical subtrees

Given a tree, find the common subtrees, replace the duplicates with links to a single shared copy, and thereby compact the tree.
e.g.
      1
     / \
    2   3
   /|   |\
  4 5   4 5
should be converted to
      1
     / \
    2   3
   /|   |\
  4 5   | |
  ^ ^   | |
  | |___| |
  |_______|
This was asked in my interview. The approach I shared was not optimal (O(n^2)); I would be grateful if someone could help with a solution or redirect me to a similar problem. I couldn't find any. Thanks!
Edit: a more complex example:
      1
     / \
    2   3
   /|   |\
  4 5   2 7
       /|
      4 5
The whole subtree rooted at 2 should be replaced:
      1
     / \
    2<--3
   /|    \
  4 5     7
You can do this in a single DFS traversal using a hash map from (value, left_pointer, right_pointer) -> node to collapse repeated occurrences of the subtree.
As you leave each node in your DFS, you just look it up in the map. If a matching node already exists, then replace it with the pre-existing one. Otherwise, add it to the map.
This takes O(n) time, because you are comparing the actual pointers to the left + right subtrees, instead of traversing the trees to compare them. The pointer comparison gives the same result, because the left and right subtrees have already been canonicalized.
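
Here is a short Python sketch of that idea (the names and the node representation are mine, chosen for illustration): each subtree is canonicalized bottom-up, keyed on (value, id(left), id(right)), so pointer identity stands in for structural equality.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def compact(root, seen=None):
    # Post-order DFS: canonicalize the children first, so that comparing
    # child pointers below is equivalent to comparing whole subtrees.
    if seen is None:
        seen = {}
    if root is None:
        return None
    root.left = compact(root.left, seen)
    root.right = compact(root.right, seen)
    key = (root.value, id(root.left), id(root.right))
    # Reuse the first node seen with this shape; otherwise register this one.
    return seen.setdefault(key, root)

Each node is visited once and each map operation is O(1) on average, which gives the O(n) bound mentioned above.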
First, we need to store the node values that appear in a hash table. If the tree already exists, we can iterate over it, and whenever a node's value is already in the set of seen nodes, delete that node's branches. Otherwise, store the values in a hash map and, each time a new node is made, check whether the value already appears in the map.

Modified algorithm for building a Heap

I am quite new to programming and I am trying to understand a certain problem regarding heap sort. In a book I'm reading, there is a modified algorithm for building a max heap, which is:
BuildHeap(A)
    A.heap-size = 1
    for i = 2 to A.length
        Heap-Insert(A, A[i])
So from my understanding, this algorithm takes in an array, defines the size of the heap to be 1, and then iterates from 2 to the total length of the array, inserting each value into the heap.
But how would this build a max heap? If I had an array of [4, 7, 2, 3, 9, 1], wouldn't the algorithm start at A[2] and then simply add all the values from A[2] through A[A.length] to the heap without actually building a max heap?
I do not understand how the heap-size = 1 does anything in the algorithm other than restrict the total size of the heap. I am confused as to how you would build a max heap.
From what the book states, the normal max-heap construction works by first placing every array value into the heap, then starting at the A.length/2 position and working backwards, swapping values that are larger than the value currently being assessed by calling Max-Heapify.
So how would this max heap work, since there is no Max-Heapify(A, largest) call, but instead simply a Heap-Insert(A, A[i])?
First of all, this question is not about heap sort, which is just one of the applications of a heap. You are asking about heap construction.
The pseudocode you presented is indeed an alternative (and less efficient) way of building a heap, and it is actually the algorithm that many people would come up with if they didn't know about Floyd's standard algorithm.
So taking a look at the code:
BuildHeap(A)
    A.heap-size = 1
    for i = 2 to A.length
        Heap-Insert(A, A[i])
Most of the logic of this algorithm is buried inside the Heap-Insert function, which is not just a simple "append" to an array: it does much more than that. Wikipedia describes that hidden algorithm as follows:
Add the element to the bottom level of the heap at the leftmost open space.
Compare the added element with its parent; if they are in the correct order, stop.
If not, swap the element with its parent and return to the previous step.
You write in your question:
there is no Max-Heapify(A, largest)
Indeed, it would be too simple if you already knew what the largest value was before building the heap. You need to first insert a value (any value) into the heap, and let the heap do its magic (inside Heap-Insert) to make sure that the largest value ends up in the first (top) position of the array A, i.e. in A[1].
The first step of the quoted algorithm is thus important: Heap-Insert expects the new value to be inserted at the end.
Let's work through the example [4, 7, 2, 3, 9, 1], and let's put a pipe symbol to indicate the end of the heap. At the start, the heap size is 1, so we have:
4 | 7 2 3 9 1
Let's also draw the heap as a more visually appealing binary tree at the right side -- for now it just has the root element:

4 | 7 2 3 9 1        4
Then we call Heap-Insert(A, A[2]), which is Heap-Insert(A, 7). The implementation of Heap-Insert will increase the size of the heap, and put that value in the last slot, so we get:
4 7 | 2 3 9 1        4
                    /
                   7
Heap-Insert has not finished yet -- this was just the first step it performs. Now it "bubbles up" that 7, following steps 2 and 3 of the quoted algorithm, and we get:
7 4 | 2 3 9 1        7
                    /
                   4
At the second iteration of the pseudo code loop, we call Heap-Insert(A, 2), so Heap-Insert performs its first step:
7 4 2 | 3 9 1        7
                   /   \
                  4     2
...and finds out that nothing needs to change when performing step 2 and 3.
We continue inserting 3:
7 4 2 3 | 9 1        7
                   /   \
                  4     2
                 /
                3
...and again nothing needs to change, as 3 is less than 4 (remember that A[2] is the parent of A[4]).
We continue inserting 9:
7 4 2 3 9 | 1        7
                   /   \
                  4     2
                 / \
                3   9
And here 9 > 4, and also 9 > 7, so Heap-Insert will further modify A to this:
9 7 2 3 4 | 1        9
                   /   \
                  7     2
                 / \
                3   4
One more to go:
9 7 2 3 4 1 |        9
                   /   \
                  7     2
                 / \   /
                3   4 1
And Heap-Insert has nothing more to do as 1 < 2.
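
To tie the example together, here is a small runnable Python sketch of this insertion-based BuildHeap (my own translation of the pseudocode, using a dummy slot at index 0 to emulate 1-based indexing; it is not the book's code):

def heap_insert(a, heap_size, value):
    # Step 1: place the value at the end of the heap.
    heap_size += 1
    a[heap_size] = value
    # Steps 2 and 3: bubble the value up while it beats its parent.
    i = heap_size
    while i > 1 and a[i // 2] < a[i]:   # the parent of a[i] is a[i // 2]
        a[i // 2], a[i] = a[i], a[i // 2]
        i //= 2
    return heap_size

def build_heap(a):
    # a[1..n] holds the input; a[0] is an unused dummy slot.
    heap_size = 1
    for i in range(2, len(a)):
        heap_size = heap_insert(a, heap_size, a[i])

data = [None, 4, 7, 2, 3, 9, 1]   # the example from the question
build_heap(data)
print(data[1:])                   # prints [9, 7, 2, 3, 4, 1]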

Convert a tree into a heap using the minimum number of changes

Given a k-ary tree, I want to convert it into a min-heap with the minimum number of changes. A change is defined as relabelling a node.
One solution I have found is a DP over whether or not to change each node's value, but isn't that going to be exponential in time complexity?
Any ideas (preferably with optimality proofs)?
Example: say the tree has edges 1-3, 3-2, 1-4, 4-5, where 1 is the root. Then I can relabel node 3 to 1 or 2; that is, in one change it becomes a min-heap.
If all you want to do is make sure that the tree satisfies the heap property (the key stored in each node is less than or equal to the keys stored in the node's children), then you should be able to use something like the build-heap algorithm, which operates in O(n).
Consider this tree:
          8
   ---------------
   |      |      |
   15     6      19
  /  \    |    /  |  \
 7    3   5   12  9   22
Now, working from the bottom up, you push each node down the tree as far as it can go. That is, if the node is larger than any of its children, you swap it with the smallest of its children, and you keep doing so until you reach the leaf level, if necessary.
For example, look at the node valued 15. It's larger than its smallest child, so you swap them, making the subtree:
   3
  / \
 7   15
Also, 6 swaps places with 5, and 19 swaps places with 9, giving you this tree:
          8
   ---------------
   |      |      |
   3      5      9
  /  \    |    /  |  \
 7    15  6   12  19  22
Note that at the next-to-leaf level, each node is smaller than its smallest child.
Now, the root. Since the rule is to swap the node with its smallest child, you swap 8 with 3, giving:
          3
   ---------------
   |      |      |
   8      5      9
  /  \    |    /  |  \
 7    15  6   12  19  22
But you're not done because 8 is greater than 7. You swap 8 with 7, and you get this tree, which meets your conditions:
          3
   ---------------
   |      |      |
   7      5      9
  /  \    |    /  |  \
 8    15  6   12  19  22
If the tree is balanced, the entire procedure has complexity O(n). If the tree is severely unbalanced, the complexity is O(n^2). There is a way to guarantee O(n), regardless of the tree's initial order, but it requires changing the shape of the tree.
I won't claim that the algorithm guarantees the "minimal number of changes" for any given tree. I can prove, however, that with a balanced tree the algorithm is O(n). See https://stackoverflow.com/a/9755805/56778, which explains it for a binary heap; the explanation also applies to a d-ary heap.
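
A sketch of that bottom-up push-down in Python, for a k-ary tree of nodes holding a key and a list of children (the representation is my own, for illustration):

class Node:
    def __init__(self, key, children=None):
        self.key = key
        self.children = children or []

def push_down(node):
    # Sift node.key down until it is <= every child's key.
    while node.children:
        smallest = min(node.children, key=lambda c: c.key)
        if node.key <= smallest.key:
            break
        node.key, smallest.key = smallest.key, node.key
        node = smallest

def heapify(root):
    # Work from the bottom up: fix all children before their parent.
    if root is None:
        return
    for child in root.children:
        heapify(child)
    push_down(root)

Note that this relabels nodes by swapping keys, so it preserves the shape of the tree.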

Which sorting algorithm produces these steps?

This was a multiple-choice question in an exam today, and (at least) one of the answers should be true, but to me they all look wrong.
The sorting steps are:
5 2 6 1 3 4
4 2 6 1 3 5
4 2 5 1 3 6
4 2 3 1 5 6
1 2 3 4 5 6
The available answers were: Bubble Sort, Insertion Sort, Selection Sort, Merge Sort and Quick Sort.
I think that is a quick sort. Here we can see the following steps:
A pivot element (pivotValue) is chosen from the array, and the elements of the array are reordered with respect to it.
All of the values that are larger than the pivot are moved to its right, and all of the smaller values to its left.
The algorithm is repeated on the unsorted left and right parts of the array until every element ends up in its final position.
Why I think so:
It definitely isn't a bubble sort, because bubble sort begins by comparing the first two elements of the array, so the first step should be 2 5 6 1 3 4.
It isn't an insertion sort, because that is a sequential algorithm, while in the first step here the first and the last elements are compared.
It isn't a selection sort, because selection sort finds the lowest value and moves it to the front, so the first step should be 1 5 2 6 3 4.
It isn't a merge sort, because there the array is divided into two subarrays, while in this case we see interaction between the "first" and "second" parts.
None of them.
bubble sort: no. After k steps, the last k elements should be the k largest, sorted.
insertion sort: no. After k steps, the k first elements should be sorted.
selection sort: no. After k steps, the k first elements should be the k smallest, sorted.
merge sort: no. After k steps, a value can only have moved 2^k - 1 places. (5 moves 5 places at k=1)
quick sort: no. Whatever the pivot is, 1 and 6 being the extreme values, they can stay in this initial position.
On the quick sort: to make it clear that it is not possible, let's enumerate the results of each pivot choice for the first step:
5 : [2134] - 5 - [6]. (2134 may be in any order)
2 : [1] - 2 - [5634]
6 : [52134] - 6
1 : 1 - [52634]
3 : [21] - 3 - [564]
4 : [213] - 4 - [56]
One obvious way of seeing that all those are incompatible with the OP's output is that in each case, the 1 is before the 6, no matter how you implement the pivot or the partition.
To solve this, all you have to do is write a function for each sorting algorithm, but include a statement that prints the array after each swap. Then apply your print-friendly sort algorithms to the initial array [5 2 6 1 3 4] and see which sort method produces the same output. Additionally, this will help you compare all the different methods.
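
As a sketch of that idea in Python (one algorithm shown; the others follow the same pattern):

def bubble_sort_traced(a):
    # Bubble sort that prints the array after every swap, so the trace
    # can be compared line by line against the exam's steps.
    a = list(a)
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                print(a)
    return a

bubble_sort_traced([5, 2, 6, 1, 3, 4])

Its first printed line is [2, 5, 6, 1, 3, 4], which already rules bubble sort out, exactly as argued above.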

An interview question from Google [duplicate]

Possible Duplicate:
Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best way to search for a target number?
The following was asked in a Google interview:
You are given a 2D array storing integers, sorted vertically and horizontally.
Write a method that takes as input an integer and outputs a bool saying whether or not the integer is in the array.
What is the best way to do this? And what is its time complexity?
Start at the bottom-left corner of the matrix and traverse it according to these rules:
If the input number is greater than current number: Move Right
If the input number is less than current number: Move Up.
If the input number is equal to current number: Return Success
If the input number is not equal to current number and no transition is possible: Return Fail
Time Complexity: (Thanks to Martinho Fernandes)
The time complexity is O(N+M). In the worst case, the element searched for is in the upper-right corner, meaning you'll go up N times and right M times.
Example
Input matrix:
-------------------
|  1  |  4  |  6  |
-------------------
|  2  |  5  |  9  |
-------------------
| *3* |  8  | 10  |
-------------------
Number to search: 4
Step 1:
Start at the cell where you have 3 (Bottom-Left).
3 < 4: Move Right
-------------------
|  1  |  4  |  6  |
-------------------
|  2  |  5  |  9  |
-------------------
|  3  | *8* | 10  |
-------------------
Step 2:
8 > 4: Move Up
-------------------
|  1  |  4  |  6  |
-------------------
|  2  | *5* |  9  |
-------------------
|  3  |  8  | 10  |
-------------------
Step 3:
5 > 4: Move Up
-------------------
|  1  | *4* |  6  |
-------------------
|  2  |  5  |  9  |
-------------------
|  3  |  8  | 10  |
-------------------
Step 4:
4=4: Return the index of the number
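
A small runnable Python sketch of this staircase traversal (the function name is mine, chosen for illustration):

def search_sorted_matrix(matrix, target):
    # Returns True if target occurs in a row- and column-sorted matrix.
    if not matrix or not matrix[0]:
        return False
    row, col = len(matrix) - 1, 0          # start at the bottom-left corner
    while row >= 0 and col < len(matrix[0]):
        current = matrix[row][col]
        if current == target:
            return True
        elif current < target:
            col += 1                        # target is larger: move right
        else:
            row -= 1                        # target is smaller: move up
    return False

grid = [[1, 4, 6], [2, 5, 9], [3, 8, 10]]  # the example matrix above
print(search_sorted_matrix(grid, 4))       # True, found in four steps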
I would start by asking for details about what it means to be "sorted vertically and horizontally".
If the matrix is sorted in a way that the last element of each row is less than the first element of the next row, you can run a binary search on the first column to find out which row the number is in, and then run another binary search on that row. This algorithm takes O(log R + log C) time, where R and C are, respectively, the number of rows and columns. Using a property of the logarithm, one can write that as O(log(R*C)), which is the same as O(log N), if N is the number of elements in the array. This is almost the same as treating the array as 1D and running a binary search on it.
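
Since that stricter ordering lets you treat the matrix as one sorted 1D array, here is a tiny sketch of that remark (my own function name; it assumes exactly the row-partitioned ordering described above):

def search_flat(matrix, target):
    # Binary search over the virtual flattened array of R*C elements,
    # mapping a flat index back to (row, column) on each probe.
    cols = len(matrix[0])
    lo, hi = 0, len(matrix) * cols
    while lo < hi:
        mid = (lo + hi) // 2
        if matrix[mid // cols][mid % cols] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(matrix) * cols and matrix[lo // cols][lo % cols] == target

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # rows continue each other
print(search_flat(grid, 5))   # True
print(search_flat(grid, 10))  # False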
But the matrix could be sorted in a way that the last element of each row is not less than the first element of the next row:
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
3 4 5 6 7 8 9 10 11
In this case, you could run a sort of horizontal and vertical binary search simultaneously:
Test the middle number of the first column. If it's less than the target, consider the lines above it. If it's greater, consider those below;
Test the middle number of the first considered line. If it's less, consider the columns left of it. If it's greater, consider those to the right;
Lather, rinse, repeat until you find the element, or until you're left with no more elements to consider.
This method is also logarithmic in the number of elements.
The first method that comes to mind is a vertical binary search, followed by a horizontal one when you find the row it should be in. Complexity will be O(log NM) where N and M are the dimensions of the array.
Further explanation:
Consider just the first number of every row. When you perform a binary search of these first numbers for the specified number, the result will either be the specified number itself, if you're lucky, or the position before or after where the specified number would go, depending on the binary search implementation. Once you find which two of the first numbers the specified number falls between, you know the number must be in the earlier of those two rows, and a second binary search will find the number if it is in that row.
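
A sketch of that two-stage binary search in Python, using the standard bisect module (and assuming the stricter ordering discussed earlier, where each row's last element is less than the next row's first):

from bisect import bisect_left, bisect_right

def search_rowwise(matrix, target):
    # Stage 1: binary search over the first element of each row to find
    # the row whose first element is the largest one <= target.
    firsts = [row[0] for row in matrix]
    r = bisect_right(firsts, target) - 1
    if r < 0:
        return False
    # Stage 2: binary search within that row.
    row = matrix[r]
    i = bisect_left(row, target)
    return i < len(row) and row[i] == target

grid = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
print(search_rowwise(grid, 9))    # True
print(search_rowwise(grid, 8))    # False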
