Why isn't heapsort stable? - sorting

I'm trying to understand why heapsort isn't stable.
I've googled this, but haven't found a good, intuitive explanation.
I understand the importance of stable sorting - it allows us to sort based on more than one key, which can be very beneficial (i.e., do multiple sortings, each based on a different key. Since every sort will preserve the relative order of elements, previous sortings can add up to give a final list of elements sorted by multiple criteria).
However, why wouldn't heapsort preserve this as well?
Thanks for your help!

Heap sort unstable example
Consider the array 21 20a 20b 12 11 8 7, which is already a max-heap.
Here 20a = 20b; the suffixes a and b only mark their original order.
During heapsort, 21 is removed first and placed at the last index, then 20a is removed and placed at the second-to-last index, and then 20b at the third-to-last index. After heapsort the array looks like this:
7 8 11 12 20b 20a 21
The relative order of the equal elements is not preserved, so heapsort is not stable.
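The trace above can be reproduced with a short sketch. It assumes a textbook in-place heapsort (bottom-up heap construction, ties between children resolved toward the left child) and compares records by key only. The names `sift_down` and `heapsort` and the (key, label) tuples are illustrative, not taken from any library; a different tie-breaking rule may order the equal keys differently.

```python
def sift_down(a, root, end):
    """Restore the max-heap property for a[root:end], comparing keys only."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # Move to the right child only if its key is strictly larger.
        if child + 1 < end and a[child + 1][0] > a[child][0]:
            child += 1
        if a[root][0] >= a[child][0]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heapsort(items):
    """Return a new list sorted ascending by key (items are (key, label))."""
    a = list(items)
    n = len(a)
    # Build the max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeatedly move the current maximum to the end of the shrinking heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a


data = [(21, ""), (20, "a"), (20, "b"), (12, ""), (11, ""), (8, ""), (7, "")]
result = heapsort(data)
print(" ".join(f"{key}{label}" for key, label in result))
# prints: 7 8 11 12 20b 20a 21
```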

The final sequence of the results from heapsort comes from removing items from the created heap in purely size order (based on the key field).
Any information about the ordering of the items in the original sequence was lost during the heap creation stage, which came first.
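One way to see that the lost position information is the whole problem: if each item carries its original index as a tie-breaker, the heap can no longer reorder equal keys. The sketch below uses Python's standard `heapq` module; the function name `stable_heap_sort` and the (key, label) records are assumptions for illustration, and the extra index costs O(n) additional memory.

```python
import heapq


def stable_heap_sort(items, key):
    """Heap-based sort made stable by decorating each item with its index."""
    # (key, index, item): the index is unique, so ties on key are always
    # resolved by original position and the item itself is never compared.
    heap = [(key(item), index, item) for index, item in enumerate(items)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


records = [(21, ""), (20, "a"), (20, "b"), (12, ""), (11, ""), (8, ""), (7, "")]
ordered = stable_heap_sort(records, key=lambda record: record[0])
print(" ".join(f"{k}{label}" for k, label in ordered))
# prints: 7 8 11 12 20a 20b 21
```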

Stable means that if two elements have the same key, they keep the same relative order in the output as in the input. That is not guaranteed with heapsort.
Heapsort is not stable because operations on the heap can change the relative order of equal items.
From here:
When sorting (in ascending order), heapsort first picks the largest
element and puts it at the end of the list. So the element that was
picked first stays last, and the element that was picked second ends up
second to last in the sorted list.
The Build-Max-Heap procedure works such that it preserves the order
of equal values (e.g. 3a, 3b) when building the heap tree. Extracting
the maximum element also works from the root and tries to preserve
the structure of the tree (except for the changes made by Heapify).
So for elements with the same value [3a, 3b], heapsort picks
3a before 3b but puts 3a to the right of 3b. Because the list is
sorted in ascending order, we get 3b before 3a in the list.
If you try heapsort with (3a,3b,3b) then you can visualize the
situation.

A stable sorting algorithm sorts elements such that the order of repeated elements in the input is maintained in the output.
Heap-Sort involves two steps:
Heap creation
Repeatedly removing the root element from the heap and adding it to a new array, which ends up in sorted order
1. Order breaks during Heap Creation
Let's say the input array is {1, 5, 2, 3, 2, 6, 2}. To keep track of the order of the 2's, call them 2a, 2b and 2c, so the array is {1, 5, 2a, 3, 2b, 6, 2c}.
Now if you create a heap (a min-heap here) out of it, its array representation will be {1, 2b, 2a, 3, 5, 6, 2c}, where the order of 2a and 2b has already changed.
2. Order breaks during removal of root element
Now when we remove the root element (1 in our case) from the heap to put it into a new array, we swap it with the last position and remove it from there, changing the heap into {2c, 2b, 2a, 3, 5, 6}. We repeat the same step, and this time we remove 2c from the heap and append it to the new array, right after the 1.
When we have repeated this step until the heap is empty and every element has been transferred to the new array, the new (sorted) array looks like {1, 2c, 2b, 2a, 3, 5, 6}.
Input to Heap-Sort: {1, 5, 2a, 3, 2b, 6, 2c} --> Output: {1, 2c, 2b, 2a, 3, 5, 6}
Hence we see that the repeated elements (the 2's) are not in the same order in the heap-sorted array as in the input, and therefore Heap-Sort is not stable!
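Both steps can be checked with a small sketch. It assumes a hand-written min-heap (bottom-up construction, left child preferred on ties, comparison on the key only); `sift_down_min`, `build_min_heap` and `drain` are hypothetical helper names. A library heap may break ties differently and so give a different, equally valid, order for the 2's.

```python
def sift_down_min(heap, i, n):
    """Restore the min-heap property for heap[:n] starting at index i."""
    while True:
        child = 2 * i + 1
        if child >= n:
            return
        # Prefer the left child unless the right key is strictly smaller.
        if child + 1 < n and heap[child + 1][0] < heap[child][0]:
            child += 1
        if heap[i][0] <= heap[child][0]:
            return
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def build_min_heap(items):
    """Step 1: heap creation."""
    heap = list(items)
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down_min(heap, i, len(heap))
    return heap


def drain(heap):
    """Step 2: swap the root with the last slot, remove it, re-heapify."""
    heap = list(heap)
    out = []
    n = len(heap)
    while n > 0:
        heap[0], heap[n - 1] = heap[n - 1], heap[0]
        out.append(heap[n - 1])
        n -= 1
        sift_down_min(heap, 0, n)
    return out


def show(seq):
    return ", ".join(f"{k}{label}" for k, label in seq)


source = [(1, ""), (5, ""), (2, "a"), (3, ""), (2, "b"), (6, ""), (2, "c")]
heap = build_min_heap(source)
print(show(heap))         # prints: 1, 2b, 2a, 3, 5, 6, 2c
print(show(drain(heap)))  # prints: 1, 2c, 2b, 2a, 3, 5, 6
```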

I know this is a late answer, but I will add my 2 cents here.
Consider a simple array of 3 integers: 2, 2, 2. If you build a max-heap using the build-max-heap function, you will find that the array storing the input has not changed, as it is already in max-heap form. When we put the root of the tree at the end of the array in the first iteration of heapsort, the stability is already gone. So there you have a simple example of the instability of heapsort.

Suppose we have an array of size n (an arbitrary value) with two consecutive equal elements (say 15) in the heap, whose parent indices hold values such as 4 and 20, so the actual order is (..., 4, 20, ..., 15, 15, ...). The relative order of 4 and the first 15 stays the same, but because 20 > 15, the second 15 is swapped toward the front, as the heapsort algorithm prescribes, and the relative order of the two 15s is lost.

Related

Longest substring in which all bits are the same (DP algorithm)

You're given a bit string that consists of 1 <= n <= 32 bits.
You are also given a sequence of changes that invert some of the bits. If the original string is 001011, and the change is "3", then the changed bit string would be 000011. (The 3rd bit from the left was flipped.)
I have to find after each change, the length of the longest substring in which each bit is the same. Therefore, for 000011, the answer would be 4.
I think brute force would just be a sliding window that starts at size of the string and shrinks until the first instance where all the bits in the window are the same.
How would that be altered for a dynamic programming solution?
You can solve this by maintaining a list of indices at which the bit flips. You start by creating that list: shift the sequence by one bit (losing the one on the end), and compare to the original. Any mismatch here is a bit flip. Make a list of those indices:
001011
01011
-234--
In this case, you have bit flips after locations 2, 3, and 4.
Now you need a simple function to process a change operation. For a change of bit n, toggle the membership of indices n-1 and n in the list: if an index is not in the list, add it; if it is in the list, remove it. Indices 0 and the string length are not bit flips, so skip them when n is the first or last bit. In the case of changing bit 3, both 2 and 3 are in the list, so you remove them:
---4--
Any time you want to check the longest substring, you need merely check adjacent indices for the largest difference. Include 0 and the string length as endpoints. Thus, when you have the list [0, 2, 3, 4, 6], you have a maximum difference of 2 at 2-0 and 6-4. After the change, with the list [0, 4, 6], you have a maximum difference of 4 at 4-0.
If you have a large list with many indices, you can simply maintain differences, altering only the adjacent intervals affected by the single change.
This should get you going; I leave the implementation details to the student. :-)
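For readers who want to check the idea, here is a minimal sketch of the boundary-list approach. It assumes positions are 1-based and counted from the left, as in the 001011 example; the function name `longest_runs` is made up for illustration. It rescans the boundaries after each change, which is fine for n <= 32.

```python
def longest_runs(bits, changes):
    """Return the longest same-bit run length after each change.

    bits    -- string of '0'/'1' characters
    changes -- 1-based positions (from the left) of the bits to invert
    """
    n = len(bits)
    # Index i is a boundary when bits i and i+1 (1-based) differ.
    boundaries = {i for i in range(1, n) if bits[i - 1] != bits[i]}
    results = []
    for pos in changes:
        # Inverting bit pos toggles the boundaries on both of its sides.
        for index in (pos - 1, pos):
            if 0 < index < n:  # 0 and n are fixed endpoints, never toggled
                boundaries ^= {index}
        cuts = [0] + sorted(boundaries) + [n]
        results.append(max(b - a for a, b in zip(cuts, cuts[1:])))
    return results


print(longest_runs("001011", [3]))  # prints: [4]
```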

Minimal number of cuts to partition sequence into pieces that can form a non-decreasing sequence

I have N integers, for example 3, 1, 4, 5, 2, 8, 7. There may be some duplicates. I want to divide this sequence into contiguous subsequences such that we can rearrange them into a non-decreasing sequence. How do I calculate the minimal number of cuts? For the example above, the answer is 6 pieces (which takes 5 cuts), because we can partition the sequence into {3}, {1}, {4, 5}, {2}, {7}, {8} and then form {1, 2, 3, 4, 5, 7, 8}. What is the fastest way to do this?
Does anyone know how to solve it assuming that some numbers may be equal?
I would cut the array into non-decreasing segments at points where the values decrease, and then use these segments as input into a (single) merge phase - as in a sort-merge - keeping with the same segment, where possible, in the case of ties. Create additional locations for cuts when you have to switch from one segment to another.
The output is sorted, so this produces enough cuts to do the job. Cuts are produced at points where the sequence decreases, or at points where a gap must be created because the original sequence jumps across a number present elsewhere - so no sequence without all of these cuts can be rearranged into sorted order.
Worst case for the merge overhead is if the initial sequence is decreasing. If you use a heap to keep track of what sequences to pick next then this turns into heapsort with cost n log n. Handle ties by pulling all occurrences of the same value from the heap and only then deciding what to do.
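For the special case where all values are distinct, the cut rule described above reduces to a simple check: two neighbours can stay in the same piece only if the right one is the immediate successor of the left one in sorted order. The sketch below counts pieces that way rather than running the merge itself; `min_pieces_distinct` is a hypothetical name, and duplicates are rejected because they need the extra tie handling mentioned above.

```python
def min_pieces_distinct(values):
    """Minimal number of contiguous pieces, assuming all values are distinct."""
    rank = {v: r for r, v in enumerate(sorted(values))}
    if len(rank) != len(values):
        raise ValueError("this sketch assumes distinct values")
    if not values:
        return 0
    pieces = 1
    for left, right in zip(values, values[1:]):
        # A cut is needed unless right directly follows left in sorted order.
        if rank[right] != rank[left] + 1:
            pieces += 1
    return pieces


pieces = min_pieces_distinct([3, 1, 4, 5, 2, 8, 7])
print(pieces, "pieces,", pieces - 1, "cuts")  # prints: 6 pieces, 5 cuts
```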
This approach works if the list does not contain duplicates. Perhaps those could be taken care of efficiently first.
We can compute the permutation inversion vector in O(n * log n) time and O(n) space using a Fenwick tree. Continuous segments of the vector with the same number can represent sections that need not be cut. Unfortunately, they can also return false positives, like,
Array: {8,1,4,5,7,6,3}
Vector: 0,1,1,1,1,2,5
where the 6 and 3 imply cuts in the sequence, [1,4,5,7]. To counter this, we take a second inversion vector representing the number of smaller elements following each element. Continuous segments parallel in both vectors need not be cut:
Array:  {8,1,4,5,7,6,3,0}
Vector:  0,1,1,1,1,2,5,7   // # larger preceding
Vector:  7,1,2,2,3,2,1,0   // # smaller following
            |---|          // not cut (elements 4,5)
Array:  {3,1,4,5,2,8,7}
Vectors: 0,1,0,0,3,0,1
         2,0,1,1,0,1,0
            |---|          // not cut (elements 4,5)
Array:  {3,1,2,4}
Vectors: 0,1,1,0
         2,0,0,0
          |---|            // not cut (elements 1,2)
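The two vectors above can be computed with a Fenwick (binary indexed) tree in O(n log n) time. The sketch below covers only that part, not the decision about where to cut; `inversion_vectors` and its inner helpers are illustrative names.

```python
def inversion_vectors(values):
    """Return (larger_preceding, smaller_following) for each element."""
    # Compress values to ranks 1..m so they can index the tree.
    ranks = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    m = len(ranks)

    def add(tree, i):
        while i <= m:
            tree[i] += 1
            i += i & -i

    def prefix(tree, i):
        """Number of inserted elements with rank <= i."""
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    larger_preceding = []
    tree = [0] * (m + 1)
    for seen, v in enumerate(values):
        r = ranks[v]
        # Elements seen so far minus those not larger than v.
        larger_preceding.append(seen - prefix(tree, r))
        add(tree, r)

    smaller_following = []
    tree = [0] * (m + 1)
    for v in reversed(values):
        r = ranks[v]
        smaller_following.append(prefix(tree, r - 1))
        add(tree, r)
    smaller_following.reverse()
    return larger_preceding, smaller_following


print(inversion_vectors([8, 1, 4, 5, 7, 6, 3, 0]))
# prints: ([0, 1, 1, 1, 1, 2, 5, 7], [7, 1, 2, 2, 3, 2, 1, 0])
```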

Unique combination of 10 questions

How to form a combination of say 10 questions so that each student (total students = 10) get unique combination.
I don't want to use factorial.
You can use a circular queue data structure.
You can cut it at any point you like, and it will then give you a unique sequence.
For example, if you cut it between 2 and 3 and then iterate over the queue, you will get:
3, 4, 5, 6, 7, 8, 9, 10, 1, 2
So you need to implement a circular queue and then cut it at 10 different points (after 1, after 2 [shown in picture 2], after 3, ...).
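A minimal sketch of the cutting idea, using `collections.deque` from the standard library as the circular queue (the function name `rotations` is an assumption for illustration). It yields unique orderings only if the questions themselves are distinct.

```python
from collections import deque


def rotations(questions):
    """Return every rotation of the question list, one per cut point."""
    queue = deque(questions)
    orders = []
    for _ in range(len(queue)):
        queue.rotate(-1)  # move the front question to the back
        orders.append(list(queue))
    return orders


orders = rotations(range(1, 11))
print(orders[1])  # cut after 2; prints: [3, 4, 5, 6, 7, 8, 9, 10, 1, 2]
```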
There are 3,628,800 different permutations of 10 items taken 10 at a time.
If you only need 10 of them you could start with an array that has the values 1-10 in it. Then shuffle the array. That becomes your first permutation. Shuffle the array again and check to see that you haven't already generated that permutation. Repeat that process: shuffle, check, save, until you have 10 unique permutations.
It's highly unlikely (although possible) that you'll generate a duplicate permutation in only 10 tries.
The likelihood that you generate a duplicate increases as you generate more permutations, reaching 50% by the time you've generated about 2,200 (the birthday-problem estimate for 3,628,800 possibilities). But if you just want a few hundred or fewer, then this method will do it for you pretty quickly.
The proposed circular queue technique works, too, and has the benefit of simplicity, but the resulting sequences are simply rotations of the original order, and it can't produce more than 10 without a shuffle. The technique I suggest will produce more "random" looking orderings.
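The shuffle, check, save loop can be sketched as follows. `unique_shuffles` is a hypothetical name, the optional `rng` parameter is added here only so runs can be made repeatable, and the guard assumes the items are distinct.

```python
import math
import random


def unique_shuffles(items, count, rng=None):
    """Return `count` distinct random orderings of `items`."""
    pool = list(items)
    if count > math.factorial(len(pool)):
        raise ValueError("not that many distinct permutations exist")
    rng = rng or random.Random()
    seen = set()
    orders = []
    while len(orders) < count:
        rng.shuffle(pool)
        candidate = tuple(pool)
        if candidate not in seen:  # discard the rare duplicate and retry
            seen.add(candidate)
            orders.append(list(candidate))
    return orders


for order in unique_shuffles(range(1, 11), 10):
    print(order)
```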

Quick sorting algorithm states using middle element as pivot

I need help understanding exactly how the quick sort algorithm works. I've been watching teaching videos and still fail to really grasp it completely.
I have an unsorted list: 1, 2, 9, 5, 6, 4, 7, 8, 3
And I have to quick sort it using 6 as the pivot.
I need to see the state of the list after each partition procedure.
My main problem is understanding what the order of the elements is before and after the pivot. So in this case, if we made 6 the pivot, I know the numbers 1 - 5 will be before 6 and 7 - 9 will go after it. But what will the order of the numbers 1 - 5 and of 7 - 9 be after the first partition, given my list above?
Here is the partition algorithm that I want to use (bear in mind I'm using the middle element as my initial pivot):
1. Determine the pivot, and swap the pivot with the first element of the list.
2. Suppose that the index smallIndex points to the last element smaller than the pivot. The index smallIndex is initialized to the first element of the list.
3. For the remaining elements in the list (starting at the second element):
   If the current element is smaller than the pivot
   a. Increment smallIndex
   b. Swap the current element with the array element pointed to by smallIndex.
4. Swap the first element, that is the pivot, with the array element pointed to by smallIndex.
It would be amazing if anyone could show the list after each single little change that occurs to the list in the algorithm.
It doesn't matter.
All that matters - all that the partitioning process asserts - is that, after it has been run, there are no values on the left-hand side of the center point that emerges that are greater than the pivot and that there are no values on the right-hand side that are less than the pivot value.
The internal order of the two partitions is then handled in the subsequent recursive calls for each half.
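Although the internal order carries no meaning, it is fully determined by the partition scheme used. The sketch below follows the steps listed in the question (middle element as pivot, `smallIndex` written here as `small_index`); the function names are illustrative.

```python
def partition(a, first, last):
    """Partition a[first..last] around its middle element; return pivot index."""
    mid = (first + last) // 2
    a[first], a[mid] = a[mid], a[first]  # move the pivot to the front
    pivot = a[first]
    small_index = first  # last position known to hold a value < pivot
    for i in range(first + 1, last + 1):
        if a[i] < pivot:
            small_index += 1
            a[i], a[small_index] = a[small_index], a[i]
    a[first], a[small_index] = a[small_index], a[first]  # pivot to its slot
    return small_index


def quicksort(a, first=0, last=None):
    """Sort the list in place by recursing on both sides of the pivot."""
    if last is None:
        last = len(a) - 1
    if first < last:
        p = partition(a, first, last)
        quicksort(a, first, p - 1)
        quicksort(a, p + 1, last)


data = [1, 2, 9, 5, 6, 4, 7, 8, 3]
p = partition(data, 0, len(data) - 1)
print(p, data)  # prints: 5 [3, 2, 5, 1, 4, 6, 7, 8, 9]
```

With this particular scheme the first partition leaves 3 2 5 1 4 to the left of the pivot 6 and 7 8 9 to its right; a different partition scheme would give a different but equally valid arrangement.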

Is deleting random node from a heap possible?

I have a situation where I want to delete a random node from the heap. What choices do I have? I know we can easily delete the first node and the last node of the heap. However, I am not sure whether the behavior is well defined for deleting an arbitrary node from the heap.
e.g.
_______________________
|X|12|13|14|18|20|21|22|
------------------------
So in this case I can delete the node 12 and 22, this is defined, but can I for example delete a random node, e.g. say 13, and still somehow maintain the complete tree property of the heap (along with other properties)?
I'm assuming that you're describing a binary heap maintained in an array, with the invariant that A[N] <= A[N*2] and A[N] <= A[N*2 + 1] (a min-heap).
If yes, then the approach to deletion is straightforward: replace the deleted element with the last element, decrement the variable that holds the total number of entries in the heap, and then restore the invariant. Sift the replacement down if it is larger than one of its children, or sift it up if it is smaller than its parent.
Incidentally, if you're working through heap examples, I find it better to use examples that do not have a total ordering. There's nothing in the definition of a heap that requires (eg) A[3] <= A[5], and it's easy to get misled if your examples have such an ordering.
I don't think it is possible to remove a random element from a heap. Let's take this example (following the same convention):
3, 10, 4, 15, 20, 6, 5.
Now if I delete element 15, the heap becomes: 3, 10, 4, 5, 20, 6
This makes heap inconsistent because of 5 being child of 10.
The reason I think random deletion won't work is because you may substitute an inside node (instead of root or a leaf) in the heap, and thus there are two paths (parents and children) to heapify (as compared to 1 path in case of pop() or insert()).
Please let me know in case I am missing something here.
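The inconsistency in this example can be repaired by letting the replacement move in either direction: after the last element is moved into the hole, it is sifted up if it is smaller than its parent and sifted down otherwise. Only one of the two loops ever does any work. The sketch below assumes a min-heap stored in a 0-based Python list (no placeholder slot at the front), and `delete_at` is a hypothetical helper name.

```python
def delete_at(heap, i):
    """Remove and return heap[i] from a 0-based min-heap list."""
    last = heap.pop()
    if i == len(heap):  # the deleted node was the last one
        return last
    removed = heap[i]
    heap[i] = last
    # Sift up while the replacement is smaller than its parent.
    while i > 0 and heap[i] < heap[(i - 1) // 2]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
    # Sift down while the replacement is larger than its smaller child.
    n = len(heap)
    while True:
        child = 2 * i + 1
        if child >= n:
            break
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child
    return removed


heap = [3, 10, 4, 15, 20, 6, 5]
delete_at(heap, 3)  # delete 15
print(heap)  # prints: [3, 5, 4, 10, 20, 6]
```

Each deletion touches at most one root-to-leaf path, so the cost stays O(log n) once the index of the node is known.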

Resources