Show heapsort repeats comparisons

How would one prove that heapsort repeats comparisons that it has made before? (I.e., that it performs a comparison that has already been done previously.)
Thanks

A pair of elements may be compared both in the build-heap step (heapify) and again in the reorder step of heapsort (see the Wikipedia description of heapsort).
For example, sort by max-heap:
origin array: 4 6 10 7 3 8 5
Heapify into a heap array by sift-up, inserting one element at a time.
The comparisons that cause swaps: 4<6, 6<10, 4<7, 6<8 (comparisons that don't cause a swap, such as 7 vs 10 and 5 vs 8, are omitted here).
(10) (7 8) (4 3 6 5) // each layer is grouped by parentheses
Reorder step:
swap the first element with the last, putting the largest at the end
reduce the heap size by 1
sift down the new root
The comparisons include: 5<8, 6<7, 3<6, 3<4, 3<5, 3<4
The comparisons made during heapify depend only on the initial order of the elements, and after heapify the array is still not sorted, so the reorder step can repeat pairs: here 3 vs 4 is compared twice within the reorder step itself, and 5 vs 8 was already compared during heapify.
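To check this concretely, here is a small sketch (mine, not from the answer above): a heapsort instrumented to log every comparison and report the pairs it compares more than once. It builds the heap by sift-down (Floyd's method) rather than the sift-up insertions used in the walkthrough, so the exact comparison lists differ, but repeated pairs such as 5 vs 8 still show up.

from collections import Counter

def heapsort_with_comparison_log(a):
    log = Counter()

    def less(x, y):
        log[(min(x, y), max(x, y))] += 1   # record the unordered pair of values
        return x < y

    def sift_down(start, end):
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and less(a[child], a[child + 1]):
                child += 1                  # pick the larger child
            if less(a[root], a[child]):
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # build-heap phase
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):          # reorder phase
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return {pair: cnt for pair, cnt in log.items() if cnt > 1}

print(heapsort_with_comparison_log([4, 6, 10, 7, 3, 8, 5]))

Running it on the example array prints a non-empty dictionary of repeated pairs, which is exactly the demonstration the question asks for.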

algorithm interview: mth frequent element in n sorted arrays

There is an algorithm interview question:
We have n sorted arrays. How do we find the m-th most frequent element in the aggregated array of the n arrays? Moreover, how can we save space, even at the cost of some time complexity?
What I can think of is to enumerate all the elements of the n arrays and use a hashmap to record their frequencies, then sort the hashmap by value (frequency). But then there is no difference from the single-array case.
Walk over all arrays in parallel using n pointers. For example, take
1 4 7 12 34
2 6 9 12 25
The walk would look like this (one column per step, showing the element under each pointer; at each step the pointer at the smallest element advances):
1  4  4  7  7  12  12  34  34
2  2  6  6  9   9  12  25   *
You do need a hash map in order to count the number of occurrences of elements in the cut. E.g. at the first step in the example, your cut contains 1 and 2.
Also, you need two min-heaps: one over the current cut, to choose which array to advance along, and another one to store the m most frequent elements found so far.
The complexity would be expected O(#elements * (log(n) + log(m))). The space requirement is O(n + m).
But if you really need to save space you can treat all these n sorted arrays as one big unsorted array, sort it with something like heapsort, and then scan for the longest runs of duplicates. This requires O(#elements * log(#elements)) time but only O(1) extra space.
You do an n-way merge, but instead of writing out the merged array, you just count the length of each run of duplicate values and remember the longest m in a min-heap.
This takes O(total_length * (log n + log m)) time, and O(n) space.
It's a combination of common SO questions. Search above on "merge k sorted lists" and "kth largest"
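Here is a hedged sketch of the merge-and-count approach from the answers above (the function name and the use of heapq.merge/groupby are my own choices, not from the original answers): walk the n arrays in one lazy n-way merge, count each run of duplicates, and keep the m longest runs in a size-m min-heap.

import heapq
from itertools import groupby

def m_most_frequent(arrays, m):
    merged = heapq.merge(*arrays)    # lazy n-way merge, O(log n) per element
    top = []                         # min-heap of (count, value), size <= m
    for value, run in groupby(merged):
        count = sum(1 for _ in run)  # length of this run of duplicates
        if len(top) < m:
            heapq.heappush(top, (count, value))
        elif count > top[0][0]:
            heapq.heapreplace(top, (count, value))
    return sorted(top, reverse=True) # most frequent first

print(m_most_frequent([[1, 4, 7, 12, 34], [2, 6, 9, 12, 25]], 2))
# [(2, 12), (1, 2)] for the example arrays above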

How to merge sorted lists into a single list in O(n * log(k))

(I got this as an interview question and would love some help with it.)
You have k sorted lists containing n different numbers in total.
Show how to create a single sorted list containing all the elements from the k lists in O(n * log(k)).
The idea is to use a min heap of size k.
Push all the k lists on the heap (one heap-entry per list), keyed by their minimum (i.e. first) value
Then repeatedly do this:
Extract the top list (having the minimal key) from the heap
Extract the minimum value from that list and push it on the result list
Push the shortened list back (if it is not empty) on the heap, now keyed by its new minimum value
Repeat until all values have been pushed on the result list.
The initial step has a time complexity of O(k log k) (a bottom-up heapify even does it in O(k)).
The 3 steps above are repeated n times. At each iteration their costs are:
O(log k) for the heap extraction (or O(1) if you merely peek at the top and later replace it in place)
O(1) for taking the minimum value from the list, provided the extraction is implemented using a pointer/index (not shifting all values in the list)
O(log k) for pushing the list back, as the heap size is never greater than k
So the resulting complexity is O(n log k) (as k < n, the initial step is not significant).
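A minimal sketch of this heap-based k-way merge in Python (the tuple layout and names are illustrative, not prescribed by the answer). Each heap entry is (current value, list id, index into that list), so the heap never holds more than k entries:

import heapq

def merge_k_lists(lists):
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)                   # builds the initial heap in O(k)
    result = []
    while heap:
        value, i, j = heapq.heappop(heap) # O(log k)
        result.append(value)
        if j + 1 < len(lists[i]):         # push the list's next element back
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return result

print(merge_k_lists([[1, 4, 7], [2, 6], [3, 5, 8]]))  # [1, 2, 3, 4, 5, 6, 7, 8]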
As the question is stated, there's no need for a k-way merge (or a heap). A standard 2 way merge used repeatedly to merge pairs of lists, in any order, until a single sorted list is produced will also have time complexity O(n log(k)). If the question had instead asked how to merge k lists in a single pass, then a k-way merge would be needed.
Consider the case for k == 32, and to simplify the math, assume all lists are merged in order so that each merge pass merges all n elements. After the first pass, there are k/2 lists, after the 2nd pass, k/4 lists, after log2(k) = 5 passes, all k (32) lists are merged into a single sorted list. Other than simplifying the math, the order in which lists are merged doesn't matter, the time complexity remains the same at O(n log2(k)).
Using a k-way merge is normally only advantageous when merging data on external devices, such as one or more disk drives (or, classically, tape drives), where the I/O time is large enough that heap overhead can be ignored. For a RAM-based merge / merge sort, the total number of operations is about the same for a 2-way and a k-way merge / merge sort. On a processor with 16 registers, most of them used as indexes or pointers, an optimized (heap-free) 4-way merge, using 8 of the registers as pointers to the current and ending location of each run, can be a bit faster than a 2-way merge due to being more cache friendly.
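For comparison, here is a sketch of the repeated 2-way merge described in this answer (names are mine): each pass merges adjacent pairs of lists, so roughly log2(k) passes over all n elements suffice.

def merge2(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def merge_pairwise(lists):
    while len(lists) > 1:
        # merge adjacent pairs; an odd list out is carried to the next pass
        lists = [merge2(lists[i], lists[i + 1]) if i + 1 < len(lists) else lists[i]
                 for i in range(0, len(lists), 2)]
    return lists[0] if lists else []

print(merge_pairwise([[1, 9], [2, 6], [4, 5], [3, 8], [7]]))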
When k = 2, you merge the two lists by iteratively popping the front of whichever list currently has the smaller front element. In a way, you create a virtual list that supports a pop_front operation implemented as:
pop_front(a, b): return if front(a) <= front(b) then pop_front(a) else pop_front(b)
You can very well arrange a tree-like merging scheme where such virtual lists are merged in pairs:
pop_front(a, b, c, d): return if front(a, b) <= front(c, d) then pop_front(a, b) else pop_front(c, d)
Every pop will involve every level in the tree once, leading to a cost O(Log k) per pop.
The above reasoning is wrong because it doesn't account for the front operations: each involves a comparison between two elements, and these cascade down the tree, finally requiring a total of k-1 comparisons per output element.
This can be circumvented by "memoizing" the front element, i.e. keeping it cached next to the two lists after a comparison has been made, and updating it whenever an element is popped.
This directly leads to the binary min-heap device, as suggested by #trincot.
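Here is a sketch of that memoization (class names and structure are mine): every internal node of the tournament tree caches its winning child and that child's front value, so a pop re-runs comparisons only along one root-to-leaf path, i.e. O(log k) per output element.

class Leaf:
    def __init__(self, lst):
        self.lst, self.i = lst, 0
    def front(self):
        return self.lst[self.i] if self.i < len(self.lst) else None
    def pop_front(self):
        v = self.lst[self.i]
        self.i += 1
        return v

class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._refresh()
    def _refresh(self):
        a, b = self.left.front(), self.right.front()
        # memoize both the winning child and its front value
        if b is None or (a is not None and a <= b):
            self.winner, self.val = self.left, a
        else:
            self.winner, self.val = self.right, b
    def front(self):
        return self.val
    def pop_front(self):
        v = self.winner.pop_front()
        self._refresh()   # only the nodes on this root-to-leaf path re-compare
        return v

def merge_tournament(lists):
    nodes = [Leaf(l) for l in lists if l]
    if not nodes:
        return []
    while len(nodes) > 1:  # pair nodes up into a balanced tree
        nodes = [Node(nodes[i], nodes[i + 1]) if i + 1 < len(nodes) else nodes[i]
                 for i in range(0, len(nodes), 2)]
    root, out = nodes[0], []
    while root.front() is not None:
        out.append(root.pop_front())
    return out

print(merge_tournament([[1, 4, 7], [2, 6], [3, 5, 8]]))  # [1, 2, 3, 4, 5, 6, 7, 8]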

What are the number of swaps required in selection sort for each case?

I believe that selection sort has the following behavior:
Best case: No swaps required as all elements are properly arranged
Worst case: n-1 swaps required, i.e. one swap per pass, and as we know there are n-1 passes, where n is the number of elements in the array
Average case: I am not able to work this out. What is the procedure for finding it?
Is the above information correct?
This source says the time complexity of swaps in the best case is O(n):
http://ocw.utm.my/file.php/31/Module/ocwChp5SelectionSort.pdf
Each iteration of selection sort consists of scanning across the array, finding the minimum element that hasn't already been placed yet, then swapping it to the appropriate position. In a naive implementation of selection sort, this means that there will always be n - 1 swaps made regardless of distribution of elements in the input array.
If you want to minimize the number of swaps, though, you can implement selection sort so that it doesn't perform a swap in the case where the element to be moved is already in the right place. If you add in this restriction, then you're correct that zero swaps would be made in the best case. (I'm not sure whether it's worthwhile to modify selection sort this way, since swaps are pretty fast in most cases).
Really, it depends on the implementation. You could potentially have a weird implementation of selection sort that constantly swaps the candidate minimum element into its tentative final spot on each iteration, which would dramatically increase the number of swaps in the worst case. I'm not sure why you'd do this, though. It's little details like this that account for why your explanation seems at odds with what you've found online: depending on how the code is put together, the number of swaps can differ.
The best-case and worst-case running times of selection sort are both Θ(n^2). This is because regardless of how the elements are initially arranged, on the i-th iteration of the main for loop the algorithm always inspects each of the remaining n-i elements to find the smallest one remaining.
Selection sort takes a minimal number of swaps, and in the best case it takes ZERO (0) swaps, when the input is already sorted, like 1,2,3,4. But the more pertinent question is: what is the worst-case number of swaps in selection sort, and for which input does it occur?
Answer: the worst-case number of swaps is n-1. But it does not occur for the reverse-ordered input: an input like 5,4,3,2,1 takes only about n/2 swaps, because each swap puts two elements into their final positions at once. The worst case instead requires an input in which nearly every pass finds its minimum out of place, for example a "sine wave" kind of input that alternately rises and falls, like crests and troughs:
7 6 8 5 9 4 10 3 - an input of eight (8) elements, traced pass by pass:
3 6 8 5 9 4 10 7 (1)
3 4 8 5 9 6 10 7 (2)
3 4 5 8 9 6 10 7 (3)
3 4 5 6 9 8 10 7 (4)
3 4 5 6 7 8 10 9 (5)
3 4 5 6 7 8 10 9 (6) // the minimum of the remaining suffix (8) is already in place here, so an implementation that skips self-swaps does no real swap in this pass
3 4 5 6 7 8 9 10 (7)
So this input costs 7 swaps under the naive always-swap implementation, but only 6 real swaps with the skip-self-swap variant; to force the full n-1 real swaps you can use a cyclic rotation such as 2 3 4 5 6 7 8 1, where every pass finds its minimum at the far end. In summary: the worst case of the number of swaps in selection sort is n-1, the best case is 0, and the average (for the skip-self-swap variant on a random permutation) lies much closer to n-1 than to (n-1)/2, since a pass gets to skip its swap only when the minimum happens to sit in place already.
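A small experiment to check these counts (a sketch of the skip-self-swap variant; the cyclic-rotation input is my addition as a case that really does force n-1 swaps):

def selection_sort_swaps(a):
    a, swaps = list(a), 0
    n = len(a)
    for i in range(n - 1):
        m = min(range(i, n), key=a.__getitem__)  # index of the minimum remaining
        if m != i:                               # skip self-swaps
            a[i], a[m] = a[m], a[i]
            swaps += 1
    return swaps

print(selection_sort_swaps([1, 2, 3, 4, 5, 6, 7, 8]))   # 0 (already sorted)
print(selection_sort_swaps([8, 7, 6, 5, 4, 3, 2, 1]))   # 4 = n/2 (reversed)
print(selection_sort_swaps([7, 6, 8, 5, 9, 4, 10, 3]))  # 6 (the trace above)
print(selection_sort_swaps([2, 3, 4, 5, 6, 7, 8, 1]))   # 7 = n-1 (rotation)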

Interview Algorithm: find two largest elements in array of size n

This is an interview question I saw online and I am not sure I have correct idea for it.
The problem is here:
Design an algorithm to find the two largest elements in a sequence of n numbers.
The number of comparisons needs to be n + O(log n).
I think I might choose quicksort and stop when the two largest elements are found?
But I'm not 100% sure about that. If anyone has an idea about it, please share.
Recursively split the array, find the largest element in each half, then find the largest element that the overall largest was ever compared against. The first part requires n - 1 compares, the last part requires O(log n). Here is an example:
1 2 5 4 9 7 8 7 5 4 1 0 1 4 2 3
2 5 9 8 5 1 4 3
5 9 5 4
9 5
9
At each step I'm merging adjacent numbers and taking the larger of the two. It takes n compares to get down to the largest number, 9. Then, if we look at every number that 9 was compared against (5, 5, 8, 7), we see that the largest one was 8, which must be the second largest in the array. Since there are O(log n) levels in this, it will take O(log n) compares to do this.
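A sketch of this tournament method (the representation of "who beat whom" is my own choice; it assumes at least two elements): the knockout rounds cost n-1 compares, and the runner-up is the best of the ~log2(n) elements the champion beat.

def two_largest(a):
    # each entry: (value, list of values it has beaten so far)
    players = [(x, []) for x in a]
    while len(players) > 1:
        nxt = []
        for i in range(0, len(players) - 1, 2):
            (x, bx), (y, by) = players[i], players[i + 1]
            if x >= y:
                bx.append(y); nxt.append((x, bx))
            else:
                by.append(x); nxt.append((y, by))
        if len(players) % 2:          # odd player out advances unplayed
            nxt.append(players[-1])
        players = nxt
    champion, beaten = players[0]
    return champion, max(beaten)      # runner-up: best the champion beat

print(two_largest([1, 2, 5, 4, 9, 7, 8, 7, 5, 4, 1, 0, 1, 4, 2, 3]))  # (9, 8)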
For only the 2 largest elements, a simple linear scan keeping the best two seen so far is good enough; it's basically O(2*n) comparisons.
For the more general "select the k largest elements from an array of size n" question, quicksort is a good thought, but you don't have to sort the whole array.
Try this:
Pick a pivot and partition the array into N[m] and N[n-m].
If k < m, discard the N[n-m] part and repeat step 1 on N[m].
If k > m, discard the N[m] part and repeat step 1 on N[n-m], this time looking for the first k-m elements there.
If k = m, you've got it.
It's basically like binary-searching for the k-th position in the array: you need about log(n) iterations, and the i-th iteration touches about n/2^i elements on average, so the expected total is a geometric series summing to O(n) comparisons (in the same linear ballpark as the n + O(log n) target, though not exactly meeting it). It has very good practical performance, faster than a full quicksort, since it avoids sorting the discarded parts; note the output is therefore not ordered.
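A sketch of this partition-based selection (a standard quickselect for the k largest; the three-way partition, added to stay safe with duplicate values, and all names are mine, not the answer's):

import random

def k_largest(a, k):
    a = list(a)
    if k >= len(a):
        return a
    lo, hi, want = 0, len(a), k        # active window a[lo:hi]; want = still needed
    while True:
        pivot = a[random.randrange(lo, hi)]
        big = [x for x in a[lo:hi] if x > pivot]
        eq  = [x for x in a[lo:hi] if x == pivot]
        sml = [x for x in a[lo:hi] if x < pivot]
        a[lo:hi] = big + eq + sml      # three-way partition around the pivot
        if want < len(big):
            hi = lo + len(big)         # all wanted elements are in the "big" side
        elif want <= len(big) + len(eq):
            return a[:lo + want]       # the pivot block covers the boundary
        else:
            lo += len(big) + len(eq)   # keep big+eq, look for the rest further down
            want -= len(big) + len(eq)

print(k_largest([3, 1, 4, 1, 5, 9, 2, 6], 3))  # the 3 largest, in no particular order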

external sorting: multiway merge

In multiway merge, the task is to find the smallest element out of k candidate elements (one per run).
Solution: priority queues
Idea: Take the smallest elements from the first k runs, store them into main memory in a heap tree.
Then repeatedly output the smallest element from the heap. The smallest element is replaced with the next element from the run from which it came.
When finished with the first set of runs, do the same with the next set of runs.
Assume my main memory has size M less than k. How can we sort the elements? In other words, how does the multiway merge algorithm work if the memory size M is less than k?
For example if my M = 3 and i have following
Tape1: 8 9 10
Tape2: 11 12 13
Tape3: 14 15 16
Tape4: 4 5 6
My question is how the multiway merge will work here: we read 8, 11, 14 and build the priority queue, place 8 on the output tape, and then advance Tape1. What I don't get is when Tape4 is read, and how its elements could be compared with values already written to the output tape.
Thanks!
It won't work. You must choose a k small enough for available memory.
In this case, you could do a 3-way merge of the first 3 tapes, then a 2-way merge between the result of that and the one remaining tape. Or you could do 3 2-way merges (two pairs of tapes, then combine the results), which is simpler to implement but does more tape access.
In theory you could abandon the priority queue. Then you wouldn't need to store k elements in memory, but you would frequently need to look at the next element on all k tapes in order to find the smallest.
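A tiny sketch of the two-pass plan with M = 3, using the tapes from the question. heapq.merge stands in for the tape-based k-way merge, and the intermediate list plays the role of an intermediate tape (on a real system these would be files or tape reads, not in-memory lists):

import heapq

tape1 = [8, 9, 10]
tape2 = [11, 12, 13]
tape3 = [14, 15, 16]
tape4 = [4, 5, 6]

intermediate = list(heapq.merge(tape1, tape2, tape3))  # pass 1: 3-way merge
result = list(heapq.merge(intermediate, tape4))        # pass 2: 2-way merge
print(result)  # [4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16]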
