Removing elements to sort array - algorithm

I'm looking for an algorithm to sort an array, but not by moving the values. Rather, I'd like to delete as few values as possible and end up with a sorted list. Basically I want to find the longest ascending sub-array.
To illustrate:
1 4 5 6 7 2 3 8
Should become (2 removes)
1 4 5 6 7 8
And not (5 removes)
1 2 3
I can see how I can do this in a naive way, i.e. by recursively checking both the 'remove' and 'don't remove' branches for each element. I was just wondering if there was a faster / more efficient way to do this. Is there a common go-to algorithm for this kind of problem?

You're looking for the longest increasing subsequence problem. There is an algorithm that solves it in O(n log n) time.

There is an O(N log N) algorithm on the following site, which is much faster than the exponential recursive approach:
http://www.algorithmist.com/index.php/Longest_Increasing_Subsequence
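A minimal sketch of the O(n log n) approach in Python (the language is an assumption, since the question names none; longest_increasing_subsequence is a hypothetical helper name). For every length L it keeps the smallest possible tail of an increasing subsequence of that length, finds where each new value fits with a binary search, and records a predecessor index per element so that one optimal subsequence can be rebuilt at the end:

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence in O(n log n) time."""
    tails = []        # tails[L]: smallest tail value of any increasing run of length L + 1
    tail_index = []   # tail_index[L]: position in values of that tail
    parent = [-1] * len(values)  # predecessor of values[i] in the best run ending at i
    for i, v in enumerate(values):
        pos = bisect_left(tails, v)  # length of the longest run that v can extend
        if pos > 0:
            parent[i] = tail_index[pos - 1]
        if pos == len(tails):
            tails.append(v)
            tail_index.append(i)
        else:
            tails[pos] = v
            tail_index[pos] = i
    # Walk the predecessor links back from the tail of the longest run.
    result = []
    i = tail_index[-1] if tail_index else -1
    while i != -1:
        result.append(values[i])
        i = parent[i]
    return result[::-1]
```

On the question's input 1 4 5 6 7 2 3 8 it keeps 1 4 5 6 7 8, so 8 - 6 = 2 values are removed. If equal values should also be kept (non-decreasing instead of strictly increasing), bisect_right would replace bisect_left.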

Related

Show heapsort repeats comparisons

How would one prove that heapsort repeats comparisons that it has made before? (i.e. it would perform a comparison that has already been done previously)
Thanks
The same two elements may be compared both in the build-heap step (heapify) and again in the reorder step of heapsort (see the Wikipedia article on heapsort).
For example, sort by max-heap:
original array: 4 6 10 7 3 8 5
heapify into a new heap array by sift-up.
The comparisons that cause swaps: 4<6, 6<10, 4<7, 6<8
(10) (7 8) (4 3 6 5) // each layer is grouped by parenthesis
re-order step:
swap the first element with the last, moving the largest one to the end
reduce the heap size by 1
use sift-down
The comparisons: 5<8, 6<7, 3<6, 3<4, 3<5, 3<4
This happens because heapify only establishes the heap property, not a total order: after heapify the array is generally still unsorted, so the reorder step has to compare elements again, and some of those pairs were already compared during heapify.
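One way to check this empirically is to instrument heapsort so that every comparison is recorded. The sketch below is an illustration under stated assumptions (Python, distinct values so a value identifies an element, and the same sift-up build and sift-down reorder described above); heapsort_tracking is a hypothetical name. It counts each unordered pair of compared values and reports the pairs that were compared more than once:

```python
from collections import Counter


def heapsort_tracking(values):
    """Max-heap heapsort that records every compared pair of values.

    Returns (sorted_list, repeated_pairs). Assumes distinct values.
    """
    a = list(values)
    seen = Counter()

    def less(x, y):
        seen[frozenset((x, y))] += 1  # remember this unordered pair
        return x < y

    # Build phase: insert elements one at a time and sift each one up.
    for i in range(1, len(a)):
        child = i
        while child > 0:
            parent = (child - 1) // 2
            if less(a[parent], a[child]):
                a[parent], a[child] = a[child], a[parent]
                child = parent
            else:
                break

    # Reorder phase: move the max to the end, shrink the heap, sift down.
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        root = 0
        while True:
            child = 2 * root + 1
            if child >= end:
                break
            if child + 1 < end and less(a[child], a[child + 1]):
                child += 1  # pick the larger child
            if less(a[root], a[child]):
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                break

    repeated = [pair for pair, count in seen.items() if count > 1]
    return a, repeated
```

On the example array 4 6 10 7 3 8 5, the pair {5, 8} is among the repeats: 5 is compared with 8 when it is inserted during the build, and again when it is sifted down after 10 is moved to the end.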

What are the number of swaps required in selection sort for each case?

I believe that selection sort has the following behavior:
Best case: No swaps required as all elements are properly arranged
Worst case: n-1 swaps required, i.e. one swap for each of the n-1 passes, where n is the number of elements in the array
Average case: Not able to find out this. What is the procedure for finding it out?
Is the above information correct?
This source says that the number of swaps in the best case is O(n):
http://ocw.utm.my/file.php/31/Module/ocwChp5SelectionSort.pdf
Each iteration of selection sort consists of scanning across the array, finding the minimum element that hasn't already been placed yet, then swapping it to the appropriate position. In a naive implementation of selection sort, this means that there will always be n - 1 swaps made regardless of the distribution of elements in the input array.
If you want to minimize the number of swaps, though, you can implement selection sort so that it doesn't perform a swap in the case where the element to be moved is already in the right place. If you add in this restriction, then you're correct that zero swaps would be made in the best case. (I'm not sure whether it's worthwhile to modify selection sort this way, since swaps are pretty fast in most cases).
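A small sketch in Python (the function name is hypothetical) that counts swaps for both variants: always swapping at the end of each pass, or skipping the swap when the minimum is already in place:

```python
def selection_sort_swaps(values, skip_in_place=True):
    """Selection sort that returns (sorted_list, number_of_swaps)."""
    a = list(values)
    n = len(a)
    swaps = 0
    for i in range(n - 1):
        # Find the index of the smallest element in a[i:].
        m = i
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        # The naive variant swaps even when m == i (a no-op swap).
        if m != i or not skip_in_place:
            a[i], a[m] = a[m], a[i]
            swaps += 1
    return a, swaps
```

For already-sorted input like 1 2 3 4, the skipping variant makes 0 swaps and the naive one makes 3 (n - 1).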
Really, it depends on the implementation. You could potentially have a weird implementation of selection sort that constantly swaps the candidate minimum element to its tentative final spot on each iteration, which would dramatically increase the number of swaps in the worst case. I'm not sure why you'd do this, though. It's little details like this that account for why your explanation seems at odds with what you've found online: depending on how the code is put together, the number of swaps can be different.
The best-case and worst-case running times of selection sort are both Θ(n^2). This is because regardless of how the elements are initially arranged, on the i-th iteration of the main for loop, the algorithm always inspects each of the remaining n-i elements to find the smallest one remaining.
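Summing the work per iteration (i = 1, ..., n-1, with n-i comparisons on iteration i) gives the comparison count, which depends only on n and not on the input order:

```latex
\sum_{i=1}^{n-1} (n - i) = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = \Theta(n^2)
```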
Selection sort is the algorithm which takes the minimum number of swaps, and in the best case it takes ZERO (0) swaps, when the input is already sorted, like 1,2,3,4. But the more pertinent question is: what is the worst-case number of swaps in selection sort, and for which input does it occur?
Answer: The worst-case number of swaps is n-1. But it does not occur for reverse-ordered input: an input like 6,5,3,2,1 does not take the worst number of swaps; it takes only floor(n/2) swaps (here 2), since each swap puts two elements into their final places. The number of swaps actually performed (when the swap is skipped if the minimum is already in place) is n minus the number of cycles in the permutation that takes the input to sorted order, so the worst case of n-1 swaps occurs when the whole input forms a single cycle, e.g. 2,3,4,...,n,1. An alternating "sine wave" input, increasing and decreasing like crests and troughs, does not necessarily reach this bound:
7 6 8 5 9 4 10 3 - this input of eight (8) elements takes 7 passes, but in pass (6) the minimum, 8, is already in place, so only 6 swaps are performed
3 6 8 5 9 4 10 7 (1)
3 4 8 5 9 6 10 7 (2)
3 4 5 8 9 6 10 7 (3)
3 4 5 6 9 8 10 7 (4)
3 4 5 6 7 8 10 9 (5)
3 4 5 6 7 8 10 9 (6)
3 4 5 6 7 8 9 10 (7)
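Hence the worst-case number of swaps in selection sort is n-1 and the best case is 0. On average (skipping in-place swaps, with all input orders equally likely) it is n - H_n swaps, where H_n = 1 + 1/2 + ... + 1/n is the expected number of cycles in a random permutation; the naive version always makes n-1 swaps.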
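A sketch for predicting the swap count without sorting, assuming distinct values and the variant that skips a swap when the minimum is already in place (predicted_swaps is a hypothetical name). Each real swap puts one element into its final position and splits it off from its cycle, so that variant performs exactly n minus the number of cycles of the permutation taking the input to sorted order:

```python
def predicted_swaps(values):
    """Swaps made by selection sort that skips in-place elements: n - cycles.

    Assumes distinct values.
    """
    # target[v]: index where value v ends up once the list is sorted.
    target = {v: i for i, v in enumerate(sorted(values))}
    visited = [False] * len(values)
    cycles = 0
    for start in range(len(values)):
        if visited[start]:
            continue
        cycles += 1
        i = start
        while not visited[i]:  # follow the cycle containing start
            visited[i] = True
            i = target[values[i]]
    return len(values) - cycles
```

For 7 6 8 5 9 4 10 3 this gives 6 (two cycles of length 4), for the rotation 2 3 4 5 6 7 8 1 it gives 7 = n - 1, and for 6 5 3 2 1 it gives 2.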

Better than O(log(N)) base 2

I am solving segment tree and quadtree related problems, and I noticed that in a segment tree we split the 1D array into 2 (2^1) segments and recurse until we reach the base case. Similarly, in a quadtree we subdivide the 2D grid into 4 (2^2) segments at every step. All of these divide-and-conquer schemes aim for logarithmic time complexity. No offense!
But why don't we subdivide the array into 4 (4^1) or more parts instead of 2 in a segment tree? And why don't we split the grid into 16 (4^2) parts instead of 4? This would still give O(log(N)) performance, but with a better log, since log(N) base 4 is smaller than log(N) base 2.
I know the implementation would be a little more difficult in this case. Is there a memory overhead problem, or anything else?
Please correct me if I am wrong anywhere. Thanks!
It wouldn't actually be faster. Suppose we divided each node into 4 parts. Then we would have to merge 4 values instead of 2 in each node to answer a query. Assuming that merging 4 values takes 3 times longer (for example, getting the maximum of 2 numbers takes 1 call to the max function, but getting the maximum of 4 values takes 3 calls), we have log4(n) * 3 = 1.5 * log2(n) > log2(n) * 1. Moreover, it would be harder to implement (more cases to consider, and so on).
log_4(N) = log_2(N) / log_2(4) = log_2(N) / 2
In general, the time complexity is O(log n) in both cases, while a four-way split is much harder to maintain than a two-way split. In fact, (in ACM/ICPC) two-way segment trees are much easier to code and are sufficient.
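The same arithmetic generalizes to any branching factor k. A sketch under the simplified cost model used above (one node per level, log_k(n) levels, and k - 1 pairwise merges such as max calls per node; relative_query_cost is a hypothetical name): the total is (k - 1) * log_k(n) = (k - 1) / log_2(k) * log_2(n), so relative to a binary tree the factor is (k - 1) / log_2(k):

```python
import math


def relative_query_cost(k):
    """Work of a k-way tree query relative to a binary one, under the
    simplified model: log_k(n) levels and k - 1 merges per level."""
    return (k - 1) / math.log2(k)


for k in (2, 4, 8, 16):
    print(k, round(relative_query_cost(k), 3))
```

Under this model the factor grows with k (1.0 for k = 2, 1.5 for k = 4, about 2.33 for k = 8, 3.75 for k = 16), which matches the argument that wider splits do not pay off.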

Interview Algorithm: find two largest elements in array of size n

This is an interview question I saw online and I am not sure I have correct idea for it.
The problem is here:
Design an algorithm to find the two largest elements in a sequence of n numbers.
The number of comparisons needs to be n + O(log n).
I think I might use quicksort and stop when the two largest elements are found?
But I'm not 100% sure about it. If anyone has an idea, please share.
Recursively split the array, find the largest element in each half, then find the largest element that the largest element was ever compared against. The first part requires n - 1 compares, the last part requires O(log n). Here is an example:
1 2 5 4 9 7 8 7 5 4 1 0 1 4 2 3
2 5 9 8 5 1 4 3
5 9 5 4
9 5
9
At each step I'm merging adjacent numbers and taking the larger of the two. It takes n - 1 compares (here 8 + 4 + 2 + 1 = 15) to get down to the largest number, 9. The second largest number loses only to the largest, so it must have been compared against 9 at some point. If we look at every number that 9 was compared against (5, 5, 8, 7), we see that the largest one was 8, which must be the second largest in the array. Since there are O(log n) levels, 9 was compared against O(log n) numbers, so finding the largest of them takes O(log n) more compares.
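A sketch of this tournament approach in Python (two_largest is a hypothetical name). Each surviving element carries the list of values it has beaten; after the knockout rounds, the second largest is the maximum of the winner's list. With an odd number of entries in a round, the last one gets a bye, which is an assumption the example above doesn't need since 16 is a power of 2:

```python
def two_largest(values):
    """Return (largest, second_largest, comparisons) via a knockout tournament."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    comparisons = 0
    # Each entry: (value, list of values it has beaten so far).
    current = [(v, []) for v in values]
    while len(current) > 1:
        winners = []
        for i in range(0, len(current) - 1, 2):
            (a, beaten_a), (b, beaten_b) = current[i], current[i + 1]
            comparisons += 1
            if a >= b:
                beaten_a.append(b)
                winners.append((a, beaten_a))
            else:
                beaten_b.append(a)
                winners.append((b, beaten_b))
        if len(current) % 2 == 1:
            winners.append(current[-1])  # odd one out advances without a match
        current = winners
    largest, beaten = current[0]
    # The runner-up must be among the values the winner beat.
    second = beaten[0]
    for v in beaten[1:]:
        comparisons += 1
        if v > second:
            second = v
    return largest, second, comparisons
```

On the 16-number example this returns (9, 8, 18): 15 compares for the knockout rounds and 3 more among 7, 8, 5, 5, i.e. n + log2(n) - 2.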
For only the 2 largest elements, a normal single-pass selection may be good enough. It's basically O(2*n): up to about 2n comparisons, which is linear but does not meet the n + O(log n) bound.
For a more general "select the k largest elements from an array of size n" question, quicksort is a good starting point, but you don't have to actually sort the whole array.
Try this:
1. Pick a pivot and split the array into N[m] (the m elements that come before the pivot in the desired order) and N[n-m].
2. If k < m, forget the N[n-m] part and do step 1 on N[m].
3. If k > m, all of N[m] belongs to the answer, so set it aside and do step 1 on N[n-m]; this time, look for the first k-m elements in N[n-m].
4. If k = m, you're done.
It's basically like locating k in an array N. On average you need about log(N) iterations, and iteration i processes about N/2^i elements, so the expected total work is N + N/2 + N/4 + ... ≈ 2N, i.e. O(N) on average (though not the strict n + O(log n) comparison bound). It has very good practical performance (faster than plain quicksort, since it avoids sorting; the output is therefore not ordered).
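A minimal sketch of those steps in Python, assuming we want the k largest values, returned in no particular order (k_largest is a hypothetical name). It partitions into new lists rather than in place, which is simpler but uses extra memory, and it separates out elements equal to the pivot so that duplicates cannot cause endless recursion:

```python
import random


def k_largest(values, k):
    """Return the k largest elements of values, in no particular order."""
    if k <= 0:
        return []
    if k >= len(values):
        return list(values)
    pivot = random.choice(values)
    greater = [v for v in values if v > pivot]  # N[m]: ranks ahead of the pivot
    equal = [v for v in values if v == pivot]
    less = [v for v in values if v < pivot]
    if k <= len(greater):
        # Everything we need is on the "greater" side.
        return k_largest(greater, k)
    if k <= len(greater) + len(equal):
        return greater + equal[:k - len(greater)]
    # Keep all of N[m] and the pivot copies; find the rest in the smaller side.
    return greater + equal + k_largest(less, k - len(greater) - len(equal))
```

With k = 2 this also answers the original question, although its expected number of comparisons is linear rather than n + O(log n).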

finding middle element of an array [duplicate]

Possible Duplicate:
How to find the kth largest element in an unsorted array of length n in O(n)?
Hi all,
I came across a question in my interview.
Question:
An array of integers is given as the input, and you should find the element that would be in the middle if the array were sorted, but without sorting it.
For example:
Input: 1,3,5,4,2
Output: 3
When you sort the given input array, it will be 1,2,3,4,5 where middle element is 3.
You should find this in one pass without sorting.
Any solutions for this?
This is a selection algorithm problem which is O(n).
Edit: but if you're sure the items are consecutive integers, you can compute the smallest, the biggest, and the count of elements (in one pass) and return smallest + (biggest - smallest + 1) / 2, using integer division.
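A one-pass sketch of that formula, assuming the values are distinct consecutive integers in any order (middle_of_consecutive is a hypothetical name). Under that assumption only the minimum and maximum are actually needed:

```python
def middle_of_consecutive(values):
    """Middle value of a list holding distinct consecutive integers in any order.

    Assumes the values really are consecutive; otherwise the result is meaningless.
    """
    smallest = biggest = values[0]
    for v in values[1:]:
        smallest = min(smallest, v)
        biggest = max(biggest, v)
    return smallest + (biggest - smallest + 1) // 2
```

For 1,3,5,4,2 it returns 1 + 5 // 2 = 3. For an even number of elements it returns the upper of the two middle values.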
To me, it sounds like you can use std::nth_element straight off - don't know if that is an acceptable answer.
You can use a "modified" quicksort to find it. Its worst case is O(n^2), but it is fast on average (expected O(n)). Every time you choose a pivot, you check how many elements are less than the pivot and how many are greater. If the same number of elements are less than and greater than the pivot, the pivot is the median (more generally, if exactly k-1 elements are less than it, it is the k-th smallest). If not, you recurse only into the portion where the element is contained. In the worst-case scenario, though, you will end up performing a complete sort.
Example:
An array with 7 elements; we are looking for the 4th smallest element.
5 3 8 6 7 1 9
Suppose quicksort chooses 3 as the pivot; then you'll get:
1 3 5 8 6 7 9
Now you want the 2nd smallest in the subarray [5, 8, 6, 7, 9]. Keep going until the pivot is the k-th smallest element you are searching for in the current iteration.
I think this solution is pretty good for an interview question although you should mention that there is an O(n) deterministic solution.
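For the O(n) deterministic solution, a sketch of selection with the classic median-of-medians pivot rule in Python (select and median are hypothetical names). It takes the median of each group of 5, recursively picks the median of those medians as the pivot, which guarantees a constant fraction of elements is discarded every round, and then keeps only the side that contains the k-th smallest:

```python
def select(values, k):
    """Return the k-th smallest element (0-based) of values in worst-case O(n)."""
    if not 0 <= k < len(values):
        raise IndexError("k is out of range")
    a = list(values)
    while True:
        if len(a) <= 5:
            return sorted(a)[k]
        # Median of each group of (at most) 5 elements.
        medians = []
        for i in range(0, len(a), 5):
            group = sorted(a[i:i + 5])
            medians.append(group[(len(group) - 1) // 2])
        # Recursively choose the median of those medians as the pivot.
        pivot = select(medians, (len(medians) - 1) // 2)
        less = [v for v in a if v < pivot]
        equal = [v for v in a if v == pivot]
        if k < len(less):
            a = less
        elif k < len(less) + len(equal):
            return pivot
        else:
            k -= len(less) + len(equal)
            a = [v for v in a if v > pivot]


def median(values):
    """Middle element (lower middle for even length) without sorting the whole array."""
    return select(values, (len(values) - 1) // 2)
```

For the question's example, median([1, 3, 5, 4, 2]) is 3, and select([5, 3, 8, 6, 7, 1, 9], 3) finds the 4th smallest, 6. In practice the constant factors are larger than for the randomized quickselect above, which is why the modified quicksort is the usual interview answer.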
