Insertion sort with sentinel - algorithm

Is there any benefit to adding a sentinel to this code?
public void Sort(ArrayToSort<T> array) {
    for (var i = 0; i < array.Length; i++) {
        for (var j = i; j > 0; j--) {
            if (array.isLess(j, j - 1)) {
                array.Swap(j, j - 1);
            } else {
                break;
            }
        }
    }
}
If the answer is yes, how should I do it? Because if I have to copy the whole array, I'm pretty sure it's better to do without a sentinel...
thanks ;)

There is a way to get a natural sentinel in insertion sort. Make a first pass through the whole array, find the smallest element, and move it into the first position.
After that you can get rid of the index check in the inner loop. Example code for the second stage, from Sedgewick's book (Algorithms in C):
for (i = l+2; i <= r; i++)
  { int j = i; Item v = a[i];
    while (less(v, a[j-1]))
      { a[j] = a[j-1]; j--; }
    a[j] = v;
  }
Also note that insertion sort uses element shifts, not swaps, for efficiency.
With this method the worst case costs about n^2/2 element comparisons, versus n^2/2 element comparisons plus n^2/2 index comparisons in the straightforward version.
I believe there should be a speed gain, but not a very large one (element comparisons might be heavier, and the number of shift operations is the same in both cases). You can profile both approaches to see the result for your specific case.
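Putting both stages together, a minimal Java sketch might look like the following. It is an illustration, not code from the book: the class name SentinelInsertionSort is made up, and the first stage is assumed to bubble the minimum to the front with one right-to-left pass.

```java
import java.util.Arrays;

class SentinelInsertionSort {
    // Sorts a[] in ascending order using the smallest element as a sentinel.
    static void sort(int[] a) {
        if (a.length < 2) return;
        // Stage 1: one right-to-left pass moves the smallest element to a[0].
        for (int i = a.length - 1; i > 0; i--) {
            if (a[i] < a[i - 1]) {
                int t = a[i];
                a[i] = a[i - 1];
                a[i - 1] = t;
            }
        }
        // Stage 2: a[0] <= everything, so the inner loop needs no "j > 0" check.
        for (int i = 2; i < a.length; i++) {
            int v = a[i];
            int j = i;
            while (v < a[j - 1]) {   // stops at a[0] at the latest
                a[j] = a[j - 1];     // shift instead of swap
                j--;
            }
            a[j] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

No copy of the array is needed; the sentinel is simply the array's own minimum.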

Related

(with example) Why is KMP string matching O(n). Shouldn't it be O(n*m)?

Why is KMP O(n + m)?
I know this question has probably been asked a million times on here, but I haven't found an answer that convinced me or that I understood, or a question that matched my example.
/**
 * KMP algorithm of pattern matching.
 */
public boolean KMP(char []text, char []pattern){
    int lps[] = computeTemporaryArray(pattern);
    int i=0;
    int j=0;
    while(i < text.length && j < pattern.length){
        if(text[i] == pattern[j]){
            i++;
            j++;
        }else{
            if(j!=0){
                j = lps[j-1];
            }else{
                i++;
            }
        }
    }
    if(j == pattern.length){
        return true;
    }
    return false;
}
n = size of text
m = size of pattern
I know why it's + m: that's the time it takes to build the lps array used for lookups. I'm not sure why the code I pasted above is O(n).
I see that above, "i" always moves forward EXCEPT when there is a mismatch and j != 0. In that case, we can do iterations of the while loop where i doesn't move forward, so it's not exactly O(n).
Suppose the lps array is increasing, like [1,2,3,4,5,6,0]. If we fail to match at index 6, j gets updated to 5, then 4, then 3, and so on, and we effectively go through m extra iterations (assuming all of them mismatch). This can occur at every step.
So it would look like
for (int i = 0; i < n; i++) {
    for (int j = i; j >= 0; j--) {
    }
}
and storing all the possible (i, j) combinations, i.e. states, would require an n*m array, so wouldn't the runtime be O(nm)?
So is my reading of the code wrong, is my runtime analysis of the for loop wrong, or is my example impossible?
Actually, now that I think about it, it is O(n+m). I just visualized it as two windows shifting.
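One way to check the O(n) bound is to count loop iterations directly. The sketch below is not from the original post: buildLps is a hypothetical stand-in for computeTemporaryArray, and countIterations repeats the question's loop with a step counter added. Every iteration either advances i (at most n times) or strictly decreases j through j = lps[j-1]. Since j only grows when i grows, the decreases cannot outnumber the increases, so the loop runs at most 2n times.

```java
class KmpStepCount {
    // Builds the lps table (longest proper prefix that is also a suffix) in O(m).
    static int[] buildLps(char[] pattern) {
        int[] lps = new int[pattern.length];
        int len = 0;
        int i = 1;
        while (i < pattern.length) {
            if (pattern[i] == pattern[len]) {
                lps[i++] = ++len;
            } else if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i++] = 0;
            }
        }
        return lps;
    }

    // Same search loop as in the question, but returns the iteration count.
    static int countIterations(char[] text, char[] pattern) {
        int[] lps = buildLps(pattern);
        int i = 0, j = 0, steps = 0;
        while (i < text.length && j < pattern.length) {
            steps++;
            if (text[i] == pattern[j]) {
                i++;
                j++;
            } else if (j != 0) {
                j = lps[j - 1];   // j strictly decreases, i stays put
            } else {
                i++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        char[] text = "aaabaaabaaabaaab".toCharArray();
        char[] pattern = "aaaa".toCharArray();
        int steps = countIterations(text, pattern);
        System.out.println(steps + " iterations, bound 2n = " + 2 * text.length);
    }
}
```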

Insertion sort in best case

With reference to Algorithms, Fourth Edition by Robert Sedgewick and Kevin Wayne, I am having difficulty understanding the best-case complexity of insertion sort for the code below:
public class Insertion
{
    public static void sort(Comparable[] a)
    {  // Sort a[] into increasing order.
        int N = a.length;
        for (int i = 1; i < N; i++)
        {  // Insert a[i] among a[i-1], a[i-2], a[i-3]... ..
            for (int j = i; j > 0 && less(a[j], a[j-1]); j--)
                exch(a, j, j-1);
        }
    }
    // See page 245 for less(), exch(), isSorted(), and main().
}
The book says that in the best case (a sorted array), the number of exchanges is 0 and the number of compares is N-1. While I understand why the number of exchanges is 0, I am having a hard time seeing how the number of compares can be N-1 in the best case.
If the array is already sorted, then in the specific implementation of insertion-sort that you provide, each element will only be compared to its immediate predecessor. Since it's not less than that predecessor, the inner for-loop then aborts immediately, without requiring any further comparisons or exchanges.
Note that other implementations of insertion-sort do not necessarily have that property.
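To see the N-1 figure concretely, here is a small instrumented sketch of the same loop structure. The class name and the compares counter are made up for illustration, and int[] stands in for Comparable[] to keep it short.

```java
class InsertionCompareCount {
    static int compares;

    // Counts every element comparison.
    static boolean less(int v, int w) {
        compares++;
        return v < w;
    }

    // Same loops as the book's sort(); returns the number of compares made.
    static int sortAndCount(int[] a) {
        compares = 0;
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && less(a[j], a[j - 1]); j--) {
                int t = a[j];
                a[j] = a[j - 1];
                a[j - 1] = t;
            }
        }
        return compares;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6};
        // One failed compare per i = 1..N-1, so N-1 in total.
        System.out.println(sortAndCount(sorted)); // prints 5
    }
}
```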
how can number of compares be N-1 in best case?
The best case happens when you have an already sorted array. The number of comparisons is N-1 because one comparison is made for each element from the 2nd through the last.
This can also be observed from your given code:
for (int i = 1; i < N; i++) //int i=1 (start comparing from 2nd element)
The source code for the specific implementation is:
public class Insertion
{
    public static void sort(Comparable[] a)
    {  // Sort a[] into increasing order.
        int N = a.length;
        boolean exc = false;
        for (int i = 1; i < N; i++)
        {  // Insert a[i] among a[i-1], a[i-2], a[i-3]... ..
            for (int j = i; j > 0 && less(a[j], a[j-1]); j--) {
                exch(a, j, j-1);
                exc = true;
            }
            // Caution: this stops as soon as the first outer iteration makes
            // no exchange, so it leaves inputs such as {1, 3, 2} unsorted.
            if (!exc)
                break;
        }
    }
    // See page 245 for less(), exch(), isSorted(), and main().
}

Sort a given array whose elements range from 1 to n , in which one element is missing and one is repeated

I have to sort this array in O(n) time and O(1) space.
I know how to sort an array in O(n), but that doesn't work with missing and repeated numbers. If I find the repeated and missing numbers first (it can be done in O(n)) and then sort, that seems costly.
static void sort(int[] arr)
{
    for(int i=0;i<arr.length;i++)
    {
        if(i>=arr.length)
            break;
        if(arr[i]-1 == i)
            continue;
        else
        {
            while(arr[i]-1 != i)
            {
                int temp = arr[arr[i]-1];
                arr[arr[i]-1] = arr[i];
                arr[i] = temp;
            }
        }
    }
}
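First, you need to find the missing and repeated numbers. You do this by solving the following system of equations:
\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} i - m + r
\sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n} i^2 - m^2 + r^2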
The left-hand sums are computed together in a single pass over the array. The right-hand sums are even simpler: you can use closed-form formulas such as the one for an arithmetic progression to avoid looping. So now you have a system of two equations with two unknowns: the missing number m and the repeated number r. Solve it.
Next, you "sort" the array by filling it with the numbers 1 to n from left to right, omitting m and duplicating r. Thus, the overall algorithm requires only two passes over the array.
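A possible Java sketch of this two-pass idea follows. It assumes the two equations compare the sum and the sum of squares of the array with those of 1..n, and that the input really has exactly one missing and one repeated value. The class name MissingRepeatedSort is made up for illustration.

```java
import java.util.Arrays;

class MissingRepeatedSort {
    // Precondition: a holds values from 1..n (n = a.length) with exactly one
    // value missing and one value appearing twice. Otherwise d1 may be 0.
    static void sort(int[] a) {
        long n = a.length;
        long sum = 0, sumSq = 0;
        for (int v : a) {                 // pass 1: both left-hand sums
            sum += v;
            sumSq += (long) v * v;
        }
        long d1 = sum - n * (n + 1) / 2;                  // r - m
        long d2 = sumSq - n * (n + 1) * (2 * n + 1) / 6;  // r^2 - m^2
        long rPlusM = d2 / d1;                            // (r^2 - m^2) / (r - m)
        int r = (int) ((rPlusM + d1) / 2);
        int m = (int) (rPlusM - r);
        int k = 0;
        for (int v = 1; v <= a.length; v++) {  // pass 2: rewrite in order
            if (v == m) continue;              // omit the missing number
            a[k++] = v;
            if (v == r) a[k++] = v;            // duplicate the repeated number
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 3, 4};   // 2 is missing, 3 is repeated
        sort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 3, 3, 4]
    }
}
```

The sums are kept in long to delay overflow; for very large n the sum of squares would still need a wider type.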
void sort() {
    for (int i = 1; i <= N; ++i) {
        while (a[i] != a[a[i]]) {
            std::swap(a[i], a[a[i]]);
        }
    }
    for (int i = 1; i <= N; ++i) {
        if (a[i] == i) continue;
        for (int j = a[i] - 1; j >= i; --j) a[j] = j + 1;
        for (int j = a[i] + 1; j <= i; ++j) a[j] = j - 1;
        break;
    }
}
Explanation:
Let m denote the missing number and d the duplicated number.
Note that the while loop runs as long as a[i] != a[a[i]], so it stops both when a[i] == i and when a[i] is a duplicate.
After the first for loop, every non-duplicate number i has been encountered 1-2 times and moved into the i-th position of the array at most once.
The first d that is found is moved to the d-th position, at most once.
The second d is moved around at most N-1 times and ends up in the m-th position, because every other i-th slot is occupied by the number i.
The second outer for loop locates the first i where a[i] != i. The only i that satisfies this is i = m.
The two inner for loops handle the two cases m < d and m > d, respectively.
Full implementation at http://ideone.com/VDuLka
After
    int temp = arr[arr[i]-1];
add a check for a duplicate in the loop:
    if (temp == arr[i]) { // found duplicate: the target slot already holds this value
        ...
    } else {
        arr[arr[i]-1] = arr[i];
        arr[i] = temp;
    }
See if you can figure out the rest of the code.

Bubble sort worst case, best case and average case complexity

What is the (a) worst case, (b) best case, and (c) average case complexity of the following function which does bubble sorting
for i=1 to n-1 do
    for j=i to n-1 do
        if x[j]>x[j+1] then
            temp=x[j]
            x[j]=x[j+1]
            x[j+1]=temp
        end {if}
    end {for}
end {for}
How would you justify the complexity?
The worst case is O(n^2).
The average case is also O(n^2).
The best case is O(n^2) too, even though the code inside the if statement will not get executed in this case. The quadratic complexity is due to the fact that the two for loops execute completely in all three cases, irrespective of the content of the list.
The worst and average cases are also O(n^2) for the BubbleSort algorithm below, since the while loop can run O(n) times. Its best case, however, is O(n): on an already sorted array no swap occurs, so the loop exits after a single pass.
public static void BubbleSort( int [ ] num )
{
    int j;
    boolean flag = true;
    int temp;
    while ( flag )
    {
        flag = false;
        for( j=0; j < num.length -1; j++ )
        {
            if ( num[ j ] > num[j+1] )
            {
                temp = num[ j ]; //swap elements
                num[ j ] = num[ j+1 ];
                num[ j+1 ] = temp;
                flag = true; //shows a swap occurred
            }
        }
    }
}
If you want a bubble sort algorithm whose efficiency differs dramatically between the best, worst, and average cases, try this:
int count = n - 1; // Number of adjacent pairs compared in the first pass (n is the input size)
bool sFlag = true; // A flag variable allowing the outer loop to
                   // break early and fall through
while (sFlag) {
    sFlag = false; // Set false so that the loop can break if no swaps occur
    for (int j = 0; j < count; j++) {
        if (A[j+1] < A[j]) {
            int temp; // Swap the two elements
            temp = A[j];
            A[j] = A[j+1];
            A[j+1] = temp;
            sFlag = true; // A swap has occurred, iterate again
        }
    }
    count--; // Next time, don't bother looking at the last element, it is
             // in order
}
The worst case for this is C_worst(n) = n(n-1)/2 comparisons, and the best case is C_best(n) = n-1.
This is because the count variable makes the inner loop one iteration shorter after every pass, while the flag ends the outer loop as soon as a pass makes no swap.
This is the most efficient bubble sort I've come across so far.
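The comparison counts can be checked with a short instrumented Java sketch of the same loop. The class name and the counter are made up for illustration.

```java
class BubbleCompareCount {
    // Sorts a[] with the early-exit bubble sort and returns the number of
    // element comparisons performed.
    static int sortAndCount(int[] a) {
        int comparisons = 0;
        int count = a.length - 1;   // pairs to examine in the current pass
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int j = 0; j < count; j++) {
                comparisons++;
                if (a[j + 1] < a[j]) {
                    int temp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = temp;
                    swapped = true;
                }
            }
            count--;                // the last element of this pass is in place
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(sortAndCount(new int[]{1, 2, 3, 4, 5})); // prints 4
        System.out.println(sortAndCount(new int[]{5, 4, 3, 2, 1})); // prints 10
    }
}
```

For n = 5 the sorted input takes n-1 = 4 comparisons, and the reversed input takes 4+3+2+1 = 10.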

Array of size n, with one element n/2 times

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.
I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
    // Initial value is irrelevant if sequence is non-empty,
    // but keeps compiler happy.
    int best = 0;
    int count = 0;
    foreach (int element in sequence)
    {
        if (count == 0)
        {
            best = element;
            count = 1;
        }
        else
        {
            // Vote current choice up or down
            count += (best == element) ? 1 : -1;
        }
    }
    return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)
Find the median; this takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.
int findLeader(int n, int* x){
    int leader = x[0], c = 1, i;
    for(i=1; i<n; i++){
        if(c == 0){
            leader = x[i];
            c = 1;
        } else {
            if(x[i] == leader) c++;
            else c--;
        }
    }
    if(c == 0) return NULL;
    else {
        c = 0;
        for(i=0; i<n; i++){
            if(x[i] == leader) c++;
        }
        if(c > n/2) return leader;
        else return NULL;
    }
}
I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.
This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.
Let's start by comparing a[i] and a[i+1]. If they're equal, we compare a[i+1] and a[i+2]. If not, we remove both a[i] and a[i+1] from the array. We repeat this until i >= (current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume the array is stored as a linked list (for it to give O(n) complexity).
What say folks?
-bhupi
Well, you can do an in-place radix sort as described here [pdf]; this takes no extra space and linear time. Then you can make a single pass counting consecutive equal elements, terminating when the count exceeds n/2.
How about:
Randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc.). If K == 4, then the probability of not getting at least 2 of the duplicates is 1/8. If K == 8, then it goes to under 1%. If you find no duplicates, repeat the process until you do. (This assumes that the other elements are more randomly distributed; it would perform very poorly with, say, 49% of the array = "A" and 51% of the array = "B".)
e.g.:
findDuplicateCandidate:
    select a fixed size subset.
    return the most common element in that subset
    if there is no element with more than 1 occurrence repeat.
    if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common
This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.
My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit Leaving my answer here for posterity, but I think Skeet's got it.
In PHP; please check whether it's correct:
function arrLeader( $A ){
    $len = count($A);
    $B = array();
    $val = -1;
    $counts = array_count_values($A); // returns an array with elements as keys and their occurrence counts as values
    for($i=0;$i<$len;$i++){
        $val = $A[$i];
        if(in_array($val,$B,true)){ // already checked, to avoid looping again and again
        }else{
            if($counts[$val]>$len/2){
                return $val;
            }
            array_push($B, $val); // remember checked values, to avoid looping again and again
        }
    }
    return -1;
}
int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
    L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
    if (L[i] == candidate && L[pos++] == candidate)
        return candidate;
}
if (count > pos)
    return candidate;
return (-1);

Resources