What is the average Big-O complexity of Gnome sort? - algorithm

There is nothing about it on Wikipedia.
Does anyone know?
I only want to know the average Big-O complexity of that algorithm.

The running time of gnome sort is at least proportional to f(n), where f(n) is the sum, over all elements of the input list, of the distance from that element to its final sorted position. For a "random" list of length n, an element near the beginning of the list is expected to be, on average, about n/2 positions away from its sorted location, and an element in the middle about n/4 positions away. Since there are n elements in total, f(n) is at least about n²/4. Therefore, on average, gnome sort is O(n²).
Sorry if this is hard to follow.
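To make the displacement argument concrete, here is a small sketch (mine, not part of the original answer) that measures f(n) for random permutations of 0..n-1; since the sorted position of the value v is simply v, f(n) is just the sum of |x[i] - i|. The ratio f(n)/n² it prints settles near a constant (around 1/3 for a uniformly random permutation), consistent with the n²/4 lower estimate above.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <numeric>
#include <random>
#include <vector>

// Measure f(n): the total distance of each element from its sorted position,
// averaged over a few random permutations of 0..n-1 (value v belongs at index v).
int main()
{
    std::mt19937 rng(12345);
    for (int n : {100, 200, 400, 800})
    {
        long long total = 0;
        const int trials = 100;
        for (int t = 0; t < trials; ++t)
        {
            std::vector<int> x(n);
            std::iota(x.begin(), x.end(), 0);
            std::shuffle(x.begin(), x.end(), rng);
            for (int i = 0; i < n; ++i)
                total += std::abs(x[i] - i);   // distance to sorted position
        }
        double avg = double(total) / trials;
        printf("n = %4d  average f(n) = %10.0f  f(n)/n^2 = %.3f\n",
               n, avg, avg / (double(n) * n));
    }
}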

Here is a simple comparison of bubble sort and gnome sort on an array of random values, values in reverse order, 3 concatenated runs of ordered values, and fully ordered values. Gnome sort, on average, seems to be a bit cheaper on the comparison side of things.
Note that the comparison/swap counts when sorting random values vary a little from run to run, but stay close to these results.
N = 100, attempts = 1000
random:
bubble sort: comparisons = 8791794, swaps = 2474088
gnome sort: comparisons = 5042930, swaps = 2474088
reversed:
bubble sort: comparisons = 9900000, swaps = 4950000
gnome sort: comparisons = 9900000, swaps = 4950000
3 ordered sets:
bubble sort: comparisons = 6435000, swaps = 1584000
gnome sort: comparisons = 3267000, swaps = 1584000
ordered:
bubble sort: comparisons = 99000, swaps = 0
gnome sort: comparisons = 99000, swaps = 0
... And here is the code used to get these results:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

const int N = 100;
int x[N];

int main()
{
    srand((unsigned int)time(0));
    int comparisons = 0;
    int swaps = 0;
    int attempts = 1000;
    while (--attempts >= 0)
    {
        // random:
        for (int i = 0; i < N; ++i)
            x[i] = rand();
        // reversed:
        /*for (int i = 0; i < N; ++i)
            x[i] = N - 1 - i;*/
        // 3 ordered sets:
        /*for (int i = 0; i < N/3; ++i)
            x[i] = i;
        for (int i = N/3, j = 0; i < 2*N/3; ++i, ++j)
            x[i] = j;
        for (int i = 2*N/3, j = 0; i < N; ++i, ++j)
            x[i] = j;*/
        // ordered:
        /*for (int i = 0; i < N; ++i)
            x[i] = i;*/
        // bubble sort:
        /*{
            bool swapped;
            do
            {
                swapped = false;
                for (int i = 0; i < (N - 1); ++i)
                {
                    ++comparisons;
                    if (x[i] > x[i + 1])
                    {
                        ++swaps;
                        int t = x[i];
                        x[i] = x[i + 1];
                        x[i + 1] = t;
                        swapped = true;
                    }
                }
            } while (swapped);
        }*/
        // gnome sort:
        {
            int i = 1;
            while (i < N)
            {
                ++comparisons;
                if (x[i] >= x[i - 1])
                    ++i;
                else
                {
                    ++swaps;
                    int t = x[i];
                    x[i] = x[i - 1];
                    x[i - 1] = t;
                    if (i > 1)
                        --i;
                }
            }
        }
    }
    printf("comparisons = %d\n", comparisons);
    printf("swaps = %d\n", swaps);
}
Obviously this is not a full test by far, but it gives an idea.

Quite the contrary, Wikipedia says it's O(n²), and from the description, I can't see how there would be any real doubt about that.

"Average" cannot really be answered without looking at the input data. If you know what you are sorting you could do some analysis to get a better idea how it would perform in your application.

It seems intuitive to me that if insertion sort has an average-case running time that is O(n^2), and gnome sort is a slightly worse version of insertion sort, then gnome sort's average running time would also be O(n^2) (well, Θ(n^2)).
This pdf has this to say about insertion-sort's average-case running time of Θ(n^2):
The proof of this is not trivial, but it is based on the intuitive fact that, on average, the while loop test "list[pos-1] > value" is true about half the time, so that on average the number of executions of the while loop is one-half of the maximum number. Since the maximum number is n(n-1)/2, the average number of executions of the while loop is n(n-1)/4, which is still Θ(n^2).
The same reasoning would apply to gnome sort. You know gnome sort can't be better, because the "gnome" first has to scan backwards (via swapping) to find where the item goes (equivalent to insertion sort's scan), and then has to walk back up the list over elements that it has already sorted. Any run-time differences between the scanning methods are, I believe, negligible as far as the complexity bound is concerned, but I'll defer to you to prove it.
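As a rough empirical check of the quoted n(n-1)/4 figure, here is a small sketch (mine, not from the linked PDF) that counts how often the inner-loop test of a plain insertion sort fires on random input:
#include <cstdio>
#include <random>
#include <vector>

// Count executions of insertion sort's inner loop on random input,
// to compare against the quoted n(n-1)/4 average.
int main()
{
    std::mt19937 rng(42);
    const int n = 100, trials = 1000;
    long long tests = 0;
    for (int t = 0; t < trials; ++t)
    {
        std::vector<int> a(n);
        for (int &v : a)
            v = (int)(rng() % 100000);
        for (int i = 1; i < n; ++i)
        {
            int value = a[i], pos = i;
            while (pos > 0 && a[pos - 1] > value)   // the quoted test
            {
                ++tests;
                a[pos] = a[pos - 1];
                --pos;
            }
            a[pos] = value;
        }
    }
    printf("average inner-loop executions = %lld (n(n-1)/4 = %d)\n",
           tests / trials, n * (n - 1) / 4);
}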

Wikipedia clearly says that it has a worst-case run time of O(n²).

http://en.wikipedia.org/wiki/Gnome_sort
This is O(n^2). In the worst-case scenario, every time you enter the while loop the current position needs to be exchanged with the previous position, which makes pos smaller, and then you have to exchange again. Try an ascending sort on an array sorted in descending order (i.e. to get [1 2 3 4] from [4 3 2 1]) and you'll hit the worst case.

Related

How does Sort algorithm performance vary with repeated elements in data

Does the performance of sorting algorithms (Merge, Quick) degrade with repeated elements, for the same-sized array?
Can I say
var arr = [6,4,7,1,8,3,9,2,10,5]; --> sorts faster because the elements are unique
var arr = [1,1,3,3,3,2,4,4,5,10]; --> sorts slower because of the repeated elements
I tried to write a simple quicksort implementation and it gives me the above-mentioned behavior. Am I doing something wrong?
var Quicksort = function(arr, si, ei) {
    if (si >= ei) return;
    var pivot = arr[ei];
    var pi = si;
    for (var i = si; i < ei; i++) {
        if (arr[i] < pivot) {
            Swap(arr, pi, i);
            pi++;
        }
    }
    Swap(arr, pi, ei);
    Quicksort(arr, 0, pi - 1);
    Quicksort(arr, pi + 1, ei);
    return arr;
}

function Swap(arr, i, j) {
    var temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}

// Execute
var input = [];
for (var i = 0; i < 1000; i++) {
    input.push(Math.floor(Math.random() * 100000));
}
console.log(input);
var result = Quicksort(input, 0, input.length - 1);
console.log(result);
Thanks
The number of equal or distinct values does not, by itself, affect the efficiency of either merge sort or quicksort.
What affects the efficiency of quicksort is how you choose the pivot value and whether the array is already sorted.
Read: http://www.geeksforgeeks.org/when-does-the-worst-case-of-quicksort-occur/
When it comes to merge sort, what affects its efficiency is how many comparisons you end up doing during the merge phase.
Read: When will the worst case of Merge Sort occur?
Therefore, the frequency of repeated values does not, in general, affect these sorting algorithms.
Of course, if you have an array of size n where all the values are equal to x, quicksort will not perform well, since such an array is effectively already sorted: for an input like [6,6,6,6,6], the quicksort above runs in n².
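Here is a small sketch illustrating that claim (my own, in C++ rather than the question's JavaScript); it uses the same Lomuto-style partition with the last element as pivot and counts comparisons on random distinct input versus all-equal input:
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Lomuto-style quicksort (last element as pivot), mirroring the scheme above,
// instrumented to count comparisons.
static long long comparisons = 0;

void quicksort(std::vector<int> &a, int si, int ei)
{
    if (si >= ei) return;
    int pivot = a[ei], pi = si;
    for (int i = si; i < ei; i++)
    {
        ++comparisons;
        if (a[i] < pivot)
            std::swap(a[pi++], a[i]);
    }
    std::swap(a[pi], a[ei]);
    quicksort(a, si, pi - 1);
    quicksort(a, pi + 1, ei);
}

int main()
{
    const int n = 2000;
    std::mt19937 rng(7);

    std::vector<int> distinct(n);
    std::iota(distinct.begin(), distinct.end(), 0);
    std::shuffle(distinct.begin(), distinct.end(), rng);  // random, all values unique
    comparisons = 0;
    quicksort(distinct, 0, n - 1);
    printf("random distinct input: %lld comparisons\n", comparisons);

    std::vector<int> equal(n, 6);                          // every value the same
    comparisons = 0;
    quicksort(equal, 0, n - 1);
    printf("all-equal input:       %lld comparisons (n(n-1)/2 = %d)\n",
           comparisons, n * (n - 1) / 2);
}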
I believe there are other sorting algorithms that would do a better job for arrays where values repeat a lot; counting sort (http://www.geeksforgeeks.org/counting-sort/) would be a good approach.

Give a pseudocode for an algorithm that, given a list of n integers from the set {0, 1,

Problem statement:
Give pseudocode for an algorithm that, given a list of n integers from the set {0, 1, ..., k-1}, preprocesses its input to extract and store information that makes it possible to answer any query asking how many of the n integers fall in the range [a..b] (with a and b being input parameters to the query) in O(1) time. Explain how your algorithm works.
The preprocessing time should be O(n + k) in the worst case. Provide an argument showing that your preprocessing algorithm meets that bound.
My attempt:
Counting Sort Pseudo Code
function countingSort(array, min, max)
    count: array of (max - min + 1) elements   // max is the highest value, min the lowest
    initialize count with 0
    for each number in array do
        count[number - min] := count[number - min] + 1
    done
    z := 0
    for i from min to max do
        while count[i - min] > 0 do
            array[z] := i
            z := z + 1
            count[i - min] := count[i - min] - 1
        done
    done
Find Pseudo Code
find(a, b)
??
Time Complexity Analysis:
We find that Counting Sort takes O(k) time to initialize the count array, O(n) time to read in the numbers and increment the appropriate elements of count, another O(k) to walk over the counts, and another O(n) to write the numbers back out in order, for a total runtime of O(n + k).
Question:
The only problem I am having is that I do not know how to report back the number of integers that lie in the range [a..b] in O(1) time. The only way I can think of retrieving that information is to loop through my array of sorted integers with a counter, incrementing it each time I find an element that is >= a and <= b. Also, should the endpoints a and b themselves be included in the count, or should I only count the numbers strictly between them? The problem with looping through the array with a counter is that it requires a for loop and is O(n). Any help would be greatly appreciated.
The answer was trivial, I just didn't think about it. After counting sort, I convert the per-value counts into cumulative (prefix) counts, so all I have to do is take the difference of the cumulative counts at the two ends of the requested range. So, for example:
find(a, b)
    numberOfIntegersBetweenAandB = count[b] - count[a - 1]   // count[] holds cumulative counts
Working C++ example. Since the goal here is pseudocode, there are no error checks.
#include <cstddef>

int * GenerateSums(int a[], size_t n, int min, int max)
{
    size_t k = max + 2 - min;
    int *sums = new int[k];
    for (size_t i = 0; i < k; i++)          // clear sums
        sums[i] = 0;
    for (size_t i = 0; i < n; i++)          // set number of instances
        sums[1 + a[i] - min]++;
    for (size_t i = 1; i < k; i++)          // convert to cumulative sums
        sums[i] += sums[i - 1];
    return sums;
}

int CountInRange(int sums[], int a, int b)
{
    return sums[b + 1] - sums[a];
}

int main()
{
    int a[] = {4,0,3,4,2,4,1,4,3,4,3,2,4,2,3,1};
    int *sums = GenerateSums(a, sizeof(a)/sizeof(a[0]), 0, 4);
    int cnt;
    cnt = CountInRange(sums, 0, 0); // returns 1
    cnt = CountInRange(sums, 3, 4); // returns 10
    cnt = CountInRange(sums, 0, 4); // returns 16
    delete[] sums;
    return 0;
}

Interview Ques - find the KTHSMALLEST element in an unsorted array

This problem asks to find the kth smallest element in an unsorted array of non-negative integers.
The main problem here is the memory limit :( We can only use constant extra space.
First I tried an O(n^2) method [without any extra memory], which gave me TLE.
Then I tried to use a priority queue [extra memory], which gave me MLE :(
Any idea how to solve the problem with constant extra space and within the time limit?
You can use an O(n^2) method with some pruning, which will make the program behave more like O(n log n) :)
Declare two variables: low = the largest value whose rank is less than k, and high = the smallest value whose rank is greater than k.
Keep track of the low and high values you have already processed.
Whenever a new value comes, check whether it is within the [low, high] boundary. If yes, process it; otherwise skip the value.
That's it :) I think it will pass both TLE and MLE :)
Have a look at my code:
int low = 0, high = 1e9;
for (int i = 0; i < n; i++)   // n is the total number of elements
{
    if (!(A[i] >= low && A[i] <= high))   // A is the array in which the elements are saved
        continue;
    int cnt = 0, cnt1 = 0;   // cnt counts strictly smaller values, cnt1 counts equal values (duplicates possible)
    for (int j = 0; j < n; j++)
    {
        if (i != j && A[i] > A[j])
            cnt++;
        if (A[i] == A[j])
            cnt1++;
        if (cnt > k)
            break;
    }
    if (cnt + cnt1 < k)
        low = A[i] + 1;
    else if (cnt >= k)
        high = A[i] - 1;
    if (cnt < k && (cnt + cnt1) >= k)
    {
        return A[i];
    }
}
You can do an in-place selection algorithm (quickselect).
The idea is similar to quicksort, but you recurse only on the relevant part of the array, not all of it. Note that the algorithm can be implemented with O(1) extra space pretty easily, since its recursive call is a tail call.
This leads to an O(n) solution in the average case (just be sure to pick a pivot at random, so you don't fall into pre-designed edge cases such as a sorted list). That can be improved to worst-case O(n) using the median-of-medians technique, but with significantly worse constants.
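For reference, here is a minimal iterative quickselect sketch (my own, not the answerer's code); it assumes a 1-based k and partitions in place, so it uses only O(1) extra space:
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Iterative quickselect: O(1) extra space, O(n) expected time.
// Returns the k-th smallest element (k is 1-based).
int kth_smallest(std::vector<int> &a, int k)
{
    std::mt19937 rng(std::random_device{}());
    int lo = 0, hi = (int)a.size() - 1;
    while (true)
    {
        if (lo == hi)
            return a[lo];
        // pick a random pivot to avoid adversarial inputs (e.g. sorted data)
        std::uniform_int_distribution<int> dist(lo, hi);
        std::swap(a[dist(rng)], a[hi]);
        int pivot = a[hi], p = lo;
        for (int i = lo; i < hi; i++)      // Lomuto partition
            if (a[i] < pivot)
                std::swap(a[i], a[p++]);
        std::swap(a[p], a[hi]);
        int rank = p - lo + 1;             // rank of the pivot within [lo, hi]
        if (rank == k)
            return a[p];
        if (k < rank)
            hi = p - 1;
        else
        {
            lo = p + 1;
            k -= rank;
        }
    }
}

int main()
{
    std::vector<int> a = {7, 10, 4, 3, 20, 15};
    printf("%d\n", kth_smallest(a, 3));    // prints 7
}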
Binary search on the answer for the problem.
Two major observations here:
Given that all values in the array are of type 'int' and non-negative, their range is [0, 2^31 - 1]. That is your search space.
Given a value x, I can always tell in O(n) whether the kth smallest element is less than or equal to x, or greater than x.
A rough pseudocode:
start = 0, end = 2^31 - 1
while start <= end
    x = (start + end) / 2
    less = number of elements less than or equal to x
    if less >= k        // x could be the answer; keep looking for a smaller one
        ans = x
        end = x - 1
    else                // less < k, so the answer must be larger
        start = x + 1
return ans
Hope this helps.
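A concrete version of that idea, as a C++ sketch (my own rendering of the pseudocode above, assuming non-negative ints and a 1-based k):
#include <cstdio>
#include <vector>

// Binary search on the value range: find the smallest x such that at least
// k elements are <= x; that x is the k-th smallest element.
// O(1) extra space, O(n log(max value)) time.
int kth_smallest(const std::vector<int> &a, int k)
{
    long long start = 0, end = 2147483647LL;   // value range for non-negative int
    long long ans = -1;
    while (start <= end)
    {
        long long x = start + (end - start) / 2;
        long long less_eq = 0;
        for (int v : a)                        // O(n) counting pass
            if (v <= x)
                ++less_eq;
        if (less_eq >= k)                      // x could be the answer; try smaller values
        {
            ans = x;
            end = x - 1;
        }
        else
            start = x + 1;
    }
    return (int)ans;
}

int main()
{
    std::vector<int> a = {7, 10, 4, 3, 20, 15};
    printf("%d\n", kth_smallest(a, 4));        // prints 10
}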
I believe I found a solution that is similar to that of @AliAkber but is slightly easier to understand (I keep track of fewer variables).
It passed all tests on InterviewBit.
Here's the code (Java):
public int kthsmallest(final List<Integer> a, int k) {
    int lo = Integer.MIN_VALUE;
    int hi = Integer.MAX_VALUE;
    int champ = -1;
    for (int i = 0; i < a.size(); i++) {
        int iVal = a.get(i);
        int count = 0;
        if (!(iVal > lo && iVal < hi)) continue;
        for (int j = 0; j < a.size(); j++) {
            if (a.get(j) <= iVal) count++;
            if (count > k) break;
        }
        if (count > k && iVal < hi) hi = iVal;
        if (count < k && iVal > lo) lo = iVal;
        if (count >= k && (champ == -1 || iVal < champ))
            champ = iVal;
    }
    return champ;
}

Why does Radix sort have a larger number of instructions than Quick sort in C?

I wrote 2 sorting functions, radix sort and quick sort, in C.
But when I checked those functions with gdb, it turned out that quick sort executes a smaller number of instructions than radix sort. And it feels even faster...
As far as I know, radix sort is supposed to be the fastest sorting algorithm.
Below are my sorting codes, taken from the wiki.
1. Quick sort
void q_sort(int numbers[], int left, int right)
{
    if (left == right) return;
    int pivot, l_hold, r_hold;
    l_hold = left;
    r_hold = right;
    pivot = numbers[left];
    while (left < right)
    {
        while ((numbers[right] >= pivot) && (left < right))
            right--;
        if (left != right)
        {
            numbers[left] = numbers[right];
            left++;
        }
        while ((numbers[left] <= pivot) && (left < right))
            left++;
        if (left != right)
        {
            numbers[right] = numbers[left];
            right--;
        }
    }
    numbers[left] = pivot;
    pivot = left;
    left = l_hold;
    right = r_hold;
    if (left < pivot)
        q_sort(numbers, left, pivot - 1);
    if (right > pivot)
        q_sort(numbers, pivot + 1, right);
}
2. Radix sort
#include <math.h>
#include <stdlib.h>
#include <string.h>

/**
 * @data array
 * @size the number of elements in the array
 * @p    the number of digits of the biggest number
 * @k    the radix (in case of decimal, it is 10)
 */
void rxSort(int *data, int size, int p, int k) {
    int *counts, *temp;
    int index, pval, i, j, n;
    if ((counts = (int*) malloc(k * sizeof(int))) == NULL)
        return;
    if ((temp = (int*) malloc(size * sizeof(int))) == NULL) {
        free(counts);
        return;
    }
    for (n = 0; n < p; n++) {
        for (i = 0; i < k; i++)
            counts[i] = 0;   // initialize
        // n: 0 => 1, 1 => 10, 2 => 100
        pval = (int)pow((double)k, (double)n);
        for (j = 0; j < size; j++) {
            // if the number is 253
            // n: 0 => 3, 1 => 5, 2 => 2
            index = (int)(data[j] / pval) % k;
            counts[index] = counts[index] + 1;
        }
        for (i = 1; i < k; i++) {
            counts[i] = counts[i] + counts[i - 1];
        }
        for (j = size - 1; j >= 0; j--) {
            index = (int)(data[j] / pval) % k;
            temp[counts[index] - 1] = data[j];
            counts[index] = counts[index] - 1;
        }
        memcpy(data, temp, size * sizeof(int));
    }
    free(counts);
    free(temp);
}
There are some restrictions:
1. The size of the array should be set to 256.
2. The numbers range from 0 to 64.
3. It runs four times with different arrays.
When I tested, I set the size of the array to 50.
Then, the number of instructions was:
Radix : 15030
Quick : 7484
Quick wins... T_T... so sad about Radix... is it true that Quick sort is faster?
Quicksort is, generally, the best choice you have when sorting an array, especially when you don't have information on the range of the numbers and the array is pretty big. That's because quicksort has an expected time complexity proportional to the size of its input times the logarithm of that size, O(n log n), which is the best you can have with a comparison-based algorithm. Moreover, it has a small hidden constant factor and it sorts in place. Radix sort doesn't sort with comparisons; its time complexity is proportional to k*n, where n is the size of the input and k is the number of digits per number, so you need some information about the values being sorted.
In your case you have quite a small array to test on, so any observable difference between the two algorithms is asymptotically irrelevant. Quicksort wins because, as said, it has a small constant factor hidden in that O(n log n). If you try insertion sort and merge sort on a small array, despite insertion sort being O(n^2) and merge sort O(n log n) in the worst case, there is a good chance that insertion sort will be quicker, for the same reason.
But rest assured that if you try them on an array of 10^8 numbers, the result will change a lot.
Also keep in mind that there is no such thing as the best sorting algorithm; you just have to see, each time, which of them better suits the nature of your problem. :)
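To see the crossover yourself, here is a rough, self-contained benchmark sketch (mine, not the answerer's); it times std::sort against a simple base-256 LSD radix sort on a large array of random 32-bit values. Exact numbers depend on hardware and compiler flags, but on large inputs the linear-time behaviour of radix sort starts to pay off.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// Simple LSD radix sort, base 256, for non-negative 32-bit values.
static void radix_sort(std::vector<uint32_t> &a)
{
    std::vector<uint32_t> tmp(a.size());
    for (int shift = 0; shift < 32; shift += 8)
    {
        size_t count[257] = {0};
        for (uint32_t v : a)
            count[((v >> shift) & 0xFF) + 1]++;
        for (int i = 0; i < 256; i++)
            count[i + 1] += count[i];                   // prefix sums -> output offsets
        for (uint32_t v : a)
            tmp[count[(v >> shift) & 0xFF]++] = v;
        a.swap(tmp);
    }
}

template <typename F>
static double time_ms(F f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const size_t n = 10000000;                          // large enough for asymptotics to matter
    std::mt19937 rng(1);
    std::vector<uint32_t> base(n);
    for (auto &v : base)
        v = rng();

    std::vector<uint32_t> a = base, b = base;
    printf("std::sort : %.1f ms\n", time_ms([&] { std::sort(a.begin(), a.end()); }));
    printf("radix sort: %.1f ms\n", time_ms([&] { radix_sort(b); }));
}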

3-PARTITION problem

Here is another dynamic programming question (Vazirani, ch. 6).
Consider the following 3-PARTITION problem. Given integers a1, ..., an, we want to determine whether it is possible to partition {1, ..., n} into three disjoint subsets I, J, K such that
sum(I) = sum(J) = sum(K) = 1/3 * sum(ALL)
For example, for input (1; 2; 3; 4; 4; 5; 8) the answer is yes, because there is the partition (1; 8), (4; 5), (2; 3; 4). On the other hand, for input (2; 2; 3; 5) the answer is no. Devise and analyze a dynamic programming algorithm for 3-PARTITION that runs in time polynomial in n and Sum(a_i).
How can I solve this problem? I know the 2-partition problem, but I still can't solve this one.
It's easy to generalize the 2-set solution to the 3-set case.
In the original version, you create an array of booleans sums, where sums[i] tells whether sum i can be reached with numbers from the set or not. Then, once the array is built, you just check whether sums[TOTAL/2] is true.
Since you said you already know the old version, I'll describe only the difference between them.
In the 3-partition case, you keep an array of booleans sums, where sums[i][j] tells whether the first set can have sum i and the second sum j. Then, once the array is built, you just check whether sums[TOTAL/3][TOTAL/3] is true.
If the original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then the original version isn't strictly polynomial either :)
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set and A its total sum; then let S' = union({A/2}, S).
Performing a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2}; say it's the set Z. Then X and Y form a 2-partition of S.
The witnesses of the 3-partition on S' are the witnesses of the 2-partition on S, thus 2-partition reduces to 3-partition.
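As an illustration of that reduction (my own sketch; the brute-force 3-partition check here is just a stand-in for any real decision procedure, such as the DP implementations further down):
#include <cstdio>
#include <numeric>
#include <vector>

// Brute-force 3-partition check (3^n), good enough to illustrate the reduction
// on tiny inputs; any real solver could be substituted here.
static bool can_three_partition(const std::vector<int> &v)
{
    int total = std::accumulate(v.begin(), v.end(), 0);
    if (total % 3 != 0)
        return false;
    int n = (int)v.size();
    std::vector<int> assign(n, 0);
    while (true)
    {
        int sum[3] = {0, 0, 0};
        for (int i = 0; i < n; i++)
            sum[assign[i]] += v[i];
        if (sum[0] == sum[1] && sum[1] == sum[2])
            return true;
        int i = 0;                           // advance to the next assignment (base-3 counter)
        while (i < n && assign[i] == 2)
            assign[i++] = 0;
        if (i == n)
            return false;
        ++assign[i];
    }
}

// 2-partition via the reduction: append an element of value A/2 (A = total sum)
// and ask whether the enlarged set has a 3-partition.
static bool can_two_partition(std::vector<int> s)
{
    int total = std::accumulate(s.begin(), s.end(), 0);
    if (total % 2 != 0)
        return false;                        // an odd total can never be split in half
    s.push_back(total / 2);                  // S' = S union {A/2}
    return can_three_partition(s);
}

int main()
{
    std::vector<int> yes = {3, 1, 1, 2, 2, 1};   // splits into {3,2} and {1,1,2,1}
    std::vector<int> no  = {2, 2, 3, 5};
    printf("%d %d\n", can_two_partition(yes), can_two_partition(no));  // prints 1 0
}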
If this problem is to be solvable, then sum(ALL)/3 must be an integer. Any solution must have SUM(J) + SUM(K) = SUM(I) + sum(ALL)/3. This corresponds to a solution of the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3; remove that number from the partition, and you've found I. For the other partition, run 2-partition again to split J from K; after all, J and K must themselves be equal in sum.
Edit: This solution is probably incorrect; the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K), but if there are other solutions, the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean, which records whether, up to the current iteration, the numbers permit division into three multisets that have sums i, j, k. I.e., for any true R(i,j,k) and next number n, in the next state R' it holds that R'(i+n,j,k), R'(i,j+n,k) and R'(i,j,k+n) are true. Note that the complexity (as per the exercise) depends on the magnitude of the input numbers; this is a pseudo-polynomial-time algorithm. Nikita's solution is conceptually similar but more efficient than this one, since it doesn't track the third set's sum: that's unnecessary because you can trivially compute it.
As I have answered in another similar question, the C++ implementation would look something like this:
#include <numeric>
#include <vector>
using namespace std;

int partition3(vector<int> &A)
{
    int sum = accumulate(A.begin(), A.end(), 0);
    if (sum % 3 != 0)
    {
        return false;
    }
    int size = A.size();
    vector<vector<int>> dp(sum + 1, vector<int>(sum + 1, 0));
    dp[0][0] = true;
    // process the numbers one by one
    for (int i = 0; i < size; i++)
    {
        for (int j = sum; j >= 0; --j)
        {
            for (int k = sum; k >= 0; --k)
            {
                if (dp[j][k])
                {
                    dp[j + A[i]][k] = true;
                    dp[j][k + A[i]] = true;
                }
            }
        }
    }
    return dp[sum / 3][sum / 3];
}
Let's say you want to partition the set $X = \{x_1, \ldots, x_n\}$ into $k$ partitions.
Create an $n \times k$ table. Let the cost $M[i,j]$ be the smallest achievable maximum partition sum when the first $i$ elements are split into $j$ partitions. Just recursively use the following optimality criterion to fill it:
$M[n,k] = \min_{i \leq n} \max\left( M[i, k-1], \sum_{j=i+1}^{n} x_j \right)$
Using these initial values for the table:
$M[i,1] = \sum_{j=1}^{i} x_j$ and $M[1,j] = x_1$
The running time is $O(kn^2)$ (polynomial).
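A direct sketch of that recurrence (my own), filling the table bottom-up exactly as stated; note that it splits the sequence into contiguous ranges:
#include <algorithm>
#include <cstdio>
#include <vector>

// M[i][j] = smallest achievable maximum range sum when the first i elements
// are split into j contiguous ranges (i and j are 1-based).
long long partition_cost(const std::vector<long long> &x, int k)
{
    int n = (int)x.size();
    std::vector<long long> prefix(n + 1, 0);
    for (int i = 0; i < n; i++)
        prefix[i + 1] = prefix[i] + x[i];

    const long long INF = 1000000000000000000LL;
    std::vector<std::vector<long long>> M(n + 1, std::vector<long long>(k + 1, INF));
    for (int i = 1; i <= n; i++)
        M[i][1] = prefix[i];                  // one range holds everything
    for (int j = 1; j <= k; j++)
        M[1][j] = x[0];                       // a single element

    for (int i = 2; i <= n; i++)
        for (int j = 2; j <= k; j++)
            for (int split = 1; split < i; split++)   // last range covers x[split+1..i]
                M[i][j] = std::min(M[i][j],
                                   std::max(M[split][j - 1], prefix[i] - prefix[split]));
    return M[n][k];
}

int main()
{
    std::vector<long long> x = {1, 2, 3, 4, 5};
    printf("%lld\n", partition_cost(x, 2));   // prints 9: best split is {1,2,3} | {4,5}
}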
Create a three-dimensional array, where size is the count of elements and part is equal to the sum of all elements divided by 3. Each cell array[seq][sum1][sum2] tells whether you can create sums sum1 and sum2 using at most the first seq elements of the given array A[]. Compute all values of the array; the result will be in the cell array[using all elements][sum of all elements / 3][sum of all elements / 3]: if you can create two disjoint sets each equal to sum/3, the remaining elements form the third set.
Logic of the check: exclude A[seq] (send it to the third, untracked sum) and look at the cell without this element with the same two sums; OR include it in sum1, i.e. check whether two sets are possible without element seq, where sum1 is smaller by the value of A[seq] and sum2 is unchanged; OR include it in sum2, checked analogously.
#include <vector>
using namespace std;

int partition3(vector<int> &A)
{
    int part = 0;
    for (int a : A)
        part += a;
    if (part % 3)
        return 0;
    int size = A.size() + 1;
    part = part / 3 + 1;
    bool array[size][part][part];   // variable-length array: a compiler extension (g++/clang)
    // sequence from 0 integers inside to all inside
    for (int seq = 0; seq < size; seq++)
        for (int sum1 = 0; sum1 < part; sum1++)
            for (int sum2 = 0; sum2 < part; sum2++) {
                bool curRes;
                if (seq == 0) {
                    if (sum1 == 0 && sum2 == 0)
                        curRes = true;
                    else
                        curRes = false;
                } else {
                    int curInSeq = seq - 1;
                    bool excludeFrom = array[seq - 1][sum1][sum2];
                    bool includeToSum1 = (sum1 >= A[curInSeq]
                                          && array[seq - 1][sum1 - A[curInSeq]][sum2]);
                    bool includeToSum2 = (sum2 >= A[curInSeq]
                                          && array[seq - 1][sum1][sum2 - A[curInSeq]]);
                    curRes = excludeFrom || includeToSum1 || includeToSum2;
                }
                array[seq][sum1][sum2] = curRes;
            }
    int result = array[size - 1][part - 1][part - 1];
    return result;
}
Another example in C++ (based on the previous answers):
#include <algorithm>
#include <vector>
using namespace std;

bool partition3(vector<int> const &A) {
    int sum = 0;
    for (int i = 0; i < A.size(); i++) {
        sum += A[i];
    }
    if (sum % 3 != 0) {
        return false;
    }
    vector<vector<vector<int>>> E(A.size() + 1, vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
    for (int i = 1; i <= A.size(); i++) {
        for (int j = 0; j <= sum / 3; j++) {
            for (int k = 0; k <= sum / 3; k++) {
                E[i][j][k] = E[i - 1][j][k];
                if (A[i - 1] <= k) {
                    E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
                }
                if (A[i - 1] <= j) {
                    E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
                }
            }
        }
    }
    return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, which can then be treated as a smaller block of size equal to the difference in sizes that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are handled. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions or to find a solution quickly.
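For flavor, here is a sketch of the basic two-way Karmarkar-Karp differencing heuristic (my own; this is not Korf's complete search, nor the three-way generalization described in the papers): repeatedly replace the two largest blocks by their difference, which commits them to opposite partitions; the single value left at the end is the difference between the two partition sums.
#include <cstdio>
#include <queue>
#include <vector>

// Basic two-way Karmarkar-Karp differencing heuristic.
// Returns the (heuristic) difference between the two partition sums.
long long karmarkar_karp(const std::vector<long long> &blocks)
{
    std::priority_queue<long long> pq(blocks.begin(), blocks.end());
    while (pq.size() > 1)
    {
        long long a = pq.top(); pq.pop();   // largest
        long long b = pq.top(); pq.pop();   // second largest
        pq.push(a - b);                     // place them in opposite partitions
    }
    return pq.empty() ? 0 : pq.top();
}

int main()
{
    // Classic example: {8, 7, 6, 5, 4} -> KK finds a difference of 2 (the optimum is 0).
    printf("%lld\n", karmarkar_karp({8, 7, 6, 5, 4}));
}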
