Non-Recursive Merge Sort - algorithm

Can someone explain in English how non-recursive merge sort works?
Thanks

Non-recursive merge sort works by considering window sizes of 1, 2, 4, 8, 16, ..., 2^n over the input array. For each window size ('k' in the code below), all adjacent pairs of windows are merged into a temporary space, then put back into the array.
Here is my single-function, C-based, non-recursive merge sort.
Input and output are in 'a'. Temporary storage is in 'b'.
One day, I'd like to have a version that is in-place:
float a[50000000], b[50000000];

void mergesort(long num)
{
    int rght, rend;
    int i, j, m;
    for (int k = 1; k < num; k *= 2) {                     // window size: 1, 2, 4, 8, ...
        for (int left = 0; left + k < num; left += k * 2) {
            rght = left + k;                               // start of the right window
            rend = rght + k;                               // one past the end of the right window
            if (rend > num) rend = num;
            m = left; i = left; j = rght;
            // merge the two adjacent windows into b
            while (i < rght && j < rend) {
                if (a[i] <= a[j]) {
                    b[m] = a[i]; i++;
                } else {
                    b[m] = a[j]; j++;
                }
                m++;
            }
            // copy whatever remains of either window
            while (i < rght) {
                b[m] = a[i];
                i++; m++;
            }
            while (j < rend) {
                b[m] = a[j];
                j++; m++;
            }
            // copy the merged window back into a
            for (m = left; m < rend; m++) {
                a[m] = b[m];
            }
        }
    }
}
By the way, it is also easy to prove this is O(n log n). The outer loop doubles the window size each time, so it runs about log2(n) times. The inner loop handles many windows, but together all the windows for a given k exactly cover the input array, so one full pass of the inner loop is O(n). Combining the inner and outer loops: O(n) * O(log n) = O(n log n).
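To spell that argument out (a sketch only; c is just a constant bounding the per-element merge cost):

    T(n) \le \sum_{p=1}^{\lceil \log_2 n \rceil} c\,n \;=\; c\,n\,\lceil \log_2 n \rceil \;=\; O(n \log n)

since there are ceil(log2 n) passes and each pass does at most c*n work.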

Loop through the elements and make every adjacent group of two sorted by swapping the two elements when necessary.
Now, taking two groups at a time (most likely adjacent groups, though you could use the first and last), merge them into one group by repeatedly selecting the lowest-valued remaining element from either group, until all 4 elements are merged into one group of 4. Now you have nothing but groups of 4, plus a possible remainder. Using a loop around the previous logic, do it all again, this time working in groups of 4. This outer loop runs until there is only one group left; a sketch of the whole idea is below.
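If it helps to see those passes in code, here is a minimal Java sketch (the method name and variable names are mine, not from any answer in this thread):

    // Minimal bottom-up merge sort sketch: width-1 runs, then width-2, width-4, ...
    static void bottomUpMergeSort(int[] a) {
        int n = a.length;
        int[] tmp = new int[n];
        for (int width = 1; width < n; width *= 2) {          // size of already-sorted runs
            for (int left = 0; left < n; left += 2 * width) { // merge each adjacent pair of runs
                int mid = Math.min(left + width, n);
                int right = Math.min(left + 2 * width, n);
                int i = left, j = mid, k = left;
                while (i < mid && j < right)
                    tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
                while (i < mid)   tmp[k++] = a[i++];
                while (j < right) tmp[k++] = a[j++];
            }
            System.arraycopy(tmp, 0, a, 0, n);                // tmp holds this pass's result
        }
    }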

Quoting from Algorithmist:
Bottom-up merge sort is a non-recursive variant of merge sort, in which the array is sorted by a sequence of passes. During each pass, the array is divided into blocks of size m (initially, m = 1). Every two adjacent blocks are merged (as in normal merge sort), and the next pass is made with a twice larger value of m.

Both recursive and non-recursive merge sort have the same time complexity of O(n log n); they do the same merging work and differ mainly in how the bookkeeping is held.
In the non-recursive approach, the programmer manages the state explicitly (with loop variables, or an explicit stack if one is used).
In the recursive approach, the stack is used internally by the system to store the return addresses of the recursively called function.

The main reason you would want to use a non-recursive MergeSort is to avoid recursion stack overflow. I, for example, am trying to sort 100 million records, each record about 1 kByte in length (= 100 gigabytes), in alphanumeric order. An O(N^2) sort would take 10^16 operations, i.e. it would take decades to run even at 0.1 microseconds per compare operation. An O(N log N) merge sort will take less than 10^10 operations, or less than an hour to run at the same operational speed.

However, in the recursive version of MergeSort, the 100-million-element sort results in 50 million recursive calls to MergeSort(). At a few hundred bytes per stack frame, this overflows the recursion stack even though the process easily fits within heap memory. Doing the merge sort using dynamically allocated memory on the heap (I am using the code provided by Rama Hoetzlein above, but with dynamically allocated memory on the heap instead of the stack), I can sort my 100 million records with the non-recursive merge sort and I don't overflow the stack. An appropriate conversation for the website "Stack Overflow"!
PS: Thanks for the code, Rama Hoetzlein.
PPS: 100 gigabytes on the heap?!! Well, it's a virtual heap on a Hadoop cluster, and the MergeSort will be implemented in parallel on several machines sharing the load...
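For what it's worth, the arithmetic above checks out as a rough estimate (a back-of-the-envelope sketch at the quoted 0.1 microseconds per comparison):

    N = 10^8:\qquad N^2 \cdot 10^{-7}\,\mathrm{s} = 10^{16} \cdot 10^{-7}\,\mathrm{s} = 10^{9}\,\mathrm{s} \approx 32\ \text{years}

    N \log_2 N \cdot 10^{-7}\,\mathrm{s} \approx (10^8 \cdot 26.6) \cdot 10^{-7}\,\mathrm{s} \approx 270\,\mathrm{s} \approx 4.5\ \text{minutes}

so "decades" versus "well under an hour" is about right.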

I am new here.
I have modified Rama Hoetzlein's solution (thanks for the ideas). My merge sort does not use the final copy-back loop. Plus, it falls back on insertion sort. I have benchmarked it on my laptop and it is the fastest, even better than the recursive version. By the way, it is in Java and sorts from descending order to ascending order. And of course it is iterative. It can be made multithreaded. The code has become complex, so if anyone is interested, please have a look.
Code :
int num = input_array.length;
int left = 0;
int right;
int temp;
int LIMIT = 16;
if (num <= LIMIT)
{
    // Single Insertion Sort
    right = 1;
    while(right < num)
    {
        temp = input_array[right];
        while(( left > (-1) ) && ( input_array[left] > temp ))
        {
            input_array[left+1] = input_array[left--];
        }
        input_array[left+1] = temp;
        left = right;
        right++;
    }
}
else
{
    int i;
    int j;
    //Fragmented Insertion Sort
    right = LIMIT;
    while (right <= num)
    {
        i = left + 1;
        j = left;
        while (i < right)
        {
            temp = input_array[i];
            while(( j >= left ) && ( input_array[j] > temp ))
            {
                input_array[j+1] = input_array[j--];
            }
            input_array[j+1] = temp;
            j = i;
            i++;
        }
        left = right;
        right = right + LIMIT;
    }
    // Remainder Insertion Sort
    i = left + 1;
    j = left;
    while(i < num)
    {
        temp = input_array[i];
        while(( j >= left ) && ( input_array[j] > temp ))
        {
            input_array[j+1] = input_array[j--];
        }
        input_array[j+1] = temp;
        j = i;
        i++;
    }
    // Rama Hoetzlein method
    int[] temp_array = new int[num];
    int[] swap;
    int k = LIMIT;
    while (k < num)
    {
        left = 0;
        i = k; // The mid point
        right = k << 1;
        while (i < num)
        {
            if (right > num)
            {
                right = num;
            }
            temp = left;
            j = i;
            while ((left < i) && (j < right))
            {
                if (input_array[left] <= input_array[j])
                {
                    temp_array[temp++] = input_array[left++];
                }
                else
                {
                    temp_array[temp++] = input_array[j++];
                }
            }
            while (left < i)
            {
                temp_array[temp++] = input_array[left++];
            }
            while (j < right)
            {
                temp_array[temp++] = input_array[j++];
            }
            // Do not copy back the elements to input_array
            left = right;
            i = left + k;
            right = i + k;
        }
        // Instead of copying back in the previous loop, copy remaining elements to temp_array, then swap the array pointers
        while (left < num)
        {
            temp_array[left] = input_array[left++];
        }
        swap = input_array;
        input_array = temp_array;
        temp_array = swap;
        k <<= 1;
    }
}
return input_array;

Just in case anyone's still lurking in this thread ... I've adapted Rama Hoetzlein's non-recursive merge sort algorithm above to sort doubly linked lists. This new sort is in-place, stable, and avoids the time-costly list-dividing code that's in other linked-list merge sort implementations.
// MergeSort.cpp
// Angus Johnson 2017
// License: Public Domain
#include "io.h"
#include "time.h"
#include "stdlib.h"
struct Node {
int data;
Node *next;
Node *prev;
Node *jump;
};
inline void Move2Before1(Node *n1, Node *n2)
{
Node *prev, *next;
//extricate n2 from linked-list ...
prev = n2->prev;
next = n2->next;
prev->next = next; //nb: prev is always assigned
if (next) next->prev = prev;
//insert n2 back into list ...
prev = n1->prev;
if (prev) prev->next = n2;
n1->prev = n2;
n2->prev = prev;
n2->next = n1;
}
void MergeSort(Node *&nodes)
{
Node *first, *second, *base, *tmp, *prev_base;
if (!nodes || !nodes->next) return;
int mul = 1;
for (;;) {
first = nodes;
prev_base = NULL;
//sort each successive mul group of nodes ...
while (first) {
if (mul == 1) {
second = first->next;
if (!second) {
first->jump = NULL;
break;
}
first->jump = second->next;
}
else
{
second = first->jump;
if (!second) break;
first->jump = second->jump;
}
base = first;
int cnt1 = mul, cnt2 = mul;
//the following 'if' condition marginally improves performance
//in an unsorted list but very significantly improves
//performance when the list is mostly sorted ...
if (second->data < second->prev->data)
while (cnt1 && cnt2) {
if (second->data < first->data) {
if (first == base) {
if (prev_base) prev_base->jump = second;
base = second;
base->jump = first->jump;
if (first == nodes) nodes = second;
}
tmp = second->next;
Move2Before1(first, second);
second = tmp;
if (!second) { first = NULL; break; }
--cnt2;
}
else
{
first = first->next;
--cnt1;
}
} //while (cnt1 && cnt2)
first = base->jump;
prev_base = base;
} //while (first)
if (!nodes->jump) break;
else mul <<= 1;
} //for (;;)
}
void InsertNewNode(Node *&head, int data)
{
Node *tmp = new Node;
tmp->data = data;
tmp->next = NULL;
tmp->prev = NULL;
tmp->jump = NULL;
if (head) {
tmp->next = head;
head->prev = tmp;
head = tmp;
}
else head = tmp;
}
void ClearNodes(Node *head)
{
if (!head) return;
while (head) {
Node *tmp = head;
head = head->next;
delete tmp;
}
}
int main()
{
srand(time(NULL));
Node *nodes = NULL, *n;
const int len = 1000000; //1 million nodes
for (int i = 0; i < len; i++)
InsertNewNode(nodes, rand() >> 4);
clock_t t = clock();
MergeSort(nodes); //~1/2 sec for 1 mill. nodes on Pentium i7.
t = clock() - t;
printf("Sort time: %d msec\n\n", t * 1000 / CLOCKS_PER_SEC);
n = nodes;
while (n)
{
if (n->prev && n->data < n->prev->data) {
printf("oops! sorting's broken\n");
break;
}
n = n->next;
}
ClearNodes(nodes);
printf("All done!\n\n");
getchar();
return 0;
}
Edited 2017-10-27: Fixed a bug affecting odd numbered lists

Any interest in this anymore? Probably not. Oh well. Here goes nothing.
The insight of merge-sort is that you can merge two (or several) small sorted runs of records into one larger sorted run, and you can do so with simple stream-like operations "read first/next record" and "append record" -- which means you don't need a big data set in RAM at once: you can get by with just two records, each taken from a distinct run. If you can just keep track of where in your file the sorted runs start and end, you can simply merge pairs of adjacent runs (into a temp file) repeatedly until the file is sorted: this takes a logarithmic number of passes over the file.
A single record is trivially sorted, so you can start with runs of one record; each time you merge two adjacent runs, the size of each run doubles. That's one way to keep track. The other is to work with a priority queue of runs: take the two smallest runs from the queue, merge them, and enqueue the result, until there is only one remaining run. This is appropriate if you expect your data to naturally start with sorted runs.
In practice with enormous data sets you'll want to exploit the memory hierarchy. Suppose you have gigabytes of RAM and terabytes of data. Why not merge a thousand runs at once? Indeed you can do this, and a priority-queue of runs can help. That will significantly decrease the number of passes you have to make over a file to get it sorted. Some details are left as an exercise for the reader.
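To illustrate merging many runs at once with a priority queue, here is a rough Java sketch keyed on each run's current head (the runs are modeled as in-memory iterators here rather than files, and the names are mine):

    import java.util.*;

    // Merge k sorted runs into one sorted list using a priority queue of run heads.
    static List<Integer> mergeRuns(List<Iterator<Integer>> runs) {
        // Each queue entry: {current value, index of the run it came from}
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int r = 0; r < runs.size(); r++)
            if (runs.get(r).hasNext())
                pq.add(new int[]{runs.get(r).next(), r});
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] e = pq.poll();                       // smallest head among all runs
            out.add(e[0]);
            Iterator<Integer> run = runs.get(e[1]);
            if (run.hasNext())
                pq.add(new int[]{run.next(), e[1]});   // refill from the same run
        }
        return out;
    }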

Related

Why does this while loop perform worse than another very similar while loop?

I am trying to write a variation of insertion sort. In my algorithm, the swapping of values doesn't happen when finding the correct place for item in hand. Instead, it uses a lookup table (an array containing "links" to smaller values in the main array at corresponding positions) to find the correct position of the item. When we are done with all n elements in the main array, we haven't actually changed any of the elements in the main array itself, but an array named smaller will contain the links to immediate smaller values at positions i, i+1, ... n in correspondence to every element i, i+1, ... n in the main array. Finally, we iterate through the array smaller, starting from the index where the largest value in the main array existed, and populate another empty array in backward direction to finally get the sorted sequence.
Somewhat hacky/verbose implementation of the algorithm just described:
public static int [] sort (int[] a) {
    int length = a.length;
    int sorted [] = new int [length];
    int smaller [] = new int [length];
    //debug helpers
    long e = 0, t = 0;
    int large = 0;
    smaller[large] = -1;
    here:
    for (int i = 1; i < length; i++) {
        if (a[i] > a[large]) {
            smaller[i] = large;
            large = i;
            continue;
        }
        int prevLarge = large;
        int temp = prevLarge;
        long st = System.currentTimeMillis();
        while (prevLarge > -1 && a[prevLarge] >= a[i]) {
            e++;
            if (smaller[prevLarge] == -1) {
                smaller[i] = -1;
                smaller[prevLarge] = i;
                continue here;
            }
            temp = prevLarge;
            prevLarge = smaller[prevLarge];
        }
        long et = System.currentTimeMillis();
        t += (et - st);
        smaller[i] = prevLarge;
        smaller[temp] = i;
    }
    for (int i = length - 1; i >= 0; i--) {
        sorted[i] = a[large];
        large = smaller[large];
    }
    App.print("DevSort while loop execution: " + (e));
    App.print("DevSort while loop time: " + (t));
    return sorted;
}
The variables e and t contain the number of times the inner while loop is executed and total time taken to execute the while loop e times, respectively.
Here is a modified version of insertion sort:
public static int [] sort (int a[]) {
    int n = a.length;
    //debug helpers
    long e = 0, t = 0;
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int i = j - 1;
        long st = System.currentTimeMillis();
        while ( (i > -1) && (a[i] >= key)) {
            e++;
            // simply crap
            if (1 == 1) {
                int x = 0;
                int y = 1;
                int z = 2;
            }
            a[i + 1] = a[i];
            i--;
        }
        long et = System.currentTimeMillis();
        t += (et - st);
        a[i+1] = key;
    }
    App.print("InsertSort while loop execution: " + (e));
    App.print("InsertSort while loop time: " + (t));
    return a;
}
The if block inside the while loop is introduced just to match the number of statements inside the while loop of my "hacky" algorithm. Note that the two variables e and t are also introduced in the modified insertion sort.
The thing that's confusing is that even though the while loop of insertion sort runs exactly the same number of times as the while loop inside my "hacky" algorithm, t for insertion sort is significantly smaller than t for my algorithm.
For a particular run, if n = 10,000:
Total time taken by insertion sort's while loop: 20ms
Total time taken by my algorithm's while loop: 98ms
if n = 100,000;
Total time taken by insertion sort's while loop: 1100ms
Total time taken by my algorithm's while loop: 25251ms
In fact, because the condition 1 == 1 is always true, insertion sort's if block inside the while loop must execute more often than the one inside the while loop of my algorithm. Can someone explain what's going on?
Two arrays containing the same elements in the same order are being sorted using each algorithm.

Iterative MergeSort Time Complexity (Bottom-Up)

I have a problem with finding the time-complexity.
Firstly, speaking about the outer for loop in MergeSort, I think the number of repetitions is 1 + Σ (summing 2^i from i = 1 up to the size of the array) = 1 + (2 + 4 + 8 + 16 + 32 + ... + size), but I also think that I am very wrong.
I also have a problem measuring the inner for loop's repetitions.
MergeSort(){ //Iterative Version (Bottom-Up)
    for(int currentSize = 1; currentSize < length; currentSize *= 2) {
        for(int low = 0; low < length - currentSize; low += 2*currentSize){
            int mid = low + currentSize - 1;
            //min() is used here so if low is very close to the end of the array, high doesn't take an out-of-bounds value.
            int high = Math.min(low + currentSize*2 - 1, length - 1);
            merge(low, mid, high); //merge the two adjacent runs
        }
    }
}
merge(int low, int middle, int high) {
    // Copy both parts into the helper array
    for (int i = low; i <= high; i++) {
        helper[i] = arrayForMergeSort[i];
    }
    int i = low;
    int j = middle + 1;
    int k = low;
    // Copy the smallest values from either the left or the right side back
    // to the original array
    while (i <= middle && j <= high) {
        if (helper[i] <= helper[j]) {
            arrayForMergeSort[k] = helper[i];
            i++;
        } else {
            arrayForMergeSort[k] = helper[j];
            j++;
        }
        k++;
    }
    // Copy the rest of the left side of the array into the target array
    while (i <= middle) {
        arrayForMergeSort[k] = helper[i];
        k++;
        i++;
    }
}
For the outer loop, the number of iterations is ceil(log2(length)).
For the inner loop, the number of runs to be merged on each pass is ceil(length / currentSize), i.e. floor((length + currentSize - 1) / currentSize). If this is an even number, then the last run's size may be less than currentSize. If it is an odd number, then the last run has no run to merge with and may also be smaller than currentSize. I'm not sure there's a way to calculate the total number of merge operations without using iteration to sum the merge operations per pass; see the small sketch below.
In a "production" version of merge sort, a one-time allocation of a working array the same size as (or half the size of) the original array is done, and then the direction of merge (original to working, or working to original) is changed with each outer loop. If the program pre-calculates the number of outer iterations and it is an odd number, then a pre-pass can be done to swap elements in place on the initial pass, so that an even number of merge passes is done, with the sorted data ending up in the original array.
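To illustrate the "sum it by iteration" point, here is a small Java helper of my own (not part of the question's code) that counts runs and merges per pass for a given length:

    // Counts, pass by pass, how many merges of adjacent runs a bottom-up sort would do.
    static void countMerges(int length) {
        int totalMerges = 0, passes = 0;
        for (int currentSize = 1; currentSize < length; currentSize *= 2) {
            int runs = (length + currentSize - 1) / currentSize;   // ceil(length / currentSize)
            int merges = runs / 2;                                 // an odd last run has no partner
            totalMerges += merges;
            passes++;
            System.out.println("pass " + passes + ": runs=" + runs + ", merges=" + merges);
        }
        System.out.println("passes=" + passes + " (ceil(log2(length))), total merges=" + totalMerges);
    }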

mergesort running faster than radix sort

I sorted a million random positive long numbers, about 20 digits in length, using my implementations of merge sort and radix sort.
The merge sort is significantly faster, almost 6 times faster, than the radix sort.
I understand the time complexity of Radix sort also depends on the number of digits of the integers, but my merge implementation is beating my Radix implementation on all input sizes.
I am using my own queue class that has constant time push() and pop() in my radix sort. I am using arrays in the merge sort. Does this have something to do with this?
public static void RadixSort(long arr[]) {
    //Using 10 queues, one for each digit from 0-9.
    Queue q[] = new Queue[10];
    for (int i = 0; i < 10; i++)
        q[i] = new Queue();
    boolean allNumbersNotBucketed = true;
    long divisor = 1;
    while (allNumbersNotBucketed) {
        allNumbersNotBucketed = false;
        for (int i = 0; i < arr.length; i++) {
            long digit = (arr[i] / divisor) % 10;
            //Put number into appropriate queue.
            q[(int) digit].enqueue(arr[i]);
            if(digit > 0) allNumbersNotBucketed = true;
        }
        int pos = 0;
        divisor *= 10;
        //Put queue contents back into array
        for (int i = 0; i < 10; i++)
            while (!q[i].isEmpty())
                arr[pos++] = q[i].dequeue();
    }
}
Here is the merge sort
public static void mergeSort(long[] a) {
    long[] tmp = new long[a.length];
    mergeSort(a, tmp, 0, a.length - 1);
}

private static void mergeSort(long[] a, long[] tmp, int left, int right) {
    if (left < right) {
        int center = (left + right) / 2;
        mergeSort(a, tmp, left, center);        //Divide 0 to middle
        mergeSort(a, tmp, center + 1, right);   //Divide middle to center
        merge(a, tmp, left, center + 1, right); //Merge sorted lists
    }
}

private static void merge(long[] a, long[] tmp, int left, int right,
        int rightEnd) {
    long leftEnd = right - 1;
    int k = left;
    long num = rightEnd - left + 1;
    //Put the smallest element into tmp while both lists
    //are non-empty.
    while (left <= leftEnd && right <= rightEnd)
        if (a[left] < a[right])
            tmp[k++] = a[left++];
        else
            tmp[k++] = a[right++];
    // Copy rest of first half
    while (left <= leftEnd)
        tmp[k++] = a[left++];
    // Copy rest of right half
    while (right <= rightEnd)
        tmp[k++] = a[right++];
    // Copy tmp back
    for (long i = 0; i < num; i++, rightEnd--)
        a[rightEnd] = tmp[rightEnd];
}
EDIT:
I was rather stupidly using a LinkedList-style Queue. I changed it to use a native array, and now the merge sort is only twice as fast, compared to 6 times as fast earlier. The merge sort is still faster, even for numbers only 10 digits long. I guess the big-O constants are in play here. Millions of function calls to push() and pop() could also be to blame.
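For comparison, one common way to avoid per-element queue objects entirely is an LSD radix pass that uses counting to compute bucket offsets. This is an alternative technique, not the code from the question; a rough Java sketch:

    // One LSD radix pass per decimal digit, using counting to place elements into a scratch array.
    static void radixSortLSD(long[] arr) {
        long[] out = new long[arr.length];
        long max = 0;
        for (long v : arr) max = Math.max(max, v);       // assumes non-negative values, as in the question
        for (long divisor = 1; max / divisor > 0; divisor *= 10) {
            int[] count = new int[10];
            for (long v : arr) count[(int) ((v / divisor) % 10)]++;
            for (int d = 1; d < 10; d++) count[d] += count[d - 1];   // prefix sums -> end offsets
            for (int i = arr.length - 1; i >= 0; i--) {              // backwards keeps the sort stable
                int d = (int) ((arr[i] / divisor) % 10);
                out[--count[d]] = arr[i];
            }
            System.arraycopy(out, 0, arr, 0, arr.length);
        }
    }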

Interview - Find magnitude pole in an array

Magnitude Pole: an element in an array whose left-hand-side elements are less than or equal to it and whose right-hand-side elements are greater than or equal to it.
example input
3,1,4,5,9,7,6,11
desired output
4,5,11
I was asked this question in an interview; I had to return the index of the element, and only the first element that meets the condition.
My logic
Take two multisets (so that we can handle duplicates as well), one for the right-hand side of the element and one for the left-hand side of the element (the pole).
Start with the 0th element and put all the remaining elements in the "right set".
Base condition: if this 0th element is less than or equal to all elements in the "right set", then return its index.
Else put it into the "left set" and start with the element at index 1.
Traverse the array, and each time pick the maximum value from the "left set" and the minimum value from the "right set" and compare.
At any instant of time, for the current element, all values to its left are in the "left set" and all values to its right are in the "right set".
Code
int magnitudePole (const vector<int> &A) {
    multiset<int> left, right;
    int left_max, right_min;
    int size = A.size();
    for (int i = 1; i < size; ++i)
        right.insert(A[i]);
    right_min = *(right.begin());
    if(A[0] <= right_min)
        return 0;
    left.insert(A[0]);
    for (int i = 1; i < size; ++i) {
        right.erase(right.find(A[i]));
        left_max = *(--left.end());
        if (right.size() > 0)
            right_min = *(right.begin());
        if (A[i] > left_max && A[i] <= right_min)
            return i;
        else
            left.insert(A[i]);
    }
    return -1;
}
My questions:
I was told that my logic is incorrect. I am not able to understand why it is incorrect (though I have checked it for some cases and it returns the right index).
Out of curiosity, how can this be done without using any set/multiset, in O(n) time?
For an O(n) algorithm:
Compute the largest element from n[0] to n[k] for all k in [0, length(n)); save the answers in an array maxOnTheLeft. This costs O(n).
Compute the smallest element from n[k] to n[length(n)-1] for all k in [0, length(n)); save the answers in an array minOnTheRight. This costs O(n).
Loop through the whole thing and find any n[k] with maxOnTheLeft[k] <= n[k] <= minOnTheRight[k]. This costs O(n).
And your code is (at least) wrong here:
if (A[i] > left_max && A[i] <= right_min) // <-- should be >= and <=
Create two bool[N] arrays called NorthPole and SouthPole (just to be humorous).
Step forward through A[], tracking the maximum element found so far, and set SouthPole[i] true if A[i] > Max(A[0..i-1]).
Step backward through A[] and set NorthPole[i] true if A[i] < Min(A[i+1..N-1]).
Step forward through NorthPole and SouthPole to find the first element with both set true.
Each step above is O(N), since it visits each element once, so it is O(N) overall.
Java implementation:
Collection<Integer> magnitudes(int[] A) {
    int length = A.length;
    // what's the maximum number from the beginning of the array till the current position
    int[] maxes = new int[A.length];
    // what's the minimum number from the current position till the end of the array
    int[] mins = new int[A.length];
    // build mins
    int min = mins[length - 1] = A[length - 1];
    for (int i = length - 2; i >= 0; i--) {
        if (A[i] < min) {
            min = A[i];
        }
        mins[i] = min;
    }
    // build maxes
    int max = maxes[0] = A[0];
    for (int i = 1; i < length; i++) {
        if (A[i] > max) {
            max = A[i];
        }
        maxes[i] = max;
    }
    Collection<Integer> result = new ArrayList<>();
    // use them to find the magnitudes, if any exist
    for (int i = 0; i < length; i++) {
        if (A[i] >= maxes[i] && A[i] <= mins[i]) {
            // return here if only the first one is needed
            result.add(A[i]);
        }
    }
    return result;
}
Your logic seems perfectly correct (didn't check the implementation, though) and can be implemented to give an O(n) time algorithm! Nice job thinking in terms of sets.
Your right set can be implemented as a stack which supports min, and the left set can be implemented as a stack which supports max, and this gives an O(n) time algorithm.
Having a stack which supports max/min is a well-known interview question, and it can be done so that each operation (push/pop/min/max) is O(1).
To use this for your logic, the pseudocode will look something like this:
foreach elem in a[n-1 to 0]
    right_set.push(elem)

while (right_set.has_elements()) {
    candidate = right_set.pop();
    if (left_set.has_elements() && left_set.max() <= candidate <= right_set.min()) {
        break;
    } else if (!left_set.has_elements() && candidate <= right_set.min()) {
        break;
    }
    left_set.push(candidate);
}
return candidate
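In case it's useful, here is a minimal Java sketch of a stack with O(1) min (a max variant is symmetric). This is just the standard auxiliary-stack trick, not code from the answer above:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Stack of ints with O(1) push/pop/min, using a second stack of running minimums.
    class MinStack {
        private final Deque<Integer> data = new ArrayDeque<>();
        private final Deque<Integer> mins = new ArrayDeque<>();

        void push(int x) {
            data.push(x);
            mins.push(mins.isEmpty() ? x : Math.min(x, mins.peek()));
        }
        int pop() {
            mins.pop();
            return data.pop();
        }
        int min() { return mins.peek(); }   // smallest element currently on the stack
        boolean hasElements() { return !data.isEmpty(); }
    }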
I saw this problem on Codility, solved it with Perl:
sub solution {
    my (@A) = @_;
    my ($max, $min) = ($A[0], $A[-1]);
    my %candidates;
    for my $i (0..$#A) {
        if ($A[$i] >= $max) {
            $max = $A[$i];
            $candidates{$i}++;
        }
    }
    for my $i (reverse 0..$#A) {
        if ($A[$i] <= $min) {
            $min = $A[$i];
            return $i if $candidates{$i};
        }
    }
    return -1;
}
How about the following code? I think its efficiency is not good in the worst case, but its expected efficiency would be good.
int getFirstPole(int* a, int n)
{
    int leftPole = a[0];
    for(int i = 1; i < n; i++)
    {
        if(a[i] >= leftPole)
        {
            int j = i;
            for(; j < n; j++)
            {
                if(a[j] < a[i])
                {
                    i = j+1; //jump the elements between i and j
                    break;
                }
                else if (a[j] > a[i])
                    leftPole = a[j];
            }
            if(j == n) // if no element is less than a[i], then return i
                return i;
        }
    }
    return 0;
}
Create an array of ints called mags, and an int variable called maxMag.
For each element in the source array, check if the element is greater than or equal to maxMag.
If it is: add the element to the mags array and set maxMag = element.
If it isn't: loop through the mags array and remove all elements that the current element is less than.
Result: an array of magnitude poles.
Interesting question. I have my own solution in C#, which I have given below; read the comments to understand my approach.
public int MagnitudePoleFinder(int[] A)
{
    //Create a variable to store the maximum valued item, i.e. maxOfUp
    int maxOfUp = A[0];
    //if the list has only one value, return this value
    if (A.Length <= 1) return A[0];
    //create a collection for all candidates for magnitude pole that will be found in the iteration
    var magnitudeCandidates = new List<KeyValuePair<int, int>>();
    //add the first element as the first candidate
    var a = A[0];
    magnitudeCandidates.Add(new KeyValuePair<int, int>(0, a));
    //let's iterate
    for (int i = 1; i < A.Length; i++)
    {
        a = A[i];
        //if this item is greater than or equal to all the above items (maxOfUp holds the max value of all the above items)
        if (a >= maxOfUp)
        {
            //add it to the candidate list
            magnitudeCandidates.Add(new KeyValuePair<int, int>(i, a));
            maxOfUp = a;
        }
        else
        {
            //remove all the candidates having values greater than this item
            magnitudeCandidates = magnitudeCandidates.Except(magnitudeCandidates.Where(c => c.Value > a)).ToList();
        }
    }
    //if there is no candidate, return -1
    if (magnitudeCandidates.Count == 0) return -1;
    else
        //return the index of the first candidate
        return magnitudeCandidates.First().Key;
}

Array of size n, with one element n/2 times

Given an array of n integers, where one element appears more than n/2 times, we need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.
I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
    // Initial value is irrelevant if sequence is non-empty,
    // but keeps compiler happy.
    int best = 0;
    int count = 0;
    foreach (int element in sequence)
    {
        if (count == 0)
        {
            best = element;
            count = 1;
        }
        else
        {
            // Vote current choice up or down
            count += (best == element) ? 1 : -1;
        }
    }
    return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)
Find the median; it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.
int findLeader(int n, int* x){
    int leader = x[0], c = 1, i;
    for(i = 1; i < n; i++){
        if(c == 0){
            leader = x[i];
            c = 1;
        } else {
            if(x[i] == leader) c++;
            else c--;
        }
    }
    if(c == 0) return NULL;
    else {
        c = 0;
        for(i = 0; i < n; i++){
            if(x[i] == leader) c++;
        }
        if(c > n/2) return leader;
        else return NULL;
    }
}
I'm not the author of this code, but it will work for your problem. The first part looks for a potential leader; the second checks whether it appears more than n/2 times in the array.
This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times" while reducing the problem set.
Let's start by comparing a[i] and a[i+1]. If they're equal, we compare a[i+1] and a[i+2]. If not, we remove both a[i] and a[i+1] from the array. We repeat this until i >= (current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume the array is stored as a linked list (for this to give O(n) complexity).
What say folks?
-bhupi
Well, you can do an in-place radix sort as described here [pdf]; this takes no extra space and linear time. Then you can make a single pass counting consecutive elements, terminating when the count exceeds n/2.
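That final pass is just a run-length check over the now-sorted array, e.g. (a trivial Java sketch of mine):

    // After sorting, scan once and stop as soon as some value's run exceeds n/2.
    static int findMajorityInSorted(int[] a) {
        int count = 1;
        for (int i = 1; i < a.length; i++) {
            count = (a[i] == a[i - 1]) ? count + 1 : 1;
            if (count > a.length / 2) return a[i];
        }
        return a.length == 1 ? a[0] : Integer.MIN_VALUE;  // MIN_VALUE as a "not found" sentinel
    }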
How about:
Randomly select a small subset of K elements and look for duplicates (e.g. the first 4, the first 8, etc.). If K == 4, then the probability of not getting at least 2 of the duplicates is 1/8. If K == 8, it drops to under 1%. If you find no duplicates, repeat the process until you do. (This assumes the other elements are more randomly distributed; it would perform very poorly with, say, 49% of the array = "A" and 51% of the array = "B".)
e.g.:
findDuplicateCandidate:
    select a fixed-size subset.
    return the most common element in that subset.
    if there is no element with more than 1 occurrence, repeat.
    if there is more than 1 element with more than 1 occurrence, call findDuplicate and choose the element the 2 calls have in common.
This is a constant-order operation (if the data set isn't bad), so then do a linear scan of the array, O(N), to verify.
My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit: Leaving my answer here for posterity, but I think Skeet's got it.
In PHP (please check if it's correct):
function arrLeader( $A ){
    $len = count($A);
    $B = array();
    $val = -1;
    $counts = array_count_values($A); //returns an array with elements as keys and occurrences of each element as values
    for($i = 0; $i < $len; $i++){
        $val = $A[$i];
        if(in_array($val, $B, true)){
            //already checked this value; skip it to avoid looping again and again
        }else{
            if($counts[$val] > $len/2){
                return $val;
            }
            array_push($B, $val); //remember checked values to avoid looping again and again
        }
    }
    return -1;
}
int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
    L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
    if (L[i] == candidate && L[pos++] == candidate)
        return candidate;
}
if (count > pos)
    return candidate;
return (-1);
