How do I write merge in place? [duplicate] - algorithm

I know the question is not too specific.
All I want is someone to tell me how to convert a normal merge sort into an in-place merge sort (or a merge sort with constant extra space overhead).
All I can find (on the net) are pages saying "it is too complex" or "out of scope of this text".
The only known ways to merge in-place (without any extra space) are too complex to be reduced to practical program. (taken from here)
Even if it is too complex, what is the basic concept of how to make the merge sort in-place?

Knuth left this as an exercise (Vol 3, 5.2.5). There do exist in-place merge sorts. They must be implemented carefully.
First, a naive in-place merge such as the one described here isn't the right solution: it degrades performance to O(N²).
The idea is to sort part of the array while using the rest as working area for merging.
For example, consider the following merge function.
void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}
It takes the array xs; the two sorted sub-arrays are represented as the ranges [i, m) and [j, n) respectively. The working area starts at w. Compared with the standard merge algorithm given in most textbooks, this one exchanges the contents of the sorted sub-arrays with the working area. As a result, the working area ends up containing the merged, sorted elements, while the elements previously stored in the working area are moved into the two sub-arrays.
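To see the exchange in action, here is a minimal, self-contained driver (assuming Key is int; it repeats wmerge and a swap helper so it compiles on its own, and the 9s mark the initial contents of the working area):
#include <stdio.h>

typedef int Key;

static void swap(Key* xs, int i, int j) {
    Key tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
}

static void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}

int main(void) {
    /* [1 3 5] and [2 4 6] are the sorted sub-arrays; slots 6..11 are the working area */
    Key xs[12] = {1, 3, 5, 2, 4, 6, 9, 9, 9, 9, 9, 9};
    wmerge(xs, 0, 3, 3, 6, 6);
    for (int k = 0; k < 12; ++k)
        printf("%d ", xs[k]);
    printf("\n");   /* prints: 9 9 9 9 9 9 1 2 3 4 5 6 */
    return 0;
}
The merged result ends up in the former working area, while the working area's old contents (the 9s) have been swapped into the first six slots.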
However, there are two constraints that must be satisfied:
The work area should be within the bounds of the array. In other words, it must be big enough to hold the elements exchanged in, without causing any out-of-bounds errors.
The work area can overlap with either of the two sorted arrays; however, we must ensure that none of the unmerged elements is overwritten.
With this merging algorithm defined, it's easy to imagine a solution that sorts half of the array. The next question is how to deal with the rest, the unsorted part stored in the work area, as shown below:
... unsorted 1/2 array ... | ... sorted 1/2 array ...
One intuitive idea is to recursively sort the second half of the working area, so that only 1/4 of the elements remain unsorted:
... unsorted 1/4 array ... | sorted 1/4 array B | sorted 1/2 array A ...
The key point at this stage is that we must merge the sorted 1/4 elements B with the sorted 1/2 elements A sooner or later. Is the remaining working area, which only holds 1/4 of the elements, big enough to merge A and B? Unfortunately, it isn't.
However, the second constraint mentioned above gives us a hint: we can exploit it by arranging the working area to overlap with either sub-array, as long as we ensure a merging sequence in which the unmerged elements won't be overwritten.
Actually, instead of sorting the second half of the working area, we can sort the first half and put the working area between the two sorted arrays, like this:
... sorted 1/4 array B | unsorted work area | ... sorted 1/2 array A ...
This setup effectively arranges the work area to overlap with sub-array A. This idea is proposed in [Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. ``Practical in-place mergesort''. Nordic Journal of Computing, 1996].
So the only thing left is to repeat the above step, which reduces the working area from 1/2 to 1/4, 1/8, ... When the working area becomes small enough (for example, only two elements are left), we can switch to a trivial insertion sort to finish the algorithm.
Here is the implementation in ANSI C based on this paper.
void imsort(Key* xs, int l, int u);

void swap(Key* xs, int i, int j) {
    Key tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
}

/*
 * sort xs[l, u), and put result to working area w.
 * constraint, len(w) == u - l
 */
void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}

void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2;
            wsort(xs, w, n, l); /* the first half of the previous working area contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /* switch to insertion sort */
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}
Here wmerge is as defined previously.
The full source code can be found here, and a detailed explanation can be found here.
By the way, this version isn't the fastest merge sort, because it needs more swap operations. According to my tests, it's faster than the standard version, which allocates extra space in every recursion, but slower than an optimized version, which doubles the original array in advance and uses it for all further merging.

Including its "big result", this paper describes a couple of variants of in-place merge sort (PDF):
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.5514&rep=rep1&type=pdf
In-place sorting with fewer moves
Jyrki Katajainen, Tomi A. Pasanen
It is shown that an array of n elements can be sorted using O(1) extra space, O(n log n / log log n) element moves, and n·log₂ n + O(n log log n) comparisons. This is the first in-place sorting algorithm requiring o(n log n) moves in the worst case while guaranteeing O(n log n) comparisons, but due to the constant factors involved the algorithm is predominantly of theoretical interest.
I think this is relevant too. I have a printout of it lying around, passed on to me by a colleague, but I haven't read it. It seems to cover basic theory, but I'm not familiar enough with the topic to judge how comprehensively:
http://comjnl.oxfordjournals.org/cgi/content/abstract/38/8/681
Optimal Stable Merging
Antonios Symvonis
This paper shows how to stably merge two sequences A and B of sizes m and n, m ≤ n, respectively, with O(m+n) assignments, O(m·log(n/m + 1)) comparisons and using only a constant amount of additional space. This result matches all known lower bounds...

It really isn't easy or efficient, and I suggest you don't do it unless you really have to (and you probably don't have to unless this is homework, since the applications of in-place merging are mostly theoretical). Can't you use quicksort instead? Quicksort will be faster anyway with a few simple optimizations, and its extra memory is O(log N).
Anyway, if you must do it, then you must. Here's what I found: one and two. I'm not familiar with in-place merge sort, but it seems the basic idea is to use rotations to facilitate merging two arrays without using extra memory.
Note that this is slower even than the classic merge sort, which is not in-place.

The critical step is getting the merge itself to be in-place. It's not as difficult as those sources make out, but you lose something when you try.
Looking at one step of the merge:
[...list-sorted...|x...list-A...|y...list-B...]
We know that the sorted sequence is less than everything else, that x is less than everything else in A, and that y is less than everything else in B. In the case where x is less than or equal to y, you just advance the pointer at the start of A by one. In the case where y is less than x, you've got to shuffle y past the whole of A into the sorted section. That last step is what makes this expensive (except in degenerate cases).
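A minimal sketch of this merge step, with int elements and the two sorted runs being a[0, mid) and a[mid, n) (illustrative code, not from any of the linked sources):
static void naive_inplace_merge(int* a, int mid, int n) {
    int i = 0, j = mid;               /* i scans A, j is the head of B */
    while (i < j && j < n) {
        if (a[i] <= a[j]) {
            ++i;                      /* x <= y: just advance into A */
        } else {
            int y = a[j];
            for (int k = j; k > i; --k)   /* shuffle y past the whole of A */
                a[k] = a[k-1];
            a[i] = y;
            ++i;
            ++j;
        }
    }
}
Each shuffle is O(n) in the worst case, which is exactly what makes this approach degrade to O(n²) overall.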
It's generally cheaper (especially when the arrays only actually contain single words per element, e.g., a pointer to a string or structure) to trade off some space for time and have a separate temporary array that you sort back and forth between.

An example of bufferless mergesort in C.
#define SWAP(type, a, b) \
    do { type t = (a); (a) = (b); (b) = t; } while (0)

static void reverse_(int* a, int* b)
{
    for ( --b; a < b; a++, b-- )
        SWAP(int, *a, *b);
}

static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
    {
        reverse_(a, b);
        reverse_(b, c);
        reverse_(a, c);
    }
    return a + (c - b);
}

static int* lower_bound_(int* a, int* b, const int key)
/* find first element not less than #p key in sorted sequence or end of
 * sequence (#p b) if not found. */
{
    int i;
    for ( i = b - a; i != 0; i /= 2 )
    {
        int* mid = a + i / 2;
        if (*mid < key)
            a = mid + 1, i--;
    }
    return a;
}

static int* upper_bound_(int* a, int* b, const int key)
/* find first element greater than #p key in sorted sequence or end of
 * sequence (#p b) if not found. */
{
    int i;
    for ( i = b - a; i != 0; i /= 2 )
    {
        int* mid = a + i / 2;
        if (*mid <= key)
            a = mid + 1, i--;
    }
    return a;
}

static void ip_merge_(int* a, int* b, int* c)
/* inplace merge. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 == 0 || n2 == 0)
        return;
    if (n1 == 1 && n2 == 1)
    {
        if (*b < *a)
            SWAP(int, *a, *b);
    }
    else
    {
        int* p, * q;

        if (n1 <= n2)
            p = upper_bound_(a, b, *(q = b + n2 / 2));
        else
            q = lower_bound_(b, c, *(p = a + n1 / 2));
        b = rotate_(p, b, q);

        ip_merge_(a, p, b);
        ip_merge_(b, q, c);
    }
}

void mergesort(int* v, int n)
{
    if (n > 1)
    {
        int h = n / 2;
        mergesort(v, h); mergesort(v + h, n - h);
        ip_merge_(v, v + h, v + n);
    }
}
An example of adaptive mergesort (optimized).
It adds support code and modifications to accelerate the merge when an auxiliary buffer of any size is available (it still works without additional memory). It uses forward and backward merging, ring rotation, small-sequence merging and sorting, and an iterative mergesort.
#include <stdlib.h>
#include <string.h>

static int* copy_(const int* a, const int* b, int* out)
{
    int count = b - a;
    if (a != out)
        memcpy(out, a, count*sizeof(int));
    return out + count;
}

static int* copy_backward_(const int* a, const int* b, int* out)
{
    int count = b - a;
    if (b != out)
        memmove(out - count, a, count*sizeof(int));
    return out - count;
}

static int* merge_(const int* a1, const int* b1, const int* a2,
        const int* b2, int* out)
{
    while ( a1 != b1 && a2 != b2 )
        *out++ = (*a1 <= *a2) ? *a1++ : *a2++;
    return copy_(a2, b2, copy_(a1, b1, out));
}

static int* merge_backward_(const int* a1, const int* b1,
        const int* a2, const int* b2, int* out)
{
    while ( a1 != b1 && a2 != b2 )
        *--out = (*(b1-1) > *(b2-1)) ? *--b1 : *--b2;
    return copy_backward_(a1, b1, copy_backward_(a2, b2, out));
}

static unsigned int gcd_(unsigned int m, unsigned int n)
{
    while ( n != 0 )
    {
        unsigned int t = m % n;
        m = n;
        n = t;
    }
    return m;
}

static void rotate_inner_(const int length, const int stride,
        int* first, int* last)
{
    int* p, * next = first, x = *first;
    while ( 1 )
    {
        p = next;
        if ((next += stride) >= last)
            next -= length;
        if (next == first)
            break;
        *p = *next;
    }
    *p = x;
}

static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
    {
        int n1 = c - a;
        int n2 = b - a;
        int* i = a;
        int* j = a + gcd_(n1, n2);

        for ( ; i != j; i++ )
            rotate_inner_(n1, n2, i, c);
    }
    return a + (c - b);
}

static void ip_merge_small_(int* a, int* b, int* c)
/* inplace merge.
 * #note faster for small sequences. */
{
    while ( a != b && b != c )
        if (*a <= *b)
            a++;
        else
        {
            int* p = b+1;
            while ( p != c && *p < *a )
                p++;
            rotate_(a, b, p);
            b = p;
        }
}

static void ip_merge_(int* a, int* b, int* c, int* t, const int ts)
/* inplace merge.
 * #note works with or without additional memory. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 <= n2 && n1 <= ts)
    {
        merge_(t, copy_(a, b, t), b, c, a);
    }
    else if (n2 <= ts)
    {
        merge_backward_(a, b, t, copy_(b, c, t), c);
    }
    /* merge without buffer. */
    else if (n1 + n2 < 48)
    {
        ip_merge_small_(a, b, c);
    }
    else
    {
        int* p, * q;

        if (n1 <= n2)
            p = upper_bound_(a, b, *(q = b+n2/2));
        else
            q = lower_bound_(b, c, *(p = a+n1/2));
        b = rotate_(p, b, q);

        ip_merge_(a, p, b, t, ts);
        ip_merge_(b, q, c, t, ts);
    }
}

static void ip_merge_chunk_(const int cs, int* a, int* b, int* t,
        const int ts)
{
    int* p = a + cs*2;
    for ( ; p <= b; a = p, p += cs*2 )
        ip_merge_(a, a+cs, p, t, ts);
    if (a+cs < b)
        ip_merge_(a, a+cs, b, t, ts);
}

static void smallsort_(int* a, int* b)
/* insertion sort.
 * #note any stable sort with low setup cost will do. */
{
    int* p, * q;
    for ( p = a+1; p < b; p++ )
    {
        int x = *p;
        for ( q = p; a < q && x < *(q-1); q-- )
            *q = *(q-1);
        *q = x;
    }
}

static void smallsort_chunk_(const int cs, int* a, int* b)
{
    int* p = a + cs;
    for ( ; p <= b; a = p, p += cs )
        smallsort_(a, p);
    smallsort_(a, b);
}

static void mergesort_lower_(int* v, int n, int* t, const int ts)
{
    int cs = 16;
    smallsort_chunk_(cs, v, v+n);
    for ( ; cs < n; cs *= 2 )
        ip_merge_chunk_(cs, v, v+n, t, ts);
}

static void* get_buffer_(int size, int* final)
{
    void* p = NULL;
    while ( size != 0 && (p = malloc(size)) == NULL )
        size /= 2;
    *final = size;
    return p;
}

void mergesort(int* v, int n)
{
    /* #note buffer size may be in the range [0,(n+1)/2]. */
    int request = (n+1)/2 * sizeof(int);
    int actual;
    int* t = (int*) get_buffer_(request, &actual);

    /* #note allocation failure okay. */
    int tsize = actual / sizeof(int);

    mergesort_lower_(v, n, t, tsize);
    free(t);
}

This answer has a code example, which implements the algorithm described in the paper Practical In-Place Merging by Bing-Chao Huang and Michael A. Langston. I have to admit that I do not understand the details, but the given complexity of the merge step is O(n).
From a practical perspective, there is evidence that pure in-place implementations do not perform better in real-world scenarios. For example, the C++ standard defines std::inplace_merge, which is, as the name implies, an in-place merge operation.
Assuming that C++ libraries are typically very well optimized, it is interesting to see how it is implemented:
1) libstdc++ (part of the GCC code base): std::inplace_merge
The implementation delegates to __inplace_merge, which dodges the problem by trying to allocate a temporary buffer:
typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> _TmpBuf;
_TmpBuf __buf(__first, __len1 + __len2);

if (__buf.begin() == 0)
  std::__merge_without_buffer
    (__first, __middle, __last, __len1, __len2, __comp);
else
  std::__merge_adaptive
    (__first, __middle, __last, __len1, __len2, __buf.begin(),
     _DistanceType(__buf.size()), __comp);
Otherwise, it falls back to an implementation (__merge_without_buffer), which requires no extra memory but no longer runs in O(N) time.
2) libc++ (part of the Clang code base): std::inplace_merge
Looks similar. It delegates to a function, which also tries to allocate a buffer. Depending on whether it got enough elements, it will choose the implementation. The constant-memory fallback function is called __buffered_inplace_merge.
Maybe even the fallback is still O(n) time, but the point is that they do not use the implementation if temporary memory is available.
Note that the C++ standard explicitly gives implementations the freedom to choose this approach by lowering the required complexity from O(N) to O(N log N):
Complexity:
Exactly N-1 comparisons if enough additional memory is available. If the memory is insufficient, O(N log N) comparisons.
Of course, this cannot be taken as proof that constant-space in-place merges in O(N) time should never be used. On the other hand, if it were faster, the optimized C++ libraries would probably have switched to that type of implementation.
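For reference, the call itself is simple to use; a minimal sketch:
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v = {1, 3, 5, 2, 4, 6};   // two sorted halves
    std::inplace_merge(v.begin(), v.begin() + 3, v.end());
    for (int x : v)
        std::printf("%d ", x);                 // prints: 1 2 3 4 5 6
    std::printf("\n");
    return 0;
}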

This is my C version:
void merge(int *a, int sizei, int sizej);

void mergesort(int *a, int len) {
    int listsize, xsize;

    for (listsize = 1; listsize <= len; listsize *= 2) {
        for (int i = 0, j = listsize; (j + listsize) <= len; i += (listsize * 2), j += (listsize * 2)) {
            merge(&a[i], listsize, listsize);
        }
    }

    listsize /= 2;

    xsize = len % listsize;
    if (xsize > 1)
        mergesort(&a[len - xsize], xsize);

    merge(a, listsize, xsize);
}

void merge(int *a, int sizei, int sizej) {
    int temp;
    int ii = 0;
    int ji = sizei;
    int flength = sizei + sizej;

    for (int f = 0; f < (flength - 1); f++) {
        if (sizei == 0 || sizej == 0)
            break;

        if (a[ii] < a[ji]) {
            ii++;
            sizei--;
        } else {
            temp = a[ji];

            for (int z = (ji - 1); z >= ii; z--)
                a[z + 1] = a[z];
            ii++;

            a[f] = temp;

            ji++;
            sizej--;
        }
    }
}
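A quick driver for trying the version above (the input values are arbitrary):
#include <stdio.h>

int main(void) {
    int a[] = {5, 2, 9, 1, 6, 3, 8, 4, 7, 0};
    int n = sizeof(a) / sizeof(a[0]);
    mergesort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}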

I know I'm late to the game, but here's a solution I wrote yesterday. I also posted this elsewhere, but this appears to be the most popular merge-in-place thread on SO. I've also not seen this algorithm posted anywhere else, so hopefully this helps some people.
This algorithm is presented in its simplest form so that it can be understood. It can be significantly tweaked for extra speed. Average time complexity is O(n·log₂ n) for the stable in-place array merge, and O(n·(log₂ n)²) for the overall sort.
// Stable Merge In Place Sort
//
// The following code is written to illustrate the base algorithm. A good
// number of optimizations can be applied to boost its overall speed.
// For all its simplicity, it does still perform somewhat decently.
// Average case time complexity appears to be: O(n.(log₂n)²)

#include <stddef.h>
#include <stdio.h>

#define swap(x, y) (t=(x), (x)=(y), (y)=t)

// Both sorted sub-arrays must be adjacent in 'a'
// Assumes that both 'an' and 'bn' are always non-zero
// 'an' is the length of the first sorted section in 'a', referred to as A
// 'bn' is the length of the second sorted section in 'a', referred to as B
static void
merge_inplace(int A[], size_t an, size_t bn)
{
    int t, *B = &A[an];
    size_t pa, pb;      // Swap partition pointers within A and B

    // Find the portion to swap. We're looking for how much from the
    // start of B can swap with the end of A, such that every element
    // in A is less than or equal to any element in B. This is quite
    // simple when both sub-arrays come at us pre-sorted
    for (pa = an, pb = 0; pa > 0 && pb < bn && B[pb] < A[pa-1]; pa--, pb++);

    // Now swap last part of A with first part of B according to the
    // indices we found
    for (size_t index = pa; index < an; index++)
        swap(A[index], B[index-pa]);

    // Now merge the two sub-array pairings. We need to check that either array
    // didn't wholly swap out the other and cause the remaining portion to be zero
    if (pa > 0 && (an-pa) > 0)
        merge_inplace(A, pa, an-pa);
    if (pb > 0 && (bn-pb) > 0)
        merge_inplace(B, pb, bn-pb);
} // merge_inplace

// Implements a recursive merge-sort algorithm with an optional
// insertion sort for when the splits get too small. 'n' must
// ALWAYS be 2 or more. It enforces this when calling itself
static void
merge_sort(int a[], size_t n)
{
    size_t m = n/2;

    // Sort first and second halves only if the target 'n' will be > 1
    if (m > 1)
        merge_sort(a, m);
    if ((n-m) > 1)
        merge_sort(a+m, n-m);

    // Now merge the two sorted sub-arrays together. We know that since
    // n > 1, then both m and n-m MUST be non-zero, and so we will never
    // violate the condition of not passing in zero length sub-arrays
    merge_inplace(a, m, n-m);
} // merge_sort

// Print an array
static void
print_array(int a[], size_t size)
{
    if (size > 0) {
        printf("%d", a[0]);
        for (size_t i = 1; i < size; i++)
            printf(" %d", a[i]);
    }
    printf("\n");
} // print_array

// Test driver
int
main()
{
    int a[] = { 17, 3, 16, 5, 14, 8, 10, 7, 15, 1, 13, 4, 9, 12, 11, 6, 2 };
    size_t n = sizeof(a) / sizeof(a[0]);

    merge_sort(a, n);
    print_array(a, n);

    return 0;
} // main

Leveraging C++'s std::inplace_merge, an in-place merge sort can be implemented as follows:
template< class _Type >
inline void merge_sort_inplace(_Type* src, size_t l, size_t r)
{
    if (r <= l) return;

    size_t m = l + (r - l) / 2;     // computes the average without overflow

    merge_sort_inplace(src, l,     m);
    merge_sort_inplace(src, m + 1, r);

    std::inplace_merge(src + l, src + m + 1, src + r + 1);
}
More sorting algorithms, including parallel implementations, are available in https://github.com/DragonSpit/ParallelAlgorithms repo, which is open source and free.

I just tried an in-place merge algorithm for merge sort in Java, using the insertion sort algorithm and the following steps.
1) Two sorted arrays are available.
2) Compare the first values of each array; place the smaller value into the first array.
3) Place the larger value into the second array by using insertion sort (traverse from left to right).
4) Then again compare the second value of the first array and the first value of the second array, and do the same. But when swapping happens, there is some clue that further items can skip the comparison and only need swapping.
I have made some optimization here, to keep the number of comparisons in the insertion sort lower. The only drawback I found with this solution is that it needs a lot of swapping of array elements in the second array.
e.g.:
First array: 3, 7, 8, 9
Second array: 1, 2, 4, 5
Then 7, 8, 9 each force the second array to shift all its elements left by one in order to place themselves at the end.
So the assumption here is that swapping items is negligible compared to comparing two items.
https://github.com/skanagavelu/algorithams/blob/master/src/sorting/MergeSort.java
package sorting;

import java.util.Arrays;

public class MergeSort {

    public static void main(String[] args) {
        int[] array = { 5, 6, 10, 3, 9, 2, 12, 1, 8, 7 };
        mergeSort(array, 0, array.length - 1);
        System.out.println(Arrays.toString(array));

        int[] array1 = {4, 7, 2};
        System.out.println(Arrays.toString(array1));
        mergeSort(array1, 0, array1.length - 1);
        System.out.println(Arrays.toString(array1));
        System.out.println("\n\n");

        int[] array2 = {4, 7, 9};
        System.out.println(Arrays.toString(array2));
        mergeSort(array2, 0, array2.length - 1);
        System.out.println(Arrays.toString(array2));
        System.out.println("\n\n");

        int[] array3 = {4, 7, 5};
        System.out.println(Arrays.toString(array3));
        mergeSort(array3, 0, array3.length - 1);
        System.out.println(Arrays.toString(array3));
        System.out.println("\n\n");

        int[] array4 = {7, 4, 2};
        System.out.println(Arrays.toString(array4));
        mergeSort(array4, 0, array4.length - 1);
        System.out.println(Arrays.toString(array4));
        System.out.println("\n\n");

        int[] array5 = {7, 4, 9};
        System.out.println(Arrays.toString(array5));
        mergeSort(array5, 0, array5.length - 1);
        System.out.println(Arrays.toString(array5));
        System.out.println("\n\n");

        int[] array6 = {7, 4, 5};
        System.out.println(Arrays.toString(array6));
        mergeSort(array6, 0, array6.length - 1);
        System.out.println(Arrays.toString(array6));
        System.out.println("\n\n");

        // Handling array of size two
        int[] array7 = {7, 4};
        System.out.println(Arrays.toString(array7));
        mergeSort(array7, 0, array7.length - 1);
        System.out.println(Arrays.toString(array7));
        System.out.println("\n\n");

        int input1[] = {1};
        int input2[] = {4, 2};
        int input3[] = {6, 2, 9};
        int input4[] = {6, -1, 10, 4, 11, 14, 19, 12, 18};

        System.out.println(Arrays.toString(input1));
        mergeSort(input1, 0, input1.length - 1);
        System.out.println(Arrays.toString(input1));
        System.out.println("\n\n");

        System.out.println(Arrays.toString(input2));
        mergeSort(input2, 0, input2.length - 1);
        System.out.println(Arrays.toString(input2));
        System.out.println("\n\n");

        System.out.println(Arrays.toString(input3));
        mergeSort(input3, 0, input3.length - 1);
        System.out.println(Arrays.toString(input3));
        System.out.println("\n\n");

        System.out.println(Arrays.toString(input4));
        mergeSort(input4, 0, input4.length - 1);
        System.out.println(Arrays.toString(input4));
        System.out.println("\n\n");
    }

    private static void mergeSort(int[] array, int p, int r) {
        // Both mid computations below are fine.
        int mid = (r - p) / 2 + p;
        int mid1 = (r + p) / 2;
        if (mid != mid1) {
            System.out.println(" Mid is mismatching:" + mid + "/" + mid1 + " for p:" + p + " r:" + r);
        }

        if (p < r) {
            mergeSort(array, p, mid);
            mergeSort(array, mid + 1, r);
            // merge(array, p, mid, r);
            inPlaceMerge(array, p, mid, r);
        }
    }

    // Regular merge
    private static void merge(int[] array, int p, int mid, int r) {
        int lengthOfLeftArray = mid - p + 1; // This is important to add +1.
        int lengthOfRightArray = r - mid;

        int[] left = new int[lengthOfLeftArray];
        int[] right = new int[lengthOfRightArray];

        for (int i = p, j = 0; i <= mid; ) {
            left[j++] = array[i++];
        }
        for (int i = mid + 1, j = 0; i <= r; ) {
            right[j++] = array[i++];
        }

        int i = 0, j = 0;
        for (; i < left.length && j < right.length; ) {
            if (left[i] < right[j]) {
                array[p++] = left[i++];
            } else {
                array[p++] = right[j++];
            }
        }
        while (j < right.length) {
            array[p++] = right[j++];
        }
        while (i < left.length) {
            array[p++] = left[i++];
        }
    }

    // In-place merge, no extra array
    private static void inPlaceMerge(int[] array, int p, int mid, int r) {
        int secondArrayStart = mid + 1;
        int prevPlaced = mid + 1;
        int q = mid + 1;

        while (p < mid + 1 && q <= r) {
            boolean swapped = false;
            if (array[p] > array[q]) {
                swap(array, p, q);
                swapped = true;
            }
            if (q != secondArrayStart && array[p] > array[secondArrayStart]) {
                swap(array, p, secondArrayStart);
                swapped = true;
            }
            // Check that the swapped value is in the right place of the second sorted array
            if (swapped && secondArrayStart + 1 <= r && array[secondArrayStart + 1] < array[secondArrayStart]) {
                prevPlaced = placeInOrder(array, secondArrayStart, prevPlaced);
            }
            p++;
            if (q < r) { // q+1 <= r
                q++;
            }
        }
    }

    private static int placeInOrder(int[] array, int secondArrayStart, int prevPlaced) {
        int i = secondArrayStart;
        for (; i < array.length; i++) {
            // Simply swap till the prevPlaced position
            if (secondArrayStart < prevPlaced) {
                swap(array, secondArrayStart, secondArrayStart + 1);
                secondArrayStart++;
                continue;
            }
            if (array[i] < array[secondArrayStart]) {
                swap(array, i, secondArrayStart);
                secondArrayStart++;
            } else if (i != secondArrayStart && array[i] > array[secondArrayStart]) {
                break;
            }
        }
        return secondArrayStart;
    }

    private static void swap(int[] array, int m, int n) {
        int temp = array[m];
        array[m] = array[n];
        array[n] = temp;
    }
}

Related

Find the number of players who cannot win the game?

We are given n players; each player has 3 values assigned: A, B and C.
A player i cannot win if there exists another player j with all 3 values greater: A[j] > A[i], B[j] > B[i] and C[j] > C[i]. We are asked to find the number of players who cannot win.
I tried this problem using brute force, which is a linear search over the players array, but it's showing TLE.
For each player i, I am traversing the complete array to find whether there exists any other player j for which the above condition holds true.
Code:
int count_players_cannot_win(vector<vector<int>> values) {
    int c = 0;
    int n = values.size();
    for(int i = 0; i < n; i++) {
        for(int j = 0; j < n; j++) {
            if(j == i) continue; // skip comparing a player with itself
            if(values[i][0] < values[j][0] && values[i][1] < values[j][1] && values[i][2] < values[j][2]) {
                c += 1;
                break;
            }
        }
    }
    return c;
}
And this approach is O(n²), since for every player we traverse the complete array; thus it gives TLE.
Sample testcase:
Sample Input:
3 (number of players)
A B C
1 4 2
4 3 2
2 5 3
Sample Output:
1
Explanation:
Only player 1 cannot win, as there exists player 3 whose all 3 values (A, B and C) are greater than those of player 1.
Constraints:
n (number of players) <= 10^5
What would be the optimal way to solve this problem?
Solution:
int n;
const int N = 4e5 + 1;
int tree[N];

int get_max(int i, int l, int r, int L) { // range query of max in range v[B+1: n]
    if(r < L || n <= l)
        return numeric_limits<int>::min();
    else if(L <= l)
        return tree[i];
    int m = (l + r)/2;
    return max(get_max(2*i+1, l, m, L), get_max(2*i+2, m+1, r, L));
}

void update(int i, int l, int r, int on, int v) { // point update in tree[on]
    if(r < on || on < l)
        return;
    else if(l == r) {
        tree[i] = max(tree[i], v);
        return;
    }
    int m = (l + r)/2;
    update(2*i+1, l, m, on, v);
    update(2*i+2, m + 1, r, on, v);
    tree[i] = max(tree[2*i+1], tree[2*i+2]);
}

bool comp(vector<int> a, vector<int> b) {
    return a[0] != b[0] ? a[0] > b[0] : a[1] < b[1];
}

int solve(vector<vector<int>> &v) {
    n = v.size();
    vector<int> b(n, 0); // reduce the scale of range from [0,10^9] to [0,10^5]
    for(int i = 0; i < n; i++) {
        b[i] = v[i][1];
    }
    for(int i = 0; i < n; i++) {
        cin >> v[i][2];
    }
    // sort on 0th col in reverse order
    sort(v.begin(), v.end(), comp);
    sort(b.begin(), b.end());
    int ans = 0;
    for(int i = 0; i < n;) {
        int j = i;
        while(j < n && v[j][0] == v[i][0]) {
            int B = v[j][1];
            int pos = lower_bound(b.begin(), b.end(), B) - b.begin(); // position of B in b[]
            int mx = get_max(0, 0, n - 1, pos + 1);
            if(mx > v[j][2])
                ans += 1;
            j++;
        }
        while(i < j) {
            int B = v[i][1];
            int C = v[i][2];
            int pos = lower_bound(b.begin(), b.end(), B) - b.begin(); // position of B in b[]
            update(0, 0, n - 1, pos, C);
            i++;
        }
    }
    return ans;
}
This solution uses a segment tree, and thus solves the problem in O(n·log(n)) time and O(n) space.
The approach is explained in the accepted answer by Primusa.
First, let's assume that our input comes in the form of a list of tuples T = [(A[0], B[0], C[0]), (A[1], B[1], C[1]) ... (A[N - 1], B[N - 1], C[N - 1])].
The first observation we can make is that we can sort on T[0] (in reverse order). Then for each tuple (a, b, c), to determine if it cannot win, we ask if we've already seen a tuple (d, e, f) such that e > b && f > c. We don't need to check the first element, because we are given that d > a* since T is sorted in reverse.
Okay, so now how do we check this second criteria?
We can reframe it like so: out of all tuples (d, e, f) that we've already seen with e > b, what is the maximum value of f? If that max value is greater than c, then we know that this tuple cannot win.
To handle this part we can use a segment tree with max updates and max range queries. When we encounter a tuple (d, e, f), we can set tree[e] = max(tree[e], f). tree[i] will represent the third element with i being the second element.
To answer a query like "what is the maximum value of f such that e > b", we do max(tree[b+1...]), to get the largest third element over a range of possible second elements.
Since we are only doing suffix queries, you can get away with using a modified fenwick tree, but it is easier to explain with a segment tree.
This will give us an O(NlogN) solution, for sorting T and doing O(logN) work with our segment tree for every tuple.
*Note: this should actually be d >= a. However it is easier to explain the algorithm when we pretend everything is unique. The only modification you need to make to accommodate duplicate values of the first element is to process your queries and updates in buckets of tuples of the same value. This means that we will perform our check for all tuples with the same first element, and only then do we update tree[e] = max(tree[e], f) for all of those tuples we performed the check on. This ensures that no tuple with the same first value has updated the tree already when another tuple is querying the tree.
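To sketch that earlier remark about a modified Fenwick tree (illustrative code, not part of the original solution): a Fenwick tree indexed in reverse turns prefix-max queries into suffix-max queries, which is all this problem needs.
#include <algorithm>
#include <limits>
#include <vector>

// Point update / suffix-max query over positions 1..n (coordinate-compressed).
struct SuffixMaxBIT {
    int n;
    std::vector<int> t;
    explicit SuffixMaxBIT(int n)
        : n(n), t(n + 1, std::numeric_limits<int>::min()) {}

    // a[pos] = max(a[pos], val)
    void update(int pos, int val) {
        for (int i = n - pos + 1; i <= n; i += i & -i)   // reversed index
            t[i] = std::max(t[i], val);
    }

    // max over a[pos..n]; returns INT_MIN for an empty suffix
    int query(int pos) const {
        int res = std::numeric_limits<int>::min();
        for (int i = n - pos + 1; i > 0; i -= i & -i)
            res = std::max(res, t[i]);
        return res;
    }
};
With the second elements compressed to 1..n, update(e, f) records a processed tuple and query(b + 1) answers "the largest f among tuples seen so far with e > b".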

Counting inversions in a segment with updates

I'm trying to solve a problem which goes like this:
Problem
Given an array of integers "arr" of size "n", process two types of queries. There are "q" queries you need to answer.
Query type 1
input: l r
result: output number of inversions in [l, r]
Query type 2
input: x y
result: update the value at arr [x] to y
Inversion
For every index j < i, if arr [j] > arr [i], the pair (j, i) is one inversion.
Input
n = 5
q = 3
arr = {1, 4, 3, 5, 2}
queries:
type = 1, l = 1, r = 5
type = 2, x = 1, y = 4
type = 1, l = 1, r = 5
Output
4
6
Constraints
Time: 4 secs
1 <= n, q <= 100000
1 <= arr [i] <= 40
1 <= l, r, x <= n
1 <= y <= 40
I know how to solve a simpler version of this problem without updates, i.e. simply counting the number of inversions for each position using a segment tree or Fenwick tree in O(N·log(N)). The only solution I have to this problem is O(q·N·log(N)) (I think) with a segment tree, other than the trivial O(q·N²) algorithm. This however does not fit within the time constraints of the problem. I would like to have hints towards a better algorithm that solves the problem in O(N·log(N)) (if that's possible) or maybe O(N·log²(N)).
I first came across this problem two days ago and have been spending a few hours here and there to try and solve it. However, I'm finding it non-trivial to do so and would like to have some help/hints regarding the same. Thanks for your time and patience.
Updates
Solution
With the suggestion, answer and help by Tanvir Wahid, I've implemented the source code for the problem in C++ and would like to share it here for anyone who might stumble across this problem and not have an intuitive idea on how to solve it. Thank you!
Let's build a segment tree with each node containing information about how many inversions exist and the frequency count of elements present in its segment of authority.
node {
    integer inversion_count : 0
    array [40] frequency : {0...0}
}
Building the segment tree and handling updates
For each leaf node, initialise the inversion count to 0 and set the frequency of the represented element from the input array to 1. The frequency of a parent node can be calculated by summing up the frequencies of its left and right children. The inversion count of a parent node is the sum of the inversion counts of the left and right children, plus the new inversions created upon merging the two segments of their authority, which can be calculated using the frequencies of elements in each child. This calculation is basically the product of the frequencies of bigger elements in the left child and the frequencies of smaller elements in the right child:
parent.inversion_count = left.inversion_count + right.inversion_count
for i in [39, 0]
    for j in [0, i)
        parent.inversion_count += left.frequency[i] * right.frequency[j]
Updates are handled similarly.
Answering range queries on inversion counts
To answer the query for the number of inversions in the range [l, r], we calculate the inversions using the source code attached below.
Time Complexity: O(q*log(n))
Note
The source code attached does break some good programming habits. The sole purpose of the code is to "solve" the given problem and not to accomplish anything else.
Source Code
/**
    Lost Arrow (Aryan V S)
    Saturday 2020-10-10
**/

#include "bits/stdc++.h"
using namespace std;

struct node {
    int64_t inv = 0;
    vector<int> freq = vector<int>(40, 0);

    void combine(const node& l, const node& r) {
        inv = l.inv + r.inv;
        for (int i = 39; i >= 0; --i) {
            for (int j = 0; j < i; ++j) {
                // frequency of bigger numbers in the left * frequency of smaller numbers on the right
                inv += 1LL * l.freq[i] * r.freq[j];
            }
            freq[i] = l.freq[i] + r.freq[i];
        }
    }
};

void build(vector<node>& tree, vector<int>& a, int v, int tl, int tr) {
    if (tl == tr) {
        tree[v].inv = 0;
        tree[v].freq[a[tl]] = 1;
    }
    else {
        int tm = (tl + tr) / 2;
        build(tree, a, 2 * v + 1, tl, tm);
        build(tree, a, 2 * v + 2, tm + 1, tr);
        tree[v].combine(tree[2 * v + 1], tree[2 * v + 2]);
    }
}

void update(vector<node>& tree, int v, int tl, int tr, int pos, int val) {
    if (tl == tr) {
        tree[v].inv = 0;
        tree[v].freq = vector<int>(40, 0);
        tree[v].freq[val] = 1;
    }
    else {
        int tm = (tl + tr) / 2;
        if (pos <= tm)
            update(tree, 2 * v + 1, tl, tm, pos, val);
        else
            update(tree, 2 * v + 2, tm + 1, tr, pos, val);
        tree[v].combine(tree[2 * v + 1], tree[2 * v + 2]);
    }
}

node inv_cnt(vector<node>& tree, int v, int tl, int tr, int l, int r) {
    if (l > r)
        return node();
    if (tl == l && tr == r)
        return tree[v];
    int tm = (tl + tr) / 2;
    node result;
    result.combine(inv_cnt(tree, 2 * v + 1, tl, tm, l, min(r, tm)),
                   inv_cnt(tree, 2 * v + 2, tm + 1, tr, max(l, tm + 1), r));
    return result;
}

void solve() {
    int n, q;
    cin >> n >> q;

    vector<int> a(n);
    for (int i = 0; i < n; ++i) {
        cin >> a[i];
        --a[i];
    }

    vector<node> tree(4 * n);
    build(tree, a, 0, 0, n - 1);

    while (q--) {
        int type, x, y;
        cin >> type >> x >> y;
        --x; --y;

        if (type == 1) {
            node result = inv_cnt(tree, 0, 0, n - 1, x, y);
            cout << result.inv << '\n';
        }
        else if (type == 2) {
            update(tree, 0, 0, n - 1, x, y);
        }
        else
            assert(false);
    }
}

int main() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);
    std::cout.precision(10);
    std::cout << std::fixed << std::boolalpha;

    int t = 1;
    // std::cin >> t;
    while (t--)
        solve();
    return 0;
}
arr[i] can be at most 40. We can use this to our advantage. What we need is a segment tree. Each node will hold 41 values: a long long int which represents the inversions for this range, and an array of size 40 holding the count of each number (a struct will do). How do we merge the two children of a node? We know the inversions for the left child and the right child, and we also know the frequency of each number in both of them. The inversion count of the parent node is the sum of the inversions of both children, plus the number of inversions between the left and right child, which we can easily find from the frequencies of the numbers. Queries can be done in a similar way. Complexity: O(40·q·log(n)).

Update in Segment Tree

I am learning segment trees, and I came across this question.
There is an array A and 2 types of operations:
1. Find the sum in the range L to R.
2. Update the elements in the range L to R by value X.
The update should be like this:
A[L] = 1*X;
A[L+1] = 2*X;
A[L+2] = 3*X;
...
A[R] = (R-L+1)*X;
How should I handle the second type of query? Can anyone please give me some algorithm to modify the segment tree, or is there a better solution?
So, we need to efficiently update the interval [L,R] with the corresponding values of the arithmetic progression with step X, and to be able to efficiently find the sums over different intervals.
In order to solve this problem efficiently, let's make use of the Segment Tree with Lazy Propagation.
The basic ideas are the following:
An arithmetic progression can be defined by its first and last items and the number of items.
It is possible to obtain a new arithmetic progression by combining the first and last items of two different arithmetic progressions (which have the same number of items): the first and last items of the new progression are just the sums of the corresponding items of the combined progressions.
Hence, we can associate with each node of the Segment Tree the first and last values of the arithmetic progression which spans the node's interval.
During an update, for all affected intervals, we can lazily propagate through the Segment Tree the values of the first and last items, and update the aggregated sums on these intervals.
So, the node of the Segment Tree for the given problem will have the structure:
class Node {
    int left;   // Left boundary of the current SegmentTree node
    int right;  // Right boundary of the current SegmentTree node

    int sum;    // Sum on the interval [left,right]

    int first;  // First item of arithmetic progression inside given node
    int last;   // Last item of arithmetic progression

    Node left_child;
    Node right_child;

    // Constructor
    Node(int[] arr, int l, int r) { ... }

    // Add arithmetic progression with step X on the interval [l,r]
    // O(log(N))
    void add(int l, int r, int X) { ... }

    // Request the sum on the interval [l,r]
    // O(log(N))
    int query(int l, int r) { ... }

    // Lazy Propagation
    // O(1)
    void propagate() { ... }
}
The specificity of the Segment Tree with Lazy Propagation is that every time a node of the tree is traversed, the Lazy Propagation routine (which has complexity O(1)) is executed for that node.
During the Lazy Propagation, the first and last items of the arithmetic progressions of the child nodes are updated, and the sum inside the parent node is updated as well.
Implementation
Below is the Java implementation of the described approach (with additional comments):
class Node {

    int left;   // Left boundary of the current SegmentTree node
    int right;  // Right boundary of the current SegmentTree node

    int sum;    // Sum on the interval

    int first;  // First item of arithmetic progression
    int last;   // Last item of arithmetic progression

    Node left_child;
    Node right_child;

    /**
     * Construction of a Segment Tree
     * which spans over the interval [l,r]
     */
    Node(int[] arr, int l, int r) {
        left = l;
        right = r;

        if (l == r) { // Leaf
            sum = arr[l];
        } else { // Construct children
            int m = (l + r) / 2;
            left_child = new Node(arr, l, m);
            right_child = new Node(arr, m + 1, r);
            // Update accumulated sum
            sum = left_child.sum + right_child.sum;
        }
    }

    /**
     * Lazily adds the values of the arithmetic progression
     * with step X on the interval [l, r]
     * O(log(N))
     */
    void add(int l, int r, int X) {
        // Lazy propagation
        propagate();

        if ((r < left) || (right < l)) {
            // If updated interval doesn't overlap with current subtree
            return;
        } else if ((l <= left) && (right <= r)) {
            // If updated interval fully covers the current subtree
            // Update the first and last items of the arithmetic progression
            int first_item_offset = (left - l) + 1;
            int last_item_offset = (right - l) + 1;
            first = X * first_item_offset;
            last = X * last_item_offset;

            // Lazy propagation
            propagate();
        } else {
            // If updated interval partially overlaps with current subtree
            left_child.add(l, r, X);
            right_child.add(l, r, X);

            // Update accumulated sum
            sum = left_child.sum + right_child.sum;
        }
    }

    /**
     * Returns the sum on the interval [l, r]
     * O(log(N))
     */
    int query(int l, int r) {
        // Lazy propagation
        propagate();

        if ((r < left) || (right < l)) {
            // If requested interval doesn't overlap with current subtree
            return 0;
        } else if ((l <= left) && (right <= r)) {
            // If requested interval fully covers the current subtree
            return sum;
        } else {
            // If requested interval partially overlaps with current subtree
            return left_child.query(l, r) + right_child.query(l, r);
        }
    }

    /**
     * Lazy propagation
     * O(1)
     */
    void propagate() {
        // Update the accumulated value
        // with the sum of the Arithmetic Progression
        int items_count = (right - left) + 1;
        sum += ((first + last) * items_count) / 2;

        if (right != left) { // Current node is not a leaf
            // Calculate the step of the Arithmetic Progression of the current node
            int step = (last - first) / (items_count - 1);

            // Update the first and last items of the arithmetic progression
            // inside the left and right subtrees
            // Distribute the arithmetic progression between child nodes
            // [a(1) to a(N)] -> [a(1) to a(N/2)] and [a(N/2+1) to a(N)]
            int mid = (items_count - 1) / 2;
            left_child.first += first;
            left_child.last += first + (step * mid);
            right_child.first += first + (step * (mid + 1));
            right_child.last += last;
        }

        // Reset the arithmetic progression of the current node
        first = 0;
        last = 0;
    }
}
The Segment Tree in the provided solution is implemented explicitly, using objects and references; however, it can easily be modified to use arrays instead.
Testing
Below are randomized tests which compare two implementations:
processing queries by sequentially increasing each item of the array in O(N) and calculating the sums on intervals in O(N)
processing the same queries using the Segment Tree with O(log(N)) complexity
The Java implementation of the randomized tests:
public static void main(String[] args) {
    // Initialize the random generator with a predefined seed,
    // in order to make the test reproducible
    Random rnd = new Random(1);

    int test_cases_num = 20;
    int max_arr_size = 100;
    int num_queries = 50;
    int max_progression_step = 20;

    for (int test = 0; test < test_cases_num; test++) {
        // Create array of random length
        int[] arr = new int[rnd.nextInt(max_arr_size) + 1];
        Node segmentTree = new Node(arr, 0, arr.length - 1);

        for (int query = 0; query < num_queries; query++) {
            if (rnd.nextDouble() < 0.5) {
                // Update on interval [l,r]
                int l = rnd.nextInt(arr.length);
                int r = rnd.nextInt(arr.length - l) + l;
                int X = rnd.nextInt(max_progression_step);

                update_sequential(arr, l, r, X); // O(N)
                segmentTree.add(l, r, X);        // O(log(N))
            }
            else {
                // Request sum on interval [l,r]
                int l = rnd.nextInt(arr.length);
                int r = rnd.nextInt(arr.length - l) + l;

                int expected = query_sequential(arr, l, r); // O(N)
                int actual = segmentTree.query(l, r);       // O(log(N))

                if (expected != actual) {
                    throw new RuntimeException("Results are different!");
                }
            }
        }
    }
    System.out.println("All results are equal!");
}

static void update_sequential(int[] arr, int left, int right, int X) {
    for (int i = left; i <= right; i++) {
        arr[i] += X * ((i - left) + 1);
    }
}

static int query_sequential(int[] arr, int left, int right) {
    int sum = 0;
    for (int i = left; i <= right; i++) {
        sum += arr[i];
    }
    return sum;
}
Basically you need to make a tree and then make updates using lazy propagation; here is an implementation.
int tree[1 << 20], Base = 1 << 19;
int lazy[1 << 20];

void propagation(int v, int p, int k){ // standard propagation, scaled by the child segment length
    int len = (k - p + 1) / 2;
    tree[v * 2] += lazy[v] * len;
    tree[v * 2 + 1] += lazy[v] * len;
    lazy[v * 2] += lazy[v];
    lazy[v * 2 + 1] += lazy[v];
    lazy[v] = 0;
}

void update(int a, int b, int c, int v = 1, int p = 1, int k = Base){
    if(p > b || k < a) return; // if outside range [a, b]
    if(p >= a && k <= b){ // if fully inside range [a, b]
        tree[v] += c * (k - p + 1); // the sum grows by c per element
        lazy[v] += c;
        return;
    }
    propagation(v, p, k);
    update(a, b, c, v * 2, p, (p + k) / 2); // left child
    update(a, b, c, v * 2 + 1, (p + k) / 2 + 1, k); // right child
    tree[v] = tree[v * 2] + tree[v * 2 + 1]; // update current node
}

int query(int a, int b, int v = 1, int p = 1, int k = Base){
    if(p > b || k < a) // if outside range [a, b]
        return 0;
    if(p >= a && k <= b) // if fully inside range [a, b]
        return tree[v];
    propagation(v, p, k);
    return query(a, b, v * 2, p, (p + k) / 2) // left child
         + query(a, b, v * 2 + 1, (p + k) / 2 + 1, k); // right child
}
The update function adds c to every element in the interval [a, b] (or [L, R]):
update(L, R, value);
The query function gives you the sum of the elements in a range:
query(L, R);
The second operation can be regarded as adding a segment to the interval [L,R] with the two endpoints (L,x) and (R,(R-L+1)*x) and slope x.
The most important thing to consider about a segment tree with interval modifications is whether the lazy tags can be merged. If we regard the modifications as adding segments, we can see that two segments are easily merged: we only need to update the slope and the endpoints. For each interval, we only need to maintain the slope and the starting point of the segment for that interval. Using the lazy tag technique, we can implement interval-sum queries and interval modifications in O(n log n) overall time.
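As a minimal illustration of why these tags merge (names are illustrative): two pending progressions over the same interval add pointwise, so a node only has to store the endpoint sums.
struct APTag {
    long long first = 0, last = 0;  // endpoints of the pending progression

    // merging another pending progression is just endpoint-wise addition
    void add(long long f, long long l) { first += f; last += l; }

    // total contribution to a segment of the given length
    long long sum(long long len) const { return (first + last) * len / 2; }
};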

SUM exactly using K elements solution

Problem: On a given array with N numbers, find a subset of size M (exactly M elements) that sums to SUM.
I am looking for a Dynamic Programming (DP) solution for this problem; basically I am looking to understand the matrix-fill approach. I wrote the program below but didn't add memoization, as I am still wondering how to do that.
#include <stdio.h>

#define SIZE(a) sizeof(a)/sizeof(a[0])

int binary[100];
int a[] = {1, 2, 5, 5, 100};

void show(int* p, int size) {
    int j;
    for (j = 0; j < size; j++)
        if (p[j])
            printf("%d\n", a[j]);
}

void subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        show(binary, size);
    } else if (sum < target && i < size) {
        binary[i] = 1;
        subset_sum(target, i + 1, sum + a[i], a, size, K - 1);
        binary[i] = 0;
        subset_sum(target, i + 1, sum, a, size, K);
    }
}

int main() {
    int target = 10;
    int K = 2;
    subset_sum(target, 0, 0, a, SIZE(a), K);
}
Does the recurrence solution below make sense?
Let DP[i][j][k] be true if we can sum up to i using exactly k elements picked from the first j elements.
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1] { input array a[0....j] }
Base cases are:
DP[0][j][0] = 1 for all j (an empty selection reaches sum 0)
DP[i][j][0] = 0 for i > 0, and DP[i][0][k] = 0 for k > 0
It means we can either consider the current element (DP[i-a[j]][j-1][k-1]) or not consider it (DP[i][j-1][k]). If we consider the current element, k is reduced by 1, which reduces the number of elements that still need to be picked; when the current element is not considered, k is not reduced.
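A bottom-up sketch of this recurrence, with the j dimension rolled away by iterating the sums downwards (illustrative; it reuses the sample array, and the table bounds are hard-coded to target = 10 and K = 2):
#include <stdio.h>
#include <string.h>

int main(void) {
    int a[] = {1, 2, 5, 5, 100};
    int n = 5, target = 10, K = 2;

    /* dp[i][k] == 1 iff sum i is reachable using exactly k elements */
    static int dp[10 + 1][2 + 1];
    memset(dp, 0, sizeof dp);
    dp[0][0] = 1;

    for (int j = 0; j < n; j++)                /* consider a[j] */
        for (int i = target; i >= a[j]; i--)   /* downwards: each element used once */
            for (int k = K; k >= 1; k--)
                if (dp[i - a[j]][k - 1])
                    dp[i][k] = 1;

    printf("%s\n", dp[target][K] ? "possible" : "impossible");
    return 0;
}
Here 5 + 5 = 10 with exactly 2 elements, so it prints "possible".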
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
int memo[NUM_ELEMENTS][MAX_SUM][MAX_K]; // initialize all entries to -1

bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        return true;
    } else if (sum < target && i < size) {
        if (memo[i][sum][K] != -1)
            return memo[i][sum][K];
        memo[i][sum][K] = subset_sum(target, i + 1, sum + a[i], a, size, K - 1) ||
                          subset_sum(target, i + 1, sum, a, size, K);
        return memo[i][sum][K];
    }
    return false;
}
Then, look at memo[_all indexes_][target][K]. If any of these is true, there exists at least one solution. You can store additional information to get you the next solution, or you can iterate with an i from found_index - 1 down to 0 and check for which i you have memo[i][sum - a[i]][K - 1] == true. Then recurse on that, and so on. This will allow you to reconstruct the solution using just the memo array.
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n][k]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sums up to exactly j, for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][j-a[i]] || IsFeasible[i-1][j]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to not selecting the i-th item. The actual choice of items could be obtained by backtracking or from auxiliary information saved during the evaluation.

A data structure problem

Given a sequence of integers, there are a number of queries.
Each query has a range [l, r], and you are to find the median of the given range [l, r].
The number of queries can be as large as 100,000.
The length of the sequence can be as large as 100,000.
I wonder if there is any data structure that can support such queries.
My solution:
I consulted my partner today, and he told me to use a partition tree.
We can build a partition tree in O(n·log(n)) time and answer each query in O(log(n)) time.
The partition tree mirrors the process of merge sort, but each node in the tree saves the number of integers that go to its left subtree. Thus, we can use this information to deal with the queries.
here is my code:
This program finds the x in a given interval [l, r] that minimizes the following equation:
(equation image: http://acm.tju.edu.cn/toj/3556_01.jpg)
Explanation:
seq saves the sequence
pos saves the position after sorting
ind saves the index
cntL saves the number of integers that go to the left tree in a given range
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

#define N 100008

typedef long long LL;

int n, m, seq[N], ind[N], pos[N], next[N];
int cntL[20][N];
LL sum[20][N], sumL, subSum[N];

void build(int l, int r, int head, int dep)
{
    if (l == r)
    {
        cntL[dep][l] = cntL[dep][l-1];
        sum[dep][l] = sum[dep][l-1];
        return ;
    }
    int mid = (l+r)>>1;
    int hl = 0, hr = 0, tl = 0, tr = 0;
    for (int i = head, j = l; i != -1; i = next[i], j++)
    {
        cntL[dep][j] = cntL[dep][j-1];
        sum[dep][j] = sum[dep][j-1];
        if (pos[i] <= mid)
        {
            next[tl] = i;
            tl = i;
            if (hl == 0) hl = i;
            cntL[dep][j]++;
            sum[dep][j] += seq[i];
        }
        else
        {
            next[tr] = i;
            tr = i;
            if (hr == 0) hr = i;
        }
    }
    next[tl] = -1;
    next[tr] = -1;

    build(l, mid, hl, dep+1);
    build(mid+1, r, hr, dep+1);
}

int query(int left, int right, int ql, int qr, int kth, int dep)
{
    if (left == right)
    {
        return ind[left];
    }
    int mid = (left+right)>>1;
    if (cntL[dep][qr] - cntL[dep][ql-1] >= kth)
    {
        return query(left, mid, left+cntL[dep][ql-1]-cntL[dep][left-1], left+cntL[dep][qr]-cntL[dep][left-1]-1, kth, dep+1);
    }
    else
    {
        sumL += sum[dep][qr]-sum[dep][ql-1];
        return query(mid+1, right, mid+1+ql-left-(cntL[dep][ql-1]-cntL[dep][left-1]), mid+qr+1-left-(cntL[dep][qr]-cntL[dep][left-1]),
                     kth-(cntL[dep][qr]-cntL[dep][ql-1]), dep+1);
    }
}

inline int cmp(int x, int y)
{
    return seq[x] < seq[y];
}

int main()
{
    int ca, t, i, j, middle, ql, qr, id, tot;
    LL ans;
    scanf("%d", &ca);
    for (t = 1; t <= ca; t++)
    {
        scanf("%d", &n);
        subSum[0] = 0;
        for (i = 1; i <= n; i++)
        {
            scanf("%d", seq+i);
            ind[i] = i;
            subSum[i] = subSum[i-1]+seq[i];
        }
        sort(ind+1, ind+1+n, cmp);
        for (i = 1; i <= n; i++)
        {
            pos[ind[i]] = i;
            next[i] = i+1;
        }
        next[n] = -1;
        build(1, n, 1, 0);
        printf("Case #%d:\n", t);
        scanf("%d", &m);
        while (m--)
        {
            scanf("%d%d", &ql, &qr);
            ql++, qr++;
            middle = (qr-ql+2)/2;
            sumL = 0;
            id = query(1, n, ql, qr, middle, 0);
            ans = subSum[qr]-subSum[ql-1]-sumL;
            tot = qr-ql+1;
            ans = ans-(tot-middle+1)*1ll*seq[id]+(middle-1)*1ll*seq[id]-sumL;
            printf("%lld\n", ans);
        }
        puts("");
    }
}
This is called the Range Median Query problem. The following paper might be relevant: Towards Optimal Range Medians. (Free link, thanks to belisarius).
From the abstract of the paper:
We consider the following problem: Given an unsorted array of n elements, and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(n log k + k log n) time to answer k such median queries. This improves previous algorithms by a logarithmic factor and matches a comparison lower bound for k = O(n). The space complexity of our simple algorithm is O(n log n) in the pointer machine model, and O(n) in the RAM model. In the latter model, a more involved O(n) space data structure can be constructed in O(n log n) time where the time per query is reduced to O(log n / log log n). We also give efficient dynamic variants of both data structures, achieving O(log² n) query time using O(n log n) space in the comparison model and O((log n / log log n)²) query time using O(n log n / log log n) space in the RAM model, and show that in the cell-probe model, any data structure which supports updates in O(log^{O(1)} n) time must have Ω(log n / log log n) query time. Our approach naturally generalizes to higher-dimensional range median problems, where element positions and query ranges are multidimensional: it reduces a range median query to a logarithmic number of range counting queries.
Of course, you could preprocess the whole array in O(n³) time (or perhaps even O(n²·log n) time) and O(n²) space to be able to return the median in O(1) time.
Additional constraints might help simplify the solution. For instance, do we know that r-l will be less than a known constant? etc...
