In the merge algorithm of merge sort, I don't understand why we have to use the auxiliary arrays L and R. Why can't we just keep two pointers indicating which elements we're comparing in the two subarrays, so that merge sort remains in-place?
Say you split your array so that L uses the first half of the original array and R uses the second half.
Then say that during the merge the first few elements from R are smaller than the smallest in L. If you want to put them in the correct place for the merge result, you will have to overwrite elements from L that have not been processed by the merge step yet.
Of course you can make a different split, but you can always construct a similar (then slightly different) example.
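To make the overwrite concrete, here is a minimal Python sketch (not from the posts above; the name `naive_inplace_merge` is made up for illustration) of what happens if you merge with two read pointers and write straight back into the same array:

```python
def naive_inplace_merge(a, mid):
    """Merge sorted a[:mid] and a[mid:] by writing back into `a` itself.

    Deliberately broken: the write index k can overtake the read index i
    and destroy elements of the left half before they have been read.
    """
    i, j, k = 0, mid, 0
    n = len(a)
    while i < mid and j < n:
        if a[i] <= a[j]:
            a[k] = a[i]
            i += 1
        else:
            a[k] = a[j]   # may overwrite an unread element of the left half
            j += 1
        k += 1
    while i < mid:
        a[k] = a[i]
        i += 1
        k += 1
    while j < n:
        a[k] = a[j]
        j += 1
        k += 1
    return a


print(naive_inplace_merge([4, 5, 6, 1, 2, 3], 3))
```

With `[4, 5, 6, 1, 2, 3]` split in the middle, the very first write puts 1 on top of the unread 4, and 4, 5 and 6 never make it into the result.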
My first post here. Be gentle!
Here's my solution for a simple and easy-to-understand stable in-place merge-sort. I wrote this yesterday. I'm not sure it hasn't been done before, but I've not seen it about, so maybe?
The one drawback is that the following in-place merge algorithm can degenerate into O(n²) under certain conditions, but it is typically O(n.log₂n) in practice. This degeneracy can be mitigated with certain changes, but I wanted to keep the base algorithm pure in the code sample so it can be easily understood.
Coupled with the O(log₂n) recursion depth of the driving merge_sort() function, this gives a typical time complexity of O(n.(log₂n)²) overall, and O(n².log₂n) in the worst case. That is not fantastic, but with some tweaks it can be made to almost always run in O(n.(log₂n)²) time, and with its good CPU cache locality it is decent even for n values up to 1M. It is always going to be slower than quicksort, though.
// Stable Merge In Place Sort
// The following code is written to illustrate the base algorithm. A good
// number of optimizations can be applied to boost its overall speed
// For all its simplicity, it does still perform somewhat decently.
// Average case time complexity appears to be: O(n.(log₂n)²)
#include <stddef.h>
#include <stdio.h>
#define swap(x, y) (t=(x), (x)=(y), (y)=t)
// Both sorted sub-arrays must be adjacent in 'a'
// Assumes that both 'an' and 'bn' are always non-zero
// 'an' is the length of the first sorted section in 'a', referred to as A
// 'bn' is the length of the second sorted section in 'a', referred to as B
static void
merge_inplace(int A[], size_t an, size_t bn)
{
    int t, *B = &A[an];
    size_t pa, pb; // Swap partition pointers within A and B

    // Find the portion to swap. We're looking for how much from the
    // start of B can swap with the end of A, such that every element
    // in A is less than or equal to any element in B. This is quite
    // simple when both sub-arrays come at us pre-sorted
    for (pa = an, pb = 0; pa > 0 && pb < bn && B[pb] < A[pa-1]; pa--, pb++)
        ;

    // Now swap last part of A with first part of B according to the
    // indices we found
    for (size_t index = pa; index < an; index++)
        swap(A[index], B[index-pa]);

    // Now merge the two sub-array pairings. We need to check that either array
    // didn't wholly swap out the other and cause the remaining portion to be zero
    if (pa > 0 && (an-pa) > 0)
        merge_inplace(A, pa, an-pa);
    if (pb > 0 && (bn-pb) > 0)
        merge_inplace(B, pb, bn-pb);
} // merge_inplace
// Implements a recursive merge-sort algorithm with an optional
// insertion sort for when the splits get too small. 'n' must
// ALWAYS be 2 or more. It enforces this when calling itself
static void
merge_sort(int a[], size_t n)
{
    size_t m = n/2;

    // Sort first and second halves only if the target 'n' will be > 1
    if (m > 1)
        merge_sort(a, m);
    if ((n-m) > 1)
        merge_sort(a+m, n-m);

    // Now merge the two sorted sub-arrays together. We know that since
    // n > 1, then both m and n-m MUST be non-zero, and so we will never
    // violate the condition of not passing in zero length sub-arrays
    merge_inplace(a, m, n-m);
} // merge_sort
// Print an array
static void
print_array(int a[], size_t size)
{
    if (size > 0) {
        printf("%d", a[0]);
        for (size_t i = 1; i < size; i++)
            printf(" %d", a[i]);
    }
    printf("\n");
} // print_array
// Test driver
int
main(void)
{
    int a[] = { 17, 3, 16, 5, 14, 8, 10, 7, 15, 1, 13, 4, 9, 12, 11, 6, 2 };
    size_t n = sizeof(a) / sizeof(a[0]);

    merge_sort(a, n);
    print_array(a, n);
    return 0;
} // main
If you have ever tried to write a merge sort in place, you will soon have found out why you can't: when you are merging the two subarrays, you need to read from and write to the same range of the array, and the writes will overwrite elements you have not read yet. Hence we need an auxiliary array:
vector<int> merge_sort(vector<int>& vs, int l, int r, vector<int>& temp)
{
    if(l==r) return vs; // recursion must have an end condition
    int m = (l+r)/2;
    merge_sort(vs, l, m, temp);
    merge_sort(vs, m+1, r, temp);
    int il = l, ir=m+1, i=l;
    while(il <= m && ir <= r)
    {
        if(vs[il] <= vs[ir])
            temp[i++] = vs[il++];
        else
            temp[i++] = vs[ir++];
    }
    // copy left over items (only one of the loops below will run)
    while(il <= m) temp[i++] = vs[il++];
    while(ir <= r) temp[i++] = vs[ir++];
    for(i=l; i<=r; ++i) vs[i] = temp[i];
    return vs;
}
Given N jobs where every job is represented by following three elements of it.
1) Start Time
2) Finish Time.
3) Profit or Value Associated.
Find the maximum profit subset of jobs such that no two jobs in the subset overlap.
I know a dynamic programming solution with complexity O(N^2) (close to LIS: for the i-th interval we check the previous intervals it can be merged with and take the one whose merging gives the maximum up to the i-th element). This solution can be further improved to O(N log N) using binary search and simple sorting.
But my friend was telling me that it can even be solved using segment trees and binary search! I have no clue where or how I would use a segment tree.
Can you help?
On request, here is my code (sorry, not commented).
What I am doing is sorting by the starting index and storing in DP[i] the maximum value obtainable up to i, found by merging with previous intervals and their maximum obtainable values.
void solve()
int n,i,j,k,high;
pair < pair < int ,int>, int > arr[n+1];// first pair represents l,r and int alone shows cost
int dp[n+1];
std::sort(arr,arr+n); // by default sorting on the basis of starting index
for(j=0;j<i;j++)//checking all previous mergable intervals //Note we will use DP[] of the mergable interval due to optimal substructure
high=std::max(high , dp[j]+arr[i].second);
int main()
{solve();return 0;}
My working code, finally; it took me 3 hours to debug, though! Moreover, this code is slower than the binary search and sorting one due to a larger constant and a bad implementation :P (just for reference)
#define lc(idx) (2*idx+1)
#define rc(idx) (2*idx+2)
#define mid(l,r) ((l+r)/2)
using namespace std;
int Tree[4*2*10000-1];
void update(int L,int R,int qe,int idx,int value)
if(qe<= mid(L,R))
return ;
int Get(int L,int R,int idx,int q)
if(q<L )
return 0;
return Tree[idx];
return max(Get(L,mid(L,R),lc(idx),q),Get(mid(L,R)+1,R,rc(idx),q));
bool cmp(pair < pair < int , int > , int > A,pair < pair < int , int > , int > B)
return A.first.second< B.first.second;
int main()
int N,i;
pair < pair < int , int > , int > P[N];
vector < int > V;
int &l=P[i].first.first,&r=P[i].first.second;
int ans=0;
int aux=Get(0,2*N-1,0,P[i].first.first)+P[i].second;
return 0;
for(j=0;j<i;j++)//checking all previous mergable intervals //Note we will use DP[] of the mergable interval due to optimal substructure
high=std::max(high, dp[j]+arr[i].second);
This can be done in O(log n) with a segment tree.
First of all, let's rewrite it a bit. The max you are taking is a bit complicated, because it takes the maximum of a sum involving both i and j. But i is constant in this part, so let's take it out.
for(j=0;j<i;j++)//checking all previous mergable intervals //Note we will use DP[] of the mergable interval due to optimal substructure
    high=std::max(high, dp[j]);
dp[i]=high + arr[i].second;
Great, now we have reduced the problem to determining the maximum in [0, i - 1] out of the values that satisfy your if condition.
If we didn't have the if, it would be a simple application of segment trees.
Now there are two choices.
1. Deal with O(log V) query time and O(V) memory for the segment tree
Where V is the maximum size of an interval's endpoint.
You can build a segment tree into which you insert interval end points as you move your i. Then you query over the range of values. Something like this, where the segment tree is initialized to -infinity and of size O(V).
Update(node, index, value):
    if node.associated_interval == [index, index]:
        node.max = max(node.max, value)
        return
    if index in node.left.associated_interval:
        Update(node.left, index, value)
    else:
        Update(node.right, index, value)
    node.max = max(node.left.max, node.right.max)
Query(node, left, right):
    if [left, right] does not intersect node.associated_interval:
        return -infinity
    if node.associated_interval included in [left, right]:
        return node.max
    return max(Query(node.left, left, right),
               Query(node.right, left, right))
high=max(0, Query(tree, 0, arr[i].first.first))
dp[i]=high + arr[i].second;
Update(tree, arr[i].first.second, dp[i])
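For reference, here is a rough Python sketch of the whole loop, under my own assumptions: jobs are `(start, finish, profit)` tuples with non-negative integer times and `start < finish`, two jobs that merely touch (`finish == start`) count as compatible, the tree is keyed by finish time, and 0 is used as the neutral value instead of -infinity so that "take no earlier job" is always an option. The names `MaxSegmentTree` and `max_profit` are illustrative, not from the question.

```python
class MaxSegmentTree:
    """Bottom-up max segment tree over positions 0..size-1, initialised to 0."""

    def __init__(self, size):
        self.n = size
        self.tree = [0] * (2 * size)

    def update(self, pos, value):
        """Raise the value stored at `pos` to at least `value`."""
        pos += self.n
        self.tree[pos] = max(self.tree[pos], value)
        pos //= 2
        while pos >= 1:
            self.tree[pos] = max(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2

    def query(self, lo, hi):
        """Maximum over the inclusive range [lo, hi]."""
        best = 0
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                best = max(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best


def max_profit(jobs):
    """Maximum total profit of pairwise non-overlapping jobs."""
    if not jobs:
        return 0
    tree = MaxSegmentTree(max(finish for _, finish, _ in jobs) + 1)
    answer = 0
    for start, finish, profit in sorted(jobs):      # sorted by start time
        best = tree.query(0, start) + profit        # best chain ending by `start`
        tree.update(finish, best)
        answer = max(answer, best)
    return answer


print(max_profit([(1, 2, 50), (3, 5, 20), (6, 19, 100), (2, 100, 200)]))
```

Each job costs one query and one update, so the loop is O(n log V) after the sort.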
2. Reducing to O(log n) query time and O(n) memory for the segment tree
Since the number of intervals might be significantly less than their length, it's reasonable to think that we might be able to encode them better somehow, so that their length is also O(n). Indeed, we can.
This involves normalizing your intervals in the range [1, 2*n]. Consider the following intervals
8 100
3 50
90 92
Let's plot them on a line. They'd look like this:
3 8 50 90 92 100
Now replace each of them with their index:
1 2 3 4 5 6
3 8 50 90 92 100
And write your new intervals:
2 6
1 3
4 5
Note that they retain the properties of your initial intervals: the same ones overlap, the same ones are included in each other etc.
This can be done with a sort. You can now apply the same segment tree algorithm, except you declare the segment tree for the size 2*n.
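A small Python sketch of that normalization, assuming the intervals are given as `(start, end)` pairs (the helper name `compress` is mine):

```python
def compress(intervals):
    """Replace every endpoint by its 1-based rank among all distinct endpoints."""
    points = sorted({p for interval in intervals for p in interval})
    rank = {p: i + 1 for i, p in enumerate(points)}
    return [(rank[start], rank[end]) for start, end in intervals]


print(compress([(8, 100), (3, 50), (90, 92)]))
```

For the three intervals above this gives `(2, 6)`, `(1, 3)` and `(4, 5)`. Equal endpoints share a rank, so the largest rank is at most 2*n.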
This is more of an algorithms question than a programming one. I'm wondering if the prefix sum (or any) parallel algorithm can be modified to accomplish the following. I'd like to generate a result from two input lists on a GPU in less than O(N) time.
The rule is: Carry forth the first number from data until the same index in keys contains a lesser value.
Whenever I try mapping it to a parallel scan, it doesn't work because I can't be sure which values of data to propagate in upsweep since it's not possible to know which prior data might have carried far enough to compare against the current key. This problem reminds me of a ripple carry where we need to consider the current index AND all past indices.
Again, don't need code for a parallel scan (though that would be nice), more looking to understand how it can be done or why it can't be done.
int data[N] = {5, 6, 5, 5, 3, 1, 5, 5};
int keys[N] = {5, 6, 5, 5, 4, 2, 5, 5};
int result[N];
serial_scan(N, keys, data, result);
// Print result. should be {5, 5, 5, 5, 3, 1, 1, 1, }
code to do the scan in serial is below:
void serial_scan(int N, int *k, int *d, int *r)
{
    r[0] = d[0];
    for(int i=1; i<N; i++)
    {
        if (k[i] >= r[i-1]) {
            r[i] = r[i-1];
        } else if (k[i] >= d[i]) {
            r[i] = d[i];
        } else {
            r[i] = 0;
        }
    }
}
The general technique for a parallel scan can be found here, described in the functional language Standard ML. This can be done for any associative operator, and I think yours fits the bill.
One intuition pump is that you can calculate the sum of an array in O(log(n)) span (running time with infinite processors) by recursively calculating the sum of two halves of the array and adding them together. In calculating the scan you just need know the sum of the array before the current point.
We could calculate the scan of an array doing two halves in parallel: calculate the sum of the 1st half using the above technique. Then calculating the scan for the two halves sequentially; the 1st half starts at 0 and the 2nd half starts at the sum you calculated before. The full algorithm is a little trickier, but uses the same idea.
Here's some pseudo-code for doing a parallel scan in a different language (for the specific case of ints and addition, but the logic is identical for any associative operator):
//assume input.length is a power of 2
int[] scanadd( int[] input) {
  if (input.length == 1)
    return input
  else {
    //calculate a new collapsed sequence which is the sum of sequential even/odd pairs
    //assume this for loop is done in parallel
    int[] collapsed = new int[input.length/2]
    for (i <- 0 until collapsed.length)
      collapsed[i] = input[2 * i] + input[2*i+1]
    //recursively scan collapsed values
    int[] scancollapse = scanadd(collapsed)
    //now we can use the scan of the collapsed seq to calculate the full sequence
    //also assume this for loop is in parallel
    int[] output = new int[input.length]
    for (i <- 0 until input.length)
      //if an index is odd then we can just look into the collapsed sequence and get the value
      //otherwise we look at the collapsed entry just before it and add the value at the current index
      if (i % 2 == 1)
        output[i] = scancollapse[(i-1)/2]
      else if (i == 0)
        output[i] = input[0]
      else
        output[i] = scancollapse[i/2 - 1] + input[i]
    return output
  }
}
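If it helps, here is the same collapse/expand idea as a runnable Python sketch for an inclusive scan with an arbitrary associative operator. It runs sequentially; the two list-building steps are the parts that would be done in parallel. The function name `scan` and the power-of-two length restriction are my simplifications.

```python
import operator


def scan(values, op):
    """Inclusive scan of `values` under the associative operator `op`.

    Assumes len(values) is a power of 2.
    """
    n = len(values)
    if n == 1:
        return list(values)
    # Collapse: combine each even/odd pair (parallel step).
    collapsed = [op(values[2 * i], values[2 * i + 1]) for i in range(n // 2)]
    scanned = scan(collapsed, op)
    # Expand (parallel step): odd indices read the collapsed scan directly,
    # even indices take the previous collapsed entry and add their own value.
    out = []
    for i in range(n):
        if i % 2 == 1:
            out.append(scanned[i // 2])
        elif i == 0:
            out.append(values[0])
        else:
            out.append(op(scanned[i // 2 - 1], values[i]))
    return out


print(scan([1, 2, 3, 4, 5, 6, 7, 8], operator.add))
```

The recursion depth is log2(n), which is where the O(log n) span comes from when each level runs in parallel.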
Original Problem:
I have 3 boxes each containing 200 coins, given that there is only one person who has made calls from all of the three boxes and thus there is one coin in each box which has same fingerprints and rest of all coins have different fingerprints. You have to find the coin which contains same fingerprint from all of the 3 boxes. So that we can find the fingerprint of the person who has made call from all of the 3 boxes.
Converted problem:
You have 3 arrays containing 200 integers each. Given that there is one and only one common element in these 3 arrays. Find the common element.
Please consider solving this for other than trivial O(1) space and O(n^3) time.
Some improvement in Pelkonen's answer:
From converted problem in OP:
"Given that there is one and only one common element in these 3 arrays."
We need to sort only 2 arrays and find common element.
If you sort all the arrays first O(n log n) then it will be pretty easy to find the common element in less than O(n^3) time. You can for example use binary search after sorting them.
Let N = 200, k = 3,
Create a hash table H with capacity ≥ Nk.
For each element X in array 1, set H[X] to 1.
For each element Y in array 2, if Y is in H and H[Y] == 1, set H[Y] = 2.
For each element Z in array 3, if Z is in H and H[Z] == 2, return Z.
throw new InvalidDataGivenByInterviewerException();
O(Nk) time, O(Nk) space complexity.
Use a hash table for each integer and encode the entries such that you know which array it's coming from - then check for the slot which has entries from all 3 arrays. O(n)
Use a hashtable mapping objects to frequency counts. Iterate through all three lists, incrementing occurrence counts in the hashtable, until you encounter one with an occurrence count of 3. This is O(n), since no sorting is required. Example in Python:
def find_duplicates(*lists):
    num_lists = len(lists)
    counts = {}
    for l in lists:
        for i in l:
            counts[i] = counts.get(i, 0) + 1
            if counts[i] == num_lists:
                return i
Or an equivalent, using sets:
def find_duplicates(*lists):
    intersection = set(lists[0])
    for l in lists[1:]:
        intersection = intersection.intersection(set(l))
    return intersection.pop()
O(N) solution: use a hash table. H[i] = list of all integers in the three arrays that map to i.
For all H[i] > 1 check if three of its values are the same. If yes, you have your solution. You can do this check with the naive solution even, it should still be very fast, or you can sort those H[i] and then it becomes trivial.
If your numbers are relatively small, you can use H[i] = k if i appears k times in the three arrays, then the solution is the i for which H[i] = 3. If your numbers are huge, use a hash table though.
You can extend this to work even if some elements are common to only two of the arrays, and also if elements can repeat within one of the arrays. It just becomes a bit more complicated, but you should be able to figure it out on your own.
If you want the fastest* answer:
Sort one array--time is N log N.
For each element in the second array, search the first. If you find it, add 1 to a companion array; otherwise add 0--time is N log N, using N space.
For each non-zero count, copy the corresponding entry into the temporary array, compacting it so it's still sorted--time is N.
For each element in the third array, search the temporary array; when you find a hit, stop. Time is less than N log N.
Here's code in Scala that illustrates this:
import java.util.Arrays
val a = Array(1,5,2,3,14,1,7)
val b = Array(3,9,14,4,2,2,4)
val c = Array(1,9,11,6,8,3,1)
Arrays.sort(a)  // binarySearch requires a sorted array
val count = new Array[Int](a.length)
for (i <- 0 until b.length) {
  val j = Arrays.binarySearch(a,b(i))
  if (j >= 0) count(j) += 1
}
var n = 0
for (i <- 0 until count.length) if (count(i)>0) { count(n) = a(i); n+= 1 }
for (i <- 0 until c.length) {
  if (Arrays.binarySearch(count,0,n,c(i))>=0) println(c(i))
}
With slightly more complexity, you can either use no extra space at the cost of being even more destructive of your original arrays, or you can avoid touching your original arrays at all at the cost of another N space.
Edit: * as the comments have pointed out, hash tables are faster for non-perverse inputs. This is "fastest worst case". The worst case may not be so unlikely unless you use a really good hashing algorithm, which may well eat up more time than your sort. For example, if you multiply all your values by 2^16, the trivial hashing (i.e. just use the bitmasked integer as an index) will collide every time on lists shorter than 64k....
//Beginner's code using binary search that's pretty easy
// bool BS(int arr[],int low,int high,int target)
// {
// if(low>high)
// return false;
// int mid=low+(high-low)/2;
// if(target==arr[mid])
// return true;
// else if(target<arr[mid])
// return BS(arr,low,mid-1,target);
// else
// return BS(arr,mid+1,high,target);
// }
// vector <int> commonElements (int A[], int B[], int C[], int n1, int n2, int n3)
// {
// vector<int> ans;
// for(int i=0;i<n2;i++)
// {
// if(i>0)
// {
// if(B[i-1]==B[i])
// continue;
// }
// //The above if block is to remove duplicates
// //In the below code we are searching for an element from array B in both the arrays A and C;
// if(BS(A,0,n1-1,B[i]) && BS(C,0,n3-1,B[i]))
// {
// ans.push_back(B[i]);
// }
// }
// return ans;
// }
Say I have y distinct values and I want to select x of them at random. What's an efficient algorithm for doing this? I could just call rand() x times, but the performance would be poor if x, y were large.
Note that combinations are needed here: each value should have the same probability to be selected but their order in the result is not important. Sure, any algorithm generating permutations would qualify, but I wonder if it's possible to do this more efficiently without the random order requirement.
How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N covers this case for permutations.
Robert Floyd invented a sampling algorithm for just such situations. It's generally superior to shuffling then grabbing the first x elements since it doesn't require O(y) storage. As originally written it assumes values from 1..N, but it's trivial to produce 0..N and/or use non-contiguous values by simply treating the values it produces as subscripts into a vector/array/whatever.
In pseudocode, the algorithm runs like this (stealing from Jon Bentley's Programming Pearls column "A sample of Brilliance").
initialize set S to empty
for J := N-M + 1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S
That last bit (inserting J if T is already in S) is the tricky part. The bottom line is that it assures the correct mathematical probability of inserting J so that it produces unbiased results.
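A direct Python transcription of that pseudocode, as a sketch: `n` and `m` play the roles of N and M, values are drawn from 1..n, and the function name `floyd_sample` is mine.

```python
import random


def floyd_sample(n, m, rng=random):
    """Return a set of m distinct integers drawn uniformly from 1..n."""
    chosen = set()
    for j in range(n - m + 1, n + 1):
        t = rng.randint(1, j)      # inclusive on both ends
        if t not in chosen:
            chosen.add(t)
        else:
            chosen.add(j)          # j cannot be in the set yet
    return chosen


print(sorted(floyd_sample(1000, 5)))
```

To sample from arbitrary values, treat the returned numbers as (1-based) subscripts into your own list.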
It's O(x)¹ and O(1) with regard to y, O(x) storage.
Note that, in accordance with the combinations tag in the question, the algorithm only guarantees equal probability of each element occurring in the result, not of their relative order in it.
¹ O(x²) in the worst case for the hash map involved, which can be neglected since it's a virtually nonexistent pathological case where all the values have the same hash.
Assuming that you want the order to be random too (or don't mind it being random), I would just use a truncated Fisher-Yates shuffle. Start the shuffle algorithm, but stop once you have selected the first x values, instead of "randomly selecting" all y of them.
Fisher-Yates works as follows:
select an element at random, and swap it with the element at the end of the array.
Recurse (or more likely iterate) on the remainder of the array, excluding the last element.
Steps after the first do not modify the last element of the array. Steps after the first two don't affect the last two elements. Steps after the first x don't affect the last x elements. So at that point you can stop - the top of the array contains uniformly randomly selected data. The bottom of the array contains somewhat randomized elements, but the permutation you get of them is not uniformly distributed.
Of course this means you've trashed the input array - if this means you'd need to take a copy of it before starting, and x is small compared with y, then copying the whole array is not very efficient. Do note though that if all you're going to use it for in future is further selections, then the fact that it's in somewhat-random order doesn't matter, you can just use it again. If you're doing the selection multiple times, therefore, you may be able to do only one copy at the start, and amortise the cost.
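A minimal Python sketch of the truncated shuffle described above. It works on a copy so the caller's list is left alone, which is exactly the O(y) copying cost discussed; the function name is made up.

```python
import random


def truncated_shuffle_sample(items, x, rng=random):
    """Pick x elements uniformly at random (in random order) from `items`."""
    pool = list(items)                 # copy; drop this if trashing the input is fine
    last = len(pool) - 1
    for step in range(x):
        end = last - step              # slot at the end that gets fixed in this step
        k = rng.randint(0, end)        # choose among the not-yet-fixed elements
        pool[k], pool[end] = pool[end], pool[k]
    return pool[len(pool) - x:]        # the x fixed slots


print(truncated_shuffle_sample(range(10), 3))
```

Only x random numbers and x swaps are needed, regardless of how large y is.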
If you really only need to generate combinations - where the order of elements does not matter - you may use combinadics as they are implemented e.g. here by James McCaffrey.
Contrast this with k-permutations, where the order of elements does matter.
In the first case (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1) are considered the same - in the latter, they are considered distinct, though they contain the same elements.
In case you need combinations, you may really only need to generate one random number (albeit it can be a bit large), which can be used directly to find the m-th combination.
Since this random number represents the index of a particular combination, it follows that your random number should be between 0 and C(n,k) - 1.
Calculating combinadics might take some time as well.
It might just not be worth the trouble; besides, Jerry's and Federico's answers are certainly simpler than implementing combinadics.
However if you really only need a combination and you are bugged about generating the exact number of random bits that are needed and none more... ;-)
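To show the one-random-number idea, here is a hedged Python sketch that turns an index in [0, C(n,k)) into a combination via the combinatorial number system. This is my own illustration, not McCaffrey's code; it assumes Python 3.8+ for `math.comb`, and both function names are made up.

```python
import random
from math import comb


def unrank_combination(k, index):
    """Return the k-combination (strictly decreasing list) with the given rank.

    Every index in 0..C(n,k)-1 yields a distinct combination of values
    from 0..n-1.
    """
    result = []
    for size in range(k, 0, -1):
        c = size - 1
        # Find the largest c with C(c, size) <= index.
        while comb(c + 1, size) <= index:
            c += 1
        index -= comb(c, size)
        result.append(c)
    return result


def random_combination(n, k, rng=random):
    """Pick one of the C(n,k) combinations using a single random number."""
    return unrank_combination(k, rng.randrange(comb(n, k)))


print(random_combination(10, 3))
```

The linear search for `c` keeps the sketch short; a real implementation would probably search smarter, which is part of why this may not be worth the trouble.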
While it is not clear whether you want combinations or k-permutations, here is a C# code for the latter (yes, we could generate only a complement if x > y/2, but then we would have been left with a combination that must be shuffled to get a real k-permutation):
using System;
using System.Collections.Generic;
using System.Linq;
static class TakeHelper
{
    public static IEnumerable<T> TakeRandom<T>(
        this IEnumerable<T> source, Random rng, int count)
    {
        T[] items = source.ToArray();
        count = count < items.Length ? count : items.Length;
        for (int i = items.Length - 1 ; count-- > 0; i--)
        {
            int p = rng.Next(i + 1);
            yield return items[p];
            items[p] = items[i];
        }
    }
}
class Program
{
    static void Main(string[] args)
    {
        Random rnd = new Random(Environment.TickCount);
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7 };
        foreach (int number in numbers.TakeRandom(rnd, 3))
            Console.WriteLine(number);
    }
}
Another, more elaborate implementation that generates k-permutations, that I had lying around and I believe is in a way an improvement over existing algorithms if you only need to iterate over the results. While it also needs to generate x random numbers, it only uses O(min(y/2, x)) memory in the process:
/// <summary>
/// Generates unique random numbers
/// <remarks>
/// Worst case memory usage is O(min((emax-imin)/2, num))
/// </remarks>
/// </summary>
/// <param name="random">Random source</param>
/// <param name="imin">Inclusive lower bound</param>
/// <param name="emax">Exclusive upper bound</param>
/// <param name="num">Number of integers to generate</param>
/// <returns>Sequence of unique random numbers</returns>
public static IEnumerable<int> UniqueRandoms(
    Random random, int imin, int emax, int num)
{
    int dictsize = num;
    long half = (emax - (long)imin + 1) / 2;
    if (half < dictsize)
        dictsize = (int)half;
    Dictionary<int, int> trans = new Dictionary<int, int>(dictsize);
    for (int i = 0; i < num; i++)
    {
        int current = imin + i;
        int r = random.Next(current, emax);
        int right;
        if (!trans.TryGetValue(r, out right))
            right = r;
        int left;
        if (!trans.TryGetValue(current, out left))
            left = current;
        if (r > current)
            trans[r] = left;
        yield return right;
    }
}
The general idea is to do a Fisher-Yates shuffle and memorize the transpositions in the permutation.
It was not published anywhere nor has it received any peer-review whatsoever. I believe it is a curiosity rather than having some practical value. Nonetheless I am very open to criticism and would generally like to know if you find anything wrong with it - please consider this (and adding a comment before downvoting).
A little suggestion: if x >> y/2, it's probably better to select at random y - x elements, then choose the complementary set.
The trick is to use a variation of shuffle or in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);
    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }
    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++; // undo the --N of the matching forward step
    }
    return picked;
}
NOTE: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffle), and is non-destructive on the input array (unlike a plain partial shuffle), although the restore step is optional.
adapted from here
another approach using only a single call to PRNG (pseudo-random number generator) in [0,1] by IVAN STOJMENOVIC, "ON RANDOM AND ADAPTIVE PARALLEL GENERATION OF COMBINATORIAL OBJECTS" (section 3), of O(N) (worst-case) complexity
Here is a simple way to do it which is only inefficient if Y is much larger than X.
void randomly_select_subset(
    int X, int Y,
    const int * inputs, int * outputs
) {
    int i, r;
    for( i = 0; i < X; ++i ) outputs[i] = inputs[i];
    for( i = X; i < Y; ++i ) {
        r = rand_inclusive( 0, i );
        if( r < X ) outputs[r] = inputs[i];
    }
}
Basically, copy the first X of your distinct values to your output array, and then for each remaining value, randomly decide whether or not to include that value.
The random number is further used to choose an element of our (mutable) output array to replace.
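This is the technique usually called reservoir sampling. A short Python sketch of the same idea, with a made-up function name, that also works when the number of values is not known in advance:

```python
import random


def reservoir_sample(values, x, rng=random):
    """Pick x items uniformly at random from an iterable in a single pass."""
    reservoir = []
    for i, value in enumerate(values):
        if i < x:
            reservoir.append(value)        # fill the reservoir first
        else:
            r = rng.randint(0, i)          # inclusive: i + 1 possible outcomes
            if r < x:                      # keep `value` with probability x / (i + 1)
                reservoir[r] = value
    return reservoir


print(reservoir_sample(range(100), 5))
```

It always makes one pass over all Y values, which is why it is wasteful when Y is much larger than X.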
If, for example, you have 2^64 distinct values, you can use a symmetric key algorithm (with a 64 bits block) to quickly reshuffle all combinations. (for example Blowfish).
for(i=0; i<x; i++)
e[i] = encrypt(key, i)
This is not random in the pure sense but can be useful for your purpose.
If you want to work with arbitrary # of distinct values following cryptographic techniques you can but it's more complex.
If I have a size N array of objects, and I have an array of unique numbers in the range 1...N, is there any algorithm to rearrange the object array in-place in the order specified by the list of numbers, and yet do this in O(N) time?
Context: I am doing a quick-sort-ish algorithm on objects that are fairly large in size, so it would be faster to do the swaps on indices than on the objects themselves, and only move the objects in one final pass. I'd just like to know if I could do this last pass without allocating memory for a separate array.
Edit: I am not asking how to do a sort in O(N) time, but rather how to do the post-sort rearranging in O(N) time with O(1) space. Sorry for not making this clear.
I think this should do:
static <T> void arrange(T[] data, int[] p) {
    boolean[] done = new boolean[p.length];
    for (int i = 0; i < p.length; i++) {
        if (!done[i]) {
            T t = data[i];
            for (int j = i;;) {
                done[j] = true;
                if (p[j] != i) {
                    data[j] = data[p[j]];
                    j = p[j];
                } else {
                    data[j] = t;
                    break;
                }
            }
        }
    }
}
Note: This is Java. If you do this in a language without garbage collection, be sure to delete done.
If you care about space, you can use a BitSet for done. I assume you can afford an additional bit per element because you seem willing to work with a permutation array, which is several times that size.
This algorithm copies instances of T n + k times, where k is the number of cycles in the permutation. You can reduce this to the optimal number of copies by skipping those i where p[i] = i.
The approach is to follow the "permutation cycles" of the permutation, rather than indexing the array left-to-right. But since you do have to begin somewhere, everytime a new permutation cycle is needed, the search for unpermuted elements is left-to-right:
// Pseudo-code
N : integer, N > 0 // N is the number of elements
swaps : integer [0..N]
data[N] : array of object
permute[N] : array of integer [-1..N] denoting permutation (used element is -1)
next_scan_start : integer;
next_scan_start = 0;
swaps = 0;
while (swaps < N )
{
    // Search for the next index that is not-yet-permuted.
    for (idx_cycle_search = next_scan_start;
         idx_cycle_search < N;
         ++ idx_cycle_search)
        if (permute[idx_cycle_search] >= 0)
            break;
    next_scan_start = idx_cycle_search + 1;
    // This is a provable invariant. In short, number of non-negative
    // elements in permute[] equals (N - swaps)
    assert( idx_cycle_search < N );
    // Completely permute one permutation cycle, 'following the
    // permutation cycle's trail' This is O(N)
    while (permute[idx_cycle_search] >= 0)
    {
        // Skip the swap on the step that closes the cycle (its target is
        // already marked used); by then the data is already in place.
        if (permute[permute[idx_cycle_search]] >= 0)
            swap( data[idx_cycle_search], data[permute[idx_cycle_search]] )
        swaps ++;
        old_idx = idx_cycle_search;
        idx_cycle_search = permute[idx_cycle_search];
        permute[old_idx] = -1;
        // Also '= -idx_cycle_search -1' could be used rather than '-1'
        // and would allow reversal of these changes to permute[] array
    }
}
Do you mean that you have an array of objects O[1..N] and then you have an array P[1..N] that contains a permutation of numbers 1..N and in the end you want to get an array O1 of objects such that O1[k] = O[P[k]] for all k=1..N ?
As an example, if your objects are letters A,B,C...,Y,Z and your array P is [26,25,24,..,2,1] is your desired output Z,Y,...C,B,A ?
If yes, I believe you can do it in linear time using only O(1) additional memory. Reversing elements of an array is a special case of this scenario. In general, I think you would need to consider decomposition of your permutation P into cycles and then use it to move around the elements of your original array O[].
If that's what you are looking for, I can elaborate more.
EDIT: Others already presented excellent solutions while I was sleeping, so no need to repeat it here. ^_^
EDIT: My O(1) additional space is indeed not entirely correct. I was thinking only about "data" elements, but in fact you also need to store one bit per permutation element, so if we are precise, we need O(log n) extra bits for that. But most of the time using a sign bit (as suggested by J.F. Sebastian) is fine, so in practice we may not need anything more than we already have.
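The sign trick can be sketched in Python as follows. This is my own illustration: it assumes a 0-based permutation, marks a visited entry `p` as `-p - 1` so the marking can be undone, and produces `result[k] = objects[perm[k]]`. Python integers do not have a spare sign bit in the C sense, so treat it as a model of the idea rather than of the memory layout.

```python
def rearrange_in_place(objects, perm):
    """Rearrange `objects` so that new objects[k] == old objects[perm[k]].

    Uses no extra array: visited entries of `perm` are temporarily made
    negative and restored at the end.
    """
    n = len(perm)
    for i in range(n):
        if perm[i] < 0:                # already handled as part of an earlier cycle
            continue
        first, j = objects[i], i
        while True:
            p = perm[j]
            perm[j] = -p - 1           # mark as visited (reversible)
            if p == i:                 # cycle closed
                objects[j] = first
                break
            objects[j] = objects[p]
            j = p
    for i in range(n):
        perm[i] = -perm[i] - 1         # undo the marking


letters = ['d', 'e', 'a', 'c', 'b']
order = [2, 4, 3, 0, 1]
rearrange_in_place(letters, order)
print(letters, order)
```

Every element is visited exactly once in the cycle walk, so the whole thing is O(n) time.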
If you didn't mind allocating memory for an extra hash of indexes, you could keep a mapping of original location to current location to get a time complexity of near O(n). Here's an example in Ruby, since it's readable and pseudocode-ish. (This could be shorter or more idiomatically Ruby-ish, but I've written it out for clarity.)
objects = ['d', 'e', 'a', 'c', 'b']
order = [2, 4, 3, 0, 1]
cur_locations = {}
order.each_with_index do |orig_location, ordinality|
  # Find the current location of the item.
  cur_location = orig_location
  while not cur_locations[cur_location].nil? do
    cur_location = cur_locations[cur_location]
  end
  # Swap the items and keep track of whatever we swapped forward.
  objects[ordinality], objects[cur_location] = objects[cur_location], objects[ordinality]
  cur_locations[ordinality] = orig_location
end
puts objects.join(' ')
That obviously does involve some extra memory for the hash, but since it's just for indexes and not your "fairly large" objects, hopefully that's acceptable. Since hash lookups are O(1), even though there is a slight bump to the complexity due to the case where an item has been swapped forward more than once and you have to rewrite cur_location multiple times, the algorithm as a whole should be reasonably close to O(n).
If you wanted you could build a full hash of original to current positions ahead of time, or keep a reverse hash of current to original, and modify the algorithm a bit to get it down to strictly O(n). It'd be a little more complicated and take a little more space, so this is the version I wrote out, but the modifications shouldn't be difficult.
EDIT: Actually, I'm fairly certain the time complexity is just O(n), since each ordinality can have at most one hop associated, and thus the maximum number of lookups is limited to n.
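For comparison, here is one possible reading of the "full hash of original to current positions plus a reverse hash" variant mentioned above, sketched in Python rather than Ruby. It is not the poster's code. The names `reorder_with_position_maps`, `where` and `what` are invented for this sketch, and plain lists stand in for the hashes because the indices are dense.

```python
def reorder_with_position_maps(objects, order):
    """Rearrange objects in place so that new[k] == old[order[k]].

    where[orig] is the current position of the item that started at orig;
    what[pos] is the original index of the item currently at pos.
    """
    n = len(order)
    where = list(range(n))
    what = list(range(n))
    for dest, orig in enumerate(order):
        src = where[orig]  # direct lookup, no chain of hops to follow
        if src == dest:
            continue
        objects[dest], objects[src] = objects[src], objects[dest]
        # The item that was sitting at dest has been pushed to src.
        moved = what[dest]
        where[moved] = src
        what[src] = moved
        where[orig] = dest
        what[dest] = orig


objects = ['d', 'e', 'a', 'c', 'b']
reorder_with_position_maps(objects, [2, 4, 3, 0, 1])
print(' '.join(objects))  # prints: a b c d e
```

Because every lookup is a single index operation, each of the n steps does constant work. The price is two index arrays of size n instead of one hash.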
#!/usr/bin/env python
def rearrange(objects, permutation):
    """Rearrange `objects` inplace according to `permutation`.

       ``result = [objects[p] for p in permutation]``
    """
    seen = [False] * len(permutation)
    for i, already_seen in enumerate(seen):
        if not already_seen: # start permutation cycle
            first_obj, j = objects[i], i
            while True:
                seen[j] = True
                p = permutation[j]
                if p == i: # end permutation cycle
                    objects[j] = first_obj # [old] p -> j
                    break
                objects[j], j = objects[p], p # p -> j
The algorithm (as I noticed after I wrote it) is the same as the one from @meriton's answer in Java.
Here's a test function for the code:
def test():
    import itertools
    N = 9
    for perm in itertools.permutations(range(N)):
        L = list(range(N))  # list() so this also runs on Python 3
        LL = L[:]
        rearrange(L, perm)
        assert L == [LL[i] for i in perm] == list(perm), (L, list(perm), LL)

    # test whether assertions are enabled
    try:
        assert 0
    except AssertionError:
        pass
    else:
        raise RuntimeError("assertions must be enabled for the test")

if __name__ == "__main__":
    test()
There's a histogram sort, though the running time is given as a bit higher than O(N): O(N log log N).
I can do it given O(N) scratch space -- copy to new array and copy back.
EDIT: I am aware of the existence of an algorithm that will work through the array in place. The idea is to perform the swaps on the array of integers 1..N while at the same time mirroring the swaps on your array of large objects. I just cannot find the algorithm right now.
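I don't know which algorithm the poster had in mind, but one well-known way to mirror swaps on both arrays looks roughly like the Python sketch below. It assumes 0-based indices and that the index array may be overwritten (it ends up as the identity). The name `apply_by_mirrored_swaps` is made up for illustration.

```python
def apply_by_mirrored_swaps(objects, perm):
    """Rearrange objects in place so that new[k] == old[perm[k]].

    perm is consumed: each entry is set to its own index once that slot
    holds its final object, so perm is the identity on return.
    """
    for i in range(len(perm)):
        cur = i
        # Walk the cycle starting at i, pulling each wanted object into place
        # and pushing the displaced object one step further along the cycle.
        while perm[cur] != i:
            nxt = perm[cur]
            objects[cur], objects[nxt] = objects[nxt], objects[cur]
            perm[cur] = cur  # slot cur is now final
            cur = nxt
        perm[cur] = cur  # last slot of the cycle already holds old objects[i]


letters = ['a', 'b', 'c']
indices = [1, 2, 0]
apply_by_mirrored_swaps(letters, indices)
print(' '.join(letters))  # prints: b c a
```

Every swap finalizes one slot, so there are at most n swaps in total, and no storage beyond a couple of index variables is needed. If the index array has to survive, it would need to be copied first or marked and restored instead.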
The problem is one of applying a permutation in place with minimal O(1) extra storage: "in-situ permutation".
It is solvable, but an algorithm is not obvious beforehand.
It is described briefly as an exercise in Knuth, and for work I had to decipher it and figure out how it worked. Look at 5.2 #13.
For some more modern work on this problem, with pseudocode:
I ended up writing a different algorithm for this, which first generates a list of swaps to apply an order and then runs through the swaps to apply it. The advantage is that if you're applying the ordering to multiple lists, you can reuse the swap list, since the swap algorithm is extremely simple.
#include <string>
#include <utility>
#include <vector>
using namespace std;

void make_swaps(vector<int> order, vector<pair<int,int>> &swaps)
{
    // order[0] is the index in the old list of the new list's first value.
    // Invert the mapping: inverse[0] is the index in the new list of the
    // old list's first value.
    vector<int> inverse(order.size());
    for(int i = 0; i < order.size(); ++i)
        inverse[order[i]] = i;

    for(int idx1 = 0; idx1 < order.size(); ++idx1)
    {
        // Swap list[idx1] with list[order[idx1]], and record this swap.
        int idx2 = order[idx1];
        if(idx1 == idx2)
            continue;

        swaps.push_back(make_pair(idx1, idx2));

        // list[idx1] is now in the correct place, but whoever wanted the value we moved out
        // of idx2 now needs to look in its new position.
        int idx1_dep = inverse[idx1];
        order[idx1_dep] = idx2;
        inverse[idx2] = idx1_dep;
    }
}

// data is taken by reference so that the caller's container is reordered.
template<typename T>
void run_swaps(T &data, const vector<pair<int,int>> &swaps)
{
    for(const auto &s: swaps)
    {
        int src = s.first;
        int dst = s.second;
        swap(data[src], data[dst]);
    }
}

void test()
{
    vector<int> order = { 2, 3, 1, 4, 0 };

    vector<pair<int,int>> swaps;
    make_swaps(order, swaps);

    vector<string> data = { "a", "b", "c", "d", "e" };
    run_swaps(data, swaps);
}