I am trying to solve the Maximum Product Subarray problem from leetcode.
The problem description is: Given an integer array, find the contiguous subarray within the array containing at least one number which has the largest product.
Example: Input: [2,3,-2,4], Output: 6
To solve this I am using the following logic: let f(p, n) be the correct result from index n onward, where p is the product of the current subarray ending just before index n. So the recurrence is:
f(p, n) = p                                        // if n == a.length
f(p, n) = max( p, f(p*a[n], n+1), f(a[n], n+1) )   // otherwise
This works for regular recursion (code below).
private int f(int[] a, int p, int n) {
    if (n == a.length)
        return p;
    else
        return max(p, f(a, p * a[n], n + 1), f(a, a[n], n + 1));
}
However I am having trouble converting it to top-down dynamic programming. The approach I have been using to convert a recursive program into one that uses top-down DP is:
Initialize a cache (I will be using an array)
If cache at index 'n' has been filled return the value as result
Otherwise recurse and store the result in cache
Return value from cache.
This is a general approach that I have been using, and it has worked for most of the DP problems I have done; however, it does not work for this problem.
The (incorrect) code using this approach is shown below:
private int f(int[] a, int p, int n, int[] dp) {
    if (dp[n] != 0)
        return dp[n];
    if (n == a.length)
        dp[n] = p;
    else
        dp[n] = max(p, f(a, p * a[n], n + 1, dp), f(a, a[n], n + 1, dp));
    return dp[n];
}
I call the functions from the main function as follows:
// int x = f(a, a[0], 1, dp); - for incorrect top-down dp attempt
// int x = f(a, a[0], 1); - for regular recursion
An example where it does not work is: [3,-1,4]. Here it incorrectly outputs 3 instead of 4.
From what I understand, the problem is because both subproblems refer to the same n+1 index of the DP array so only 1 subproblem is solved which results in the incorrect answer.
So my question is:
How can I convert this recurrence to a top-down DP program? Is there a general approach that I can follow for cases like this?
Your dp state depends on both the current index n and the current running product p, so you need to memoize the result in a 2D array indexed by (n, p) instead of a 1D array indexed only by n.
private int f(int[] a, int p, int n, int[][] dp) {
    if (dp[n][p] != 0)
        return dp[n][p];
    if (n == a.length)
        dp[n][p] = p;
    else
        dp[n][p] = max(p, f(a, p * a[n], n + 1, dp), f(a, a[n], n + 1, dp));
    return dp[n][p];
}
You can do it the way you are trying, but I will suggest an easier way to solve the problem. It is O(n) time and doesn't even require storing an array, so O(1) space.
Let us keep two variables, min and max, which store the minimum and the maximum product ending at the current position. We keep the min as well because of negative numbers: two negative numbers can multiply to a large positive number.
The rest is easy: initialise min = 1, max = 1 and ans = 0 (as the question says at least one number must be included, you can change this initialisation accordingly, e.g. to the first element).
Read the input one element at a time, say 'a':
loop over length of the array
{
    if (a > 0)
    {
        max = a * max;
        min = (1 < min * a) ? 1 : min * a;
    }
    else if (a < 0)
    {
        temp = max;                          // remember the old max before overwriting it
        max = (1 > min * a) ? 1 : min * a;
        min = temp * a;
    }
    else
    {
        max = 1;
        min = 1;
    }
    ans = (ans > max) ? ans : max;           // this is outside the if/else
}
At the end of the loop, ans will hold the answer. Happy coding :)
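For reference, here is a minimal Java sketch of the same running max/min idea (the method name is mine, not from the answer above; this variant takes the max/min of the three candidates directly instead of resetting to 1, so it also handles arrays whose best product is a single negative number):

// Sketch: track the largest and smallest products ending at each index.
static int maxProduct(int[] a) {
    int max = a[0], min = a[0], ans = a[0];
    for (int i = 1; i < a.length; i++) {
        int x = a[i];
        int bestHere  = Math.max(x, Math.max(max * x, min * x));
        int worstHere = Math.min(x, Math.min(max * x, min * x));
        max = bestHere;
        min = worstHere;
        ans = Math.max(ans, max);
    }
    return ans;
}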
I have written this code to check if a number is prime (for numbers up to 10^9+7).
Is this a good method?
What will be the time complexity for this?
What I have done is that I have made an unordered_set which stores the prime numbers up to sqrt(n).
When checking if a number is prime or not, I first check if it is less than the maximum number in the table.
If it is less, it is looked up in the table, so the complexity should be O(1) in this case.
If it is more, the number is put through a divisibility test against the numbers in the set of primes.
#include<iostream>
#include<set>
#include<math.h>
#include<unordered_set>
#define sqrt10e9 31623
using namespace std;

unordered_set<long long> primeSet = { 2, 3 }; // used for fast lookups

void genrate_prime_set(long range) // this generates prime numbers up to sqrt(10^9+7)
{
    bool flag;
    set<long long> tempPrimeSet = { 2, 3 }; // a temporary set is used for generation
    set<long long>::iterator j;
    for (int i = 3; i <= range; i = i + 2)
    {
        //cout << i << " ";
        flag = true;
        for (j = tempPrimeSet.begin(); *j * *j <= i; ++j)
        {
            if (i % (*j) == 0)
            {
                flag = false;
                break;
            }
        }
        if (flag)
        {
            primeSet.insert(i);
            tempPrimeSet.insert(i);
        }
    }
}

bool is_prime(long long i, unordered_set<long long> primeSet)
{
    bool flag = true;
    if (i <= sqrt10e9) // if the number exists in the lookup table
        return primeSet.count(i);

    // if it doesn't, iterate through the table
    for (unordered_set<long long>::iterator j = primeSet.begin(); j != primeSet.end(); ++j)
    {
        if (*j * *j <= i && i % (*j) == 0)
        {
            flag = false;
            break;
        }
    }
    return flag;
}

int main()
{
    //long long testCases, a, b, kiwiCount;
    bool primeFlag = true;
    //unordered_set<int> primeNum;
    genrate_prime_set(sqrt10e9);
    cout << primeSet.size() << "\n";
    cout << is_prime(9999991, primeSet);
    return 0;
}
This doesn't strike me as a particularly efficient way to do the job at hand.
Although it probably won't make a big difference in the end, the efficient way to generate all the primes up to some specific limit is clearly to use a sieve--the sieve of Eratosthenes is simple and fast. There are a couple of modifications that can be faster, but for the small size you're dealing with, they're probably not worthwhile.
These normally produce their output in a more effective format than you're currently using as well. In particular, you typically just dedicate one bit to each possible prime (i.e., each odd number) and end up with it zeroed if the number is composite, and one if it's prime (you can, of course, reverse the sense if you prefer).
Since you only need one bit for each odd number from 3 to 31623, this requires only about 16 K bits, or about 2K bytes--a truly minuscule amount of memory by modern standards (especially: little enough to fit in L1 cache quite easily).
Since the bits are stored in order, it's also trivial to test only the candidate factors up to the square root of the number you're testing, instead of testing against all the numbers in the table (including those greater than the square root of the number you're testing, which is obviously a waste of time). This also optimizes access to the memory in case some of it's not in the cache (i.e., you can access all the data in order, making life as easy as possible for the hardware prefetcher).
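As a rough illustration of that layout, here is a hedged Java sketch (my own code, not tied to the question's C++), using a BitSet with one bit per odd number up to the limit:

import java.util.BitSet;

// Sketch of an odd-only sieve: bit i represents the odd number 2*i + 1.
// For limit = 31623 this uses roughly 2 KB.
static BitSet oddComposites(int limit) {
    BitSet composite = new BitSet(limit / 2 + 1);
    for (int p = 3; (long) p * p <= limit; p += 2) {
        if (!composite.get(p / 2)) {                    // p has not been crossed off, so it is prime
            for (int q = p * p; q <= limit; q += 2 * p) {
                composite.set(q / 2);                   // cross off the odd multiples of p
            }
        }
    }
    return composite;
}
// An odd n in [3, limit] is prime iff !oddComposites(limit).get(n / 2); 2 is handled separately.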
If you wanted to optimize further, I'd consider just using the sieve to find all primes up to 10^9+7, and look up inputs. Whether this is a win will depend (heavily) upon the number of queries you can expect to receive. A quick check shows that a simple implementation of the Sieve of Eratosthenes can find all primes up to 10^9 in about 17 seconds. After that, each query is (of course) essentially instantaneous (i.e., the cost of a single memory read). This does require around 120 megabytes of memory for the result of the sieve, which would once have been a major consideration, but (except on fairly limited systems) normally wouldn't be any more.
The very short answer: do research on the subject, starting with the term "Miller-Rabin"
The short answer is no:
Looking for factors of a number is a poor way to check for primality
Exhaustively searching through primes is a poor way to look for factors
Especially if you search through every prime, rather than just the ones less than or equal to the square root of the number
Doing a primality test on each number is a poor way to generate a list of primes
Also, you should take in primeSet by reference rather than copy, if it really needs to be a parameter.
Note: testing small primes to see if they divide a number is a useful first step of a primality test, but should generally only be used for the smallest primes before switching to a better method
No, it's not a very good way to determine if a number is prime. Here is pseudocode for a simple primality test that is sufficient for numbers in your range; I'll leave it to you to translate to C++:
function isPrime(n)
    d := 2
    while d * d <= n
        if n % d == 0
            return False
        d := d + 1
    return True
This works by trying every potential divisor up to the square root of the input number n; if no divisor has been found, then the input number cannot be composite, i.e. of the form n = p × q, because at least one of the two factors p or q must be less than or equal to the square root of n while the other is greater than or equal to it.
There are better ways to determine primality; for instance, after initially checking if the number is even (and hence prime only if n = 2), it is only necessary to test odd potential divisors, halving the amount of work necessary. If you have a list of primes up to the square root of n, you can use that list as trial divisors and make the process even faster. And there are other techniques for larger n.
But that should be enough to get you started. When you are ready for more, come back here and ask more questions.
I can only suggest a way to use a library function in Java to check the primality of a number. As for the other questions, I do not have any answers.
The java.math.BigInteger.isProbablePrime(int certainty) method returns true if this BigInteger is probably prime, and false if it's definitely composite. If certainty is ≤ 0, true is returned. You should try to use it in your code, so consider rewriting your program in Java.
Parameters
certainty - a measure of the uncertainty that the caller is willing to tolerate: if the call returns true the probability that this BigInteger is prime exceeds (1 - 1/2^certainty). The execution time of this method is proportional to the value of this parameter.
Return Value
This method returns true if this BigInteger is probably prime, false if it's definitely composite.
Example
The following example shows the usage of math.BigInteger.isProbablePrime() method
import java.math.*;

public class BigIntegerDemo {
    public static void main(String[] args) {
        // create 3 BigInteger objects
        BigInteger bi1, bi2, bi3;
        // create 3 Boolean objects
        Boolean b1, b2, b3;
        // assign values to bi1, bi2
        bi1 = new BigInteger("7");
        bi2 = new BigInteger("9");
        // perform isProbablePrime on bi1, bi2
        b1 = bi1.isProbablePrime(1);
        b2 = bi2.isProbablePrime(1);
        b3 = bi2.isProbablePrime(-1);
        String str1 = bi1 + " is prime with certainty 1 is " + b1;
        String str2 = bi2 + " is prime with certainty 1 is " + b2;
        String str3 = bi2 + " is prime with certainty -1 is " + b3;
        // print b1, b2, b3 values
        System.out.println(str1);
        System.out.println(str2);
        System.out.println(str3);
    }
}
Output
7 is prime with certainty 1 is true
9 is prime with certainty 1 is false
9 is prime with certainty -1 is true
Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this
result = 0.0;
for (i = 0; i < 100000000; i++) {
result += array[i];
}
you'd run into precision problems: as result grows much larger than 1.0, the low-order bits of each small addend are lost.
So what are some of the ways to more accurately perform the summation?
Sounds like you want to use Kahan Summation.
According to Wikipedia,
The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).
In pseudocode, the algorithm is:
function kahanSum(input)
    var sum = input[1]
    var c = 0.0              // A running compensation for lost low-order bits.
    for i = 2 to input.length
        y = input[i] - c     // So far, so good: c is zero.
        t = sum + y          // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y    // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t              // Algebraically, c should always be zero. Beware eagerly optimising compilers!
    next i                   // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
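For concreteness, a direct Java rendering of that pseudocode (a sketch; in practice you might also consider simply accumulating into a double, as another answer suggests):

// Kahan (compensated) summation: carries the rounding error forward in c.
static float kahanSum(float[] input) {
    float sum = input[0];
    float c = 0.0f;                  // running compensation for lost low-order bits
    for (int i = 1; i < input.length; i++) {
        float y = input[i] - c;      // apply the compensation to the next addend
        float t = sum + y;           // low-order digits of y are lost in this add
        c = (t - sum) - y;           // recover -(lost low part of y)
        sum = t;
    }
    return sum;
}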
Make result a double, assuming C or C++.
If you can tolerate a little extra space (in Java):
float[] temp = new float[1000000];
float[] temp2 = new float[1000];
float sum = 0.0f;
for (int i = 0; i < 1000000000; i++) temp[i / 1000] += array[i];
for (int i = 0; i < 1000000; i++) temp2[i / 1000] += temp[i];
for (int i = 0; i < 1000; i++) sum += temp2[i];
Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.
But before doing any of that, one might just accumulate the result in a double. That'll help a lot.
If you're in .NET, you can use the LINQ .Sum() extension method that exists on IEnumerable. Then it would just be:
var result = array.Sum();
The absolutely optimal way is to use a priority queue, in the following way:
PriorityQueue<Float> q = new PriorityQueue<Float>();
for (float x : list) q.add(x);
while (q.size() > 1) q.add(q.poll() + q.poll());
return q.poll();
(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)
Explanation: given a list of numbers, to add them up as precisely as possible you should strive to make the numbers close, i.e. eliminate the difference between small and big ones. That's why you want to add up the two smallest numbers, thus increasing the minimal value of the list, decreasing the difference between the minimum and maximum in the list, and reducing the problem size by 1.
Unfortunately I have no idea about how this can be vectorized, considering that you're using OpenCL. But I am almost sure that it can be. You might take a look at the book on vector algorithms, it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing
Say I have y distinct values and I want to select x of them at random. What's an efficient algorithm for doing this? I could just call rand() x times, but the performance would be poor if x, y were large.
Note that combinations are needed here: each value should have the same probability to be selected but their order in the result is not important. Sure, any algorithm generating permutations would qualify, but I wonder if it's possible to do this more efficiently without the random order requirement.
How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N covers this case for permutations.
Robert Floyd invented a sampling algorithm for just such situations. It's generally superior to shuffling then grabbing the first x elements since it doesn't require O(y) storage. As originally written it assumes values from 1..N, but it's trivial to produce 0..N and/or use non-contiguous values by simply treating the values it produces as subscripts into a vector/array/whatever.
In pseudocode, the algorithm runs like this (borrowing from Jon Bentley's Programming Pearls column "A Sample of Brilliance").
initialize set S to empty
for J := N-M + 1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S
That last bit (inserting J if T is already in S) is the tricky part. The bottom line is that it assures the correct mathematical probability of inserting J so that it produces unbiased results.
It's O(x)¹ and O(1) with regard to y, with O(x) storage.
Note that, in accordance with the combinations tag in the question, the algorithm only guarantees equal probability of each element occurring in the result, not of their relative order in it.
¹ O(x²) in the worst case for the hash map involved, which can be neglected since it's a virtually nonexistent pathological case where all the values have the same hash.
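For concreteness, a hedged Java sketch of the same algorithm, shifted to zero-based values (the helper name and the use of a HashSet are my own choices):

import java.util.*;

// Floyd's sampling: returns m distinct values chosen uniformly from 0..n-1, in O(m) expected time.
static Set<Integer> sample(int n, int m, Random rnd) {
    Set<Integer> s = new HashSet<>();
    for (int j = n - m; j < n; j++) {
        int t = rnd.nextInt(j + 1);       // uniform in 0..j
        s.add(s.contains(t) ? j : t);     // if t was already taken, take j instead
    }
    return s;
}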
Assuming that you want the order to be random too (or don't mind it being random), I would just use a truncated Fisher-Yates shuffle. Start the shuffle algorithm, but stop once you have selected the first x values, instead of "randomly selecting" all y of them.
Fisher-Yates works as follows:
select an element at random, and swap it with the element at the end of the array.
Recurse (or more likely iterate) on the remainder of the array, excluding the last element.
Steps after the first do not modify the last element of the array. Steps after the first two don't affect the last two elements. Steps after the first x don't affect the last x elements. So at that point you can stop - the top of the array contains uniformly randomly selected data. The bottom of the array contains somewhat randomized elements, but the permutation you get of them is not uniformly distributed.
Of course this means you've trashed the input array. If that means you'd need to take a copy of it before starting, and x is small compared with y, then copying the whole array is not very efficient. Do note though that if all you're going to use it for in future is further selections, then the fact that it's in somewhat-random order doesn't matter; you can just use it again. If you're doing the selection multiple times, therefore, you may be able to do only one copy at the start, and amortise the cost.
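A minimal Java sketch of the truncated shuffle described above (the method name is mine; it assumes x <= a.length and modifies the input array in place, as discussed):

import java.util.Random;

// Partial Fisher-Yates: after x steps the last x slots of `a` hold a uniform random selection.
static int[] takeRandom(int[] a, int x, Random rnd) {
    int[] picked = new int[x];
    for (int i = 0; i < x; i++) {
        int end = a.length - 1 - i;                    // boundary of the not-yet-selected prefix
        int j = rnd.nextInt(end + 1);                  // pick uniformly from a[0..end]
        int tmp = a[j]; a[j] = a[end]; a[end] = tmp;   // move the pick to the end
        picked[i] = a[end];
    }
    return picked;
}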
If you really only need to generate combinations - where the order of elements does not matter - you may use combinadics as they are implemented e.g. here by James McCaffrey.
Contrast this with k-permutations, where the order of elements does matter.
In the first case (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1) are considered the same - in the latter, they are considered distinct, though they contain the same elements.
In case you need combinations, you may really only need to generate one random number (albeit it can be a bit large) - that can be used directly to find the m-th combination.
Since this random number represents the index of a particular combination, it follows that your random number should be between 0 and C(n,k).
Calculating combinadics might take some time as well.
It might just not be worth the trouble - besides, Jerry's and Federico's answers are certainly simpler than implementing combinadics.
However if you really only need a combination and you are bugged about generating the exact number of random bits that are needed and none more... ;-)
While it is not clear whether you want combinations or k-permutations, here is a C# code for the latter (yes, we could generate only a complement if x > y/2, but then we would have been left with a combination that must be shuffled to get a real k-permutation):
static class TakeHelper
{
    public static IEnumerable<T> TakeRandom<T>(
        this IEnumerable<T> source, Random rng, int count)
    {
        T[] items = source.ToArray();
        count = count < items.Length ? count : items.Length;
        for (int i = items.Length - 1; count-- > 0; i--)
        {
            int p = rng.Next(i + 1);
            yield return items[p];
            items[p] = items[i];
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        Random rnd = new Random(Environment.TickCount);
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7 };
        foreach (int number in numbers.TakeRandom(rnd, 3))
        {
            Console.WriteLine(number);
        }
    }
}
Another, more elaborate implementation that generates k-permutations, that I had lying around and I believe is in a way an improvement over existing algorithms if you only need to iterate over the results. While it also needs to generate x random numbers, it only uses O(min(y/2, x)) memory in the process:
/// <summary>
/// Generates unique random numbers
/// <remarks>
/// Worst case memory usage is O(min((emax-imin)/2, num))
/// </remarks>
/// </summary>
/// <param name="random">Random source</param>
/// <param name="imin">Inclusive lower bound</param>
/// <param name="emax">Exclusive upper bound</param>
/// <param name="num">Number of integers to generate</param>
/// <returns>Sequence of unique random numbers</returns>
public static IEnumerable<int> UniqueRandoms(
    Random random, int imin, int emax, int num)
{
    int dictsize = num;
    long half = (emax - (long)imin + 1) / 2;
    if (half < dictsize)
        dictsize = (int)half;
    Dictionary<int, int> trans = new Dictionary<int, int>(dictsize);
    for (int i = 0; i < num; i++)
    {
        int current = imin + i;
        int r = random.Next(current, emax);
        int right;
        if (!trans.TryGetValue(r, out right))
        {
            right = r;
        }
        int left;
        if (trans.TryGetValue(current, out left))
        {
            trans.Remove(current);
        }
        else
        {
            left = current;
        }
        if (r > current)
        {
            trans[r] = left;
        }
        yield return right;
    }
}
The general idea is to do a Fisher-Yates shuffle and memorize the transpositions in the permutation.
It was not published anywhere nor has it received any peer review whatsoever. I believe it is a curiosity rather than having some practical value. Nonetheless I am very open to criticism and would generally like to know if you find anything wrong with it - please consider this (and leave a comment) before downvoting.
A little suggestion: if x >> y/2, it's probably better to select at random y - x elements, then choose the complementary set.
The trick is to use a variation of shuffle or in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0);
    backup = array_fill(0, n, 0);

    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on the fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N );   // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }

    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }

    return picked;
}
NOTE: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffle), and is non-destructive on the input array (since the partial shuffle is undone), although that restore step is optional.
adapted from here
update
Another approach uses only a single call to the PRNG (pseudo-random number generator) in [0,1]: Ivan Stojmenovic, "On Random and Adaptive Parallel Generation of Combinatorial Objects" (section 3), with O(N) (worst-case) complexity.
Here is a simple way to do it which is only inefficient if Y is much larger than X.
void randomly_select_subset(
    int X, int Y,
    const int * inputs, int * outputs
) {
    int i, r;
    /* copy the first X values, then give each later value a chance to replace one of them */
    for( i = 0; i < X; ++i ) outputs[i] = inputs[i];
    for( i = X; i < Y; ++i ) {
        r = rand_inclusive( 0, i );      /* uniform in 0..i */
        if( r < X ) outputs[r] = inputs[i];
    }
}
Basically, copy the first X of your distinct values to your output array, and then for each remaining value, randomly decide whether or not to include that value.
The random number is further used to choose an element of our (mutable) output array to replace.
If, for example, you have 2^64 distinct values, you can use a symmetric key algorithm (with a 64 bits block) to quickly reshuffle all combinations. (for example Blowfish).
for(i=0; i<x; i++)
e[i] = encrypt(key, i)
This is not random in the pure sense but can be useful for your purpose.
If you want to work with an arbitrary number of distinct values using cryptographic techniques, you can, but it's more complex.
If I have a size N array of objects, and I have an array of unique numbers in the range 1...N, is there any algorithm to rearrange the object array in-place in the order specified by the list of numbers, and yet do this in O(N) time?
Context: I am doing a quick-sort-ish algorithm on objects that are fairly large in size, so it would be faster to do the swaps on indices than on the objects themselves, and only move the objects in one final pass. I'd just like to know if I could do this last pass without allocating memory for a separate array.
Edit: I am not asking how to do a sort in O(N) time, but rather how to do the post-sort rearranging in O(N) time with O(1) space. Sorry for not making this clear.
I think this should do:
static <T> void arrange(T[] data, int[] p) {
    boolean[] done = new boolean[p.length];
    for (int i = 0; i < p.length; i++) {
        if (!done[i]) {
            T t = data[i];
            for (int j = i;;) {
                done[j] = true;
                if (p[j] != i) {
                    data[j] = data[p[j]];
                    j = p[j];
                } else {
                    data[j] = t;
                    break;
                }
            }
        }
    }
}
Note: This is Java. If you do this in a language without garbage collection, be sure to delete done.
If you care about space, you can use a BitSet for done. I assume you can afford an additional bit per element because you seem willing to work with a permutation array, which is several times that size.
This algorithm copies instances of T n + k times, where k is the number of cycles in the permutation. You can reduce this to the optimal number of copies by skipping those i where p[i] = i.
The approach is to follow the "permutation cycles" of the permutation, rather than indexing the array left-to-right. But since you do have to begin somewhere, every time a new permutation cycle is needed, the search for unpermuted elements is left-to-right:
// Pseudo-code
N : integer, N > 0                // N is the number of elements
swaps : integer [0..N]
data[N] : array of object
permute[N] : array of integer [-1..N] denoting permutation (used element is -1)
next_scan_start : integer;

swaps = 0;
next_scan_start = 0;
while (swaps < N)
{
    // Search for the next index that is not-yet-permuted.
    for (idx_cycle_search = next_scan_start;
         idx_cycle_search < N;
         ++idx_cycle_search)
        if (permute[idx_cycle_search] >= 0)
            break;

    next_scan_start = idx_cycle_search + 1;

    // This is a provable invariant. In short, number of non-negative
    // elements in permute[] equals (N - swaps)
    assert( idx_cycle_search < N );

    // Completely permute one permutation cycle, 'following the
    // permutation cycle's trail'. This is O(N)
    while (permute[idx_cycle_search] >= 0)
    {
        swap( data[idx_cycle_search], data[permute[idx_cycle_search]] )
        swaps++;
        old_idx = idx_cycle_search;
        idx_cycle_search = permute[idx_cycle_search];
        permute[old_idx] = -1;
        // Also '= -idx_cycle_search - 1' could be used rather than '-1'
        // and would allow reversal of these changes to the permute[] array
    }
}
Do you mean that you have an array of objects O[1..N] and then you have an array P[1..N] that contains a permutation of numbers 1..N and in the end you want to get an array O1 of objects such that O1[k] = O[P[k]] for all k=1..N ?
As an example, if your objects are letters A,B,C...,Y,Z and your array P is [26,25,24,..,2,1] is your desired output Z,Y,...C,B,A ?
If yes, I believe you can do it in linear time using only O(1) additional memory. Reversing elements of an array is a special case of this scenario. In general, I think you would need to consider decomposition of your permutation P into cycles and then use it to move around the elements of your original array O[].
If that's what you are looking for, I can elaborate more.
EDIT: Others already presented excellent solutions while I was sleeping, so no need to repeat it here. ^_^
EDIT: My O(1) additional space is indeed not entirely correct. I was thinking only about "data" elements, but in fact you also need to store one bit per permutation element, so if we are precise, we need O(log n) extra bits for that. But most of the time using a sign bit (as suggested by J.F. Sebastian) is fine, so in practice we may not need anything more than we already have.
If you didn't mind allocating memory for an extra hash of indexes, you could keep a mapping of original location to current location to get a time complexity of near O(n). Here's an example in Ruby, since it's readable and pseudocode-ish. (This could be shorter or more idiomatically Ruby-ish, but I've written it out for clarity.)
#!/usr/bin/ruby

objects = ['d', 'e', 'a', 'c', 'b']
order = [2, 4, 3, 0, 1]

cur_locations = {}

order.each_with_index do |orig_location, ordinality|
  # Find the current location of the item.
  cur_location = orig_location
  while not cur_locations[cur_location].nil? do
    cur_location = cur_locations[cur_location]
  end

  # Swap the items and keep track of whatever we swapped forward.
  objects[ordinality], objects[cur_location] = objects[cur_location], objects[ordinality]
  cur_locations[ordinality] = orig_location
end

puts objects.join(' ')
That obviously does involve some extra memory for the hash, but since it's just for indexes and not your "fairly large" objects, hopefully that's acceptable. Since hash lookups are O(1), even though there is a slight bump to the complexity due to the case where an item has been swapped forward more than once and you have to rewrite cur_location multiple times, the algorithm as a whole should be reasonably close to O(n).
If you wanted you could build a full hash of original to current positions ahead of time, or keep a reverse hash of current to original, and modify the algorithm a bit to get it down to strictly O(n). It'd be a little more complicated and take a little more space, so this is the version I wrote out, but the modifications shouldn't be difficult.
EDIT: Actually, I'm fairly certain the time complexity is just O(n), since each ordinality can have at most one hop associated, and thus the maximum number of lookups is limited to n.
#!/usr/bin/env python

def rearrange(objects, permutation):
    """Rearrange `objects` inplace according to `permutation`.

       ``result = [objects[p] for p in permutation]``
    """
    seen = [False] * len(permutation)
    for i, already_seen in enumerate(seen):
        if not already_seen:  # start permutation cycle
            first_obj, j = objects[i], i
            while True:
                seen[j] = True
                p = permutation[j]
                if p == i:  # end permutation cycle
                    objects[j] = first_obj  # [old] p -> j
                    break
                objects[j], j = objects[p], p  # p -> j
The algorithm (as I've noticed after I wrote it) is the same as the one from #meriton's answer in Java.
Here's a test function for the code:
def test():
    import itertools
    N = 9
    for perm in itertools.permutations(range(N)):
        L = range(N)
        LL = L[:]
        rearrange(L, perm)
        assert L == [LL[i] for i in perm] == list(perm), (L, list(perm), LL)

    # test whether assertions are enabled
    try:
        assert 0
    except AssertionError:
        pass
    else:
        raise RuntimeError("assertions must be enabled for the test")

if __name__ == "__main__":
    test()
There's a histogram sort, though the running time is given as a bit higher than O(N): O(N log log N).
I can do it given O(N) scratch space -- copy to new array and copy back.
EDIT: I am aware of the existence of an algorithm that will do this. The idea is to perform the swaps on the array of integers 1..N while at the same time mirroring the swaps on your array of large objects. I just cannot find the algorithm right now.
The problem is one of applying a permutation in place with minimal O(1) extra storage: "in-situ permutation".
It is solvable, but an algorithm is not obvious beforehand.
It is described briefly as an exercise in Knuth, and for work I had to decipher it and figure out how it worked. Look at 5.2 #13.
For some more modern work on this problem, with pseudocode:
http://www.fernuni-hagen.de/imperia/md/content/fakultaetfuermathematikundinformatik/forschung/berichte/bericht_273.pdf
I ended up writing a different algorithm for this, which first generates a list of swaps to apply an order and then runs through the swaps to apply it. The advantage is that if you're applying the ordering to multiple lists, you can reuse the swap list, since the swap algorithm is extremely simple.
void make_swaps(vector<int> order, vector<pair<int,int>> &swaps)
{
    // order[0] is the index in the old list of the new list's first value.
    // Invert the mapping: inverse[0] is the index in the new list of the
    // old list's first value.
    vector<int> inverse(order.size());
    for(int i = 0; i < order.size(); ++i)
        inverse[order[i]] = i;

    swaps.resize(0);
    for(int idx1 = 0; idx1 < order.size(); ++idx1)
    {
        // Swap list[idx] with list[order[idx]], and record this swap.
        int idx2 = order[idx1];
        if(idx1 == idx2)
            continue;

        swaps.push_back(make_pair(idx1, idx2));

        // list[idx1] is now in the correct place, but whoever wanted the value we moved out
        // of idx2 now needs to look in its new position.
        int idx1_dep = inverse[idx1];
        order[idx1_dep] = idx2;
        inverse[idx2] = idx1_dep;
    }
}

template<typename T>
void run_swaps(T data, const vector<pair<int,int>> &swaps)
{
    for(const auto &s: swaps)
    {
        int src = s.first;
        int dst = s.second;
        swap(data[src], data[dst]);
    }
}

void test()
{
    vector<int> order = { 2, 3, 1, 4, 0 };
    vector<pair<int,int>> swaps;
    make_swaps(order, swaps);

    vector<string> data = { "a", "b", "c", "d", "e" };
    run_swaps(data, swaps);
}