Efficient nearest neighbour search in Scala - algorithm

Consider this coordinate class with the Euclidean distance:
case class coord(x: Double, y: Double) {
  def dist(c: coord) = Math.sqrt( Math.pow(x-c.x, 2) + Math.pow(y-c.y, 2) )
}
and a grid of coordinates, for instance
val grid = (1 to 25).map {_ => coord(Math.random*5, Math.random*5) }
Then for any given coordinate
val x = coord(Math.random*5, Math.random*5)
the grid points ordered by their distance to x are
val nearest = grid.sortWith( (p,q) => p.dist(x) < q.dist(x) )
so the three closest points are nearest.take(3).
Is there a way to make these calculations more time-efficient, especially for a grid with one million points?

I'm not sure if this is helpful (or even stupid), but I thought of this:
You use a sort function to sort ALL elements of the grid and then pick the first k. If you consider a sorting algorithm like recursive merge sort, it works roughly like this:
Split collection in half
Recurse on both halves
Merge both sorted halves
Maybe you could optimize such a function for your needs. The merge step normally merges all elements from both halves, but you are only interested in the first k that come out of the merge. So you could merge only until you have k elements and ignore the rest.
So in the worst case, where k >= n (n is the size of the grid), you would still only have the complexity of merge sort, O(n log n). To be honest, I'm not able to determine the complexity of this solution relative to k (too tired for that at the moment).
Here is an example implementation of that solution (it's definitely not optimal and not generalized):
def minK(seq: IndexedSeq[coord], x: coord, k: Int) = {
  val dist = (c: coord) => c.dist(x)
  def sort(seq: IndexedSeq[coord]): IndexedSeq[coord] = seq.size match {
    case 0 | 1 => seq
    case size => {
      val (left, right) = seq.splitAt(size / 2)
      merge(sort(left), sort(right))
    }
  }
  // merges two sorted halves, but stops as soon as k elements have been produced
  def merge(left: IndexedSeq[coord], right: IndexedSeq[coord]): IndexedSeq[coord] = {
    val leftF = left.lift
    val rightF = right.lift
    val builder = IndexedSeq.newBuilder[coord]
    def loop(leftIndex: Int = 0, rightIndex: Int = 0): Unit = {
      if (leftIndex + rightIndex < k) {
        (leftF(leftIndex), rightF(rightIndex)) match {
          case (Some(leftCoord), Some(rightCoord)) => {
            if (dist(leftCoord) < dist(rightCoord)) {
              builder += leftCoord
              loop(leftIndex + 1, rightIndex)
            } else {
              builder += rightCoord
              loop(leftIndex, rightIndex + 1)
            }
          }
          case (Some(leftCoord), None) => {
            builder += leftCoord
            loop(leftIndex + 1, rightIndex)
          }
          case (None, Some(rightCoord)) => {
            builder += rightCoord
            loop(leftIndex, rightIndex + 1)
          }
          case _ =>
        }
      }
    }
    loop()
    builder.result()
  }
  sort(seq)
}
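On the complexity question: because no merged list ever grows beyond k elements, the levels whose sublists are shorter than k cost O(n) each and there are about log k of them, while the higher levels cost O(k) per merge and shrink geometrically. The total is therefore roughly O(n log k) comparisons instead of O(n log n). Here is a Python sketch of the same truncated merge sort (not a translation of the Scala above; points are assumed to be (x, y) tuples, and k >= 1 is assumed to be the useful case):

```python
import math


def dist(p, q):
    """Euclidean distance between two (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def min_k(points, x, k):
    """The k points closest to x, closest first, via a merge sort whose
    merge step stops as soon as it has produced k elements."""
    if k <= 0:
        return []

    def merge(left, right):
        out, i, j = [], 0, 0
        while len(out) < k and (i < len(left) or j < len(right)):
            # take from the left if the right side is used up or its head is farther away
            if j == len(right) or (i < len(left) and dist(left[i], x) <= dist(right[j], x)):
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out

    def sort(lo, hi):
        # sorts points[lo:hi] by distance to x, keeping at most k elements
        if hi - lo <= 1:
            return points[lo:hi]
        mid = (lo + hi) // 2
        return merge(sort(lo, mid), sort(mid, hi))

    return sort(0, len(points))
```

Working with index ranges instead of slicing at every level avoids copying the whole input log n times.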

Profile your code to see what is costly.
Your way of sorting is already highly inefficient.
Do not recompute distances all the time. That isn't free: most likely your program spends 99% of its time computing distances (use a profiler to find out!).
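To see why recomputing distances hurts, here is a small Python experiment (the names are hypothetical) that counts distance evaluations. A comparator-based sort, like sortWith in the question, evaluates two distances per comparison, whereas a key-based sort evaluates each distance exactly once:

```python
import functools
import math


class CountingDist:
    """Distance to a fixed query point that counts how often it is evaluated."""

    def __init__(self, query):
        self.query = query
        self.calls = 0

    def __call__(self, p):
        self.calls += 1
        return math.hypot(p[0] - self.query[0], p[1] - self.query[1])


def nearest_by_comparator(points, query, k):
    """Comparator sort, like sortWith((p, q) => p.dist(x) < q.dist(x))."""
    d = CountingDist(query)

    def cmp(p, q):
        dp, dq = d(p), d(q)           # two distance evaluations per comparison
        return (dp > dq) - (dp < dq)

    return sorted(points, key=functools.cmp_to_key(cmp))[:k], d.calls


def nearest_by_key(points, query, k):
    """Key sort: sorted() evaluates the key exactly once per element."""
    d = CountingDist(query)
    return sorted(points, key=d)[:k], d.calls
```

For 1,000 points the key version performs exactly 1,000 evaluations, while the comparator version needs at least 2 × 999. Because the square root is monotonic, ranking by squared distance gives the same order and avoids the square root altogether.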
Finally, you can use index structures. For Euclidean distance you probably have the largest choice of indexes to accelerate finding the nearest neighbors. There is the k-d tree, but I have often found the R-tree to be faster. If you want to play around with these, I recommend ELKI. It's a Java library for data mining (so it should be easy to use from Scala, too), and it has a huge choice of index structures.
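To make the index-structure suggestion concrete, here is a minimal 2-d tree (k-d tree) sketch in Python for k-nearest-neighbour queries. It is only an illustration, not ELKI's API: it builds by median splits and skips a subtree whenever the splitting line is farther away than the current k-th best candidate.

```python
import heapq


class KDNode:
    """One node of a 2-d tree: a point plus the axis (0 = x, 1 = y) it splits on."""

    def __init__(self, point, axis, left, right):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points, depth=0):
    """Build a 2-d tree by splitting at the median, alternating between x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))


def k_nearest(root, query, k):
    """The k points nearest to query, closest first."""
    if k <= 0:
        return []
    best = []  # max-heap via negated squared distances: (-d2, point)

    def visit(node):
        if node is None:
            return
        p = node.point
        d2 = (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
        if len(best) < k:
            heapq.heappush(best, (-d2, p))
        elif d2 < -best[0][0]:
            heapq.heapreplace(best, (-d2, p))
        diff = query[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # the far side can only contain a better point if the splitting line
        # is closer than the current k-th best distance
        if len(best) < k or diff * diff < -best[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(best, reverse=True)]
```

Building this way sorts at every level, so construction is O(n log² n); a library implementation will be much faster for a million points, but the query logic is the same idea.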

This one was quite fun to do.
case class Coord(x: Double, y: Double) {
  def dist(c: Coord) = Math.sqrt(Math.pow(x - c.x, 2) + Math.pow(y - c.y, 2))
}
class CoordOrdering(x: Coord) extends Ordering[Coord] {
  def compare(a: Coord, b: Coord) = a.dist(x) compare b.dist(x)
}
def top[T](xs: Seq[T], n: Int)(implicit ord: Ordering[T]): Seq[T] = {
  // xs is an ordered sequence of n elements. insert returns xs with e inserted
  // if it is less than some element currently in the sequence (and in that case,
  // the last element is dropped), otherwise it returns an unmodified sequence
  def insert(xs: Seq[T], e: T): Seq[T] = {
    val (l, r) = xs.span(x => ord.lt(x, e))
    (l ++ (e +: r)).take(n)
  }
  // start from the first n elements, sorted, and fold the rest in one at a time
  xs.drop(n).foldLeft(xs.take(n).sorted)(insert)
}
Minimally tested. Call it like this:
val grid = (1 to 250000).map { _ => Coord(Math.random * 5, Math.random * 5) }
val x = Coord(Math.random * 5, Math.random * 5)
top(grid, 3)(new CoordOrdering(x))
EDIT: It's quite easy to extend this to (pre-)compute the distances just once:
val zippedGrid = grid map {_.dist(x)} zip grid
object ZippedCoordOrdering extends Ordering[(Double, Coord)] {
  def compare(a: (Double, Coord), b: (Double, Coord)) = a._1 compare b._1
}
top(zippedGrid, 3)(ZippedCoordOrdering).map(_._2)
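The insert above walks a sorted Seq of n elements, so each of the N grid points can cost O(n). A common alternative (swapped in here, not what this answer does) is a bounded max-heap, which brings that down to O(log n) per point. A Python sketch, with the key computed once per item in the spirit of the EDIT:

```python
import heapq
import itertools


def top(items, n, key):
    """The n items with the smallest key, smallest first.

    A max-heap of at most n entries (keys negated, since heapq is a min-heap)
    holds the best candidates seen so far, so each item costs O(log n)."""
    if n <= 0:
        return []
    heap = []
    tie = itertools.count()             # tie-breaker so items are never compared directly
    for item in items:
        k = key(item)                   # evaluated once per item
        entry = (-k, next(tie), item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif -k > heap[0][0]:           # smaller than the largest key kept so far
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in sorted(heap, key=lambda e: -e[0])]
```

Python's standard library offers the same result as heapq.nsmallest(n, items, key=...).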

Here is an algorithm that makes use of an R-tree data structure. It is not useful for the small data set described, but it scales well to a large number of objects.
Use an ordered list whose nodes represent either objects or R-tree bounding boxes. The order is closest first using whatever distance function you want. Maintain the order on insert.
Initialize the list by inserting the bounding boxes in the root node of the R-tree.
To get the next closest object:
(1) Remove the first element from the list.
(2) If it is an object, it is the closest one.
(3) If it is the bounding box of a non-leaf node of the R-tree, insert all the bounding boxes representing children of that node into the list in their proper places according to their distance.
(4) If it is the bounding box of an R-tree leaf node, insert the objects that are children of that node (the objects, not their bounding boxes) according to their distance.
(5) Go back to step (1).
The list will remain pretty short. At the front will be nearby objects that we are interested in, and later nodes in the list will be boxes representing collections of objects that are farther away.
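The steps above can be sketched in Python as follows. Two substitutions, both assumptions: a binary heap (heapq) plays the role of the ordered list, and instead of a real R-tree the sketch builds a hypothetical, much simpler bounding-box hierarchy by recursively halving the point set along its wider side. Distances to boxes are measured to the closest point of the box, which never overestimates the distance to anything inside, so a point that reaches the front of the queue really is the next closest.

```python
import heapq
import itertools


def bounding_box(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def build(points, leaf_size=8):
    """Hypothetical, much simplified stand-in for an R-tree (points must be non-empty).
    A node is ("leaf", box, points) or ("inner", box, children)."""
    box = bounding_box(points)
    if len(points) <= leaf_size:
        return ("leaf", box, points)
    axis = 0 if box[2] - box[0] >= box[3] - box[1] else 1   # split the wider side
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return ("inner", box, [build(points[:mid], leaf_size), build(points[mid:], leaf_size)])


def box_dist2(box, q):
    """Squared distance from q to the nearest point of box (0 if q is inside)."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return dx * dx + dy * dy


def nearest_in_order(root, q):
    """Yield all points in order of increasing distance from q."""
    tie = itertools.count()   # tie-breaker so heap entries never compare nodes
    queue = [(box_dist2(root[1], q), next(tie), "node", root)]
    while queue:
        _, _, kind, item = heapq.heappop(queue)           # (1) take the front element
        if kind == "point":
            yield item                                    # (2) an object: the next closest
        elif item[0] == "inner":
            for child in item[2]:                         # (3) enqueue child boxes
                heapq.heappush(queue, (box_dist2(child[1], q), next(tie), "node", child))
        else:
            for p in item[2]:                             # (4) leaf: enqueue its objects
                d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                heapq.heappush(queue, (d2, next(tie), "point", p))
```

Because it is a generator, itertools.islice(nearest_in_order(tree, q), 3) stops after the three nearest points and leaves the far-away boxes unopened.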

It depends on whether you need exact results or whether an approximation is good enough.
Several benchmarks, such as http://www.slideshare.net/erikbern/approximate-nearest-neighbor-methods-and-vector-models-nyc-ml-meetup, show that approximation is a good solution in terms of efficiency.
I wrote ann4s, which is a Scala implementation of Annoy.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.
Take a look at this repo.


Standard Algorithm for subdividing a grid into smaller and smaller parts

I'm running a simulation over a grid of parameters and I'd like to run it for as long as possible, but I don't know yet when the simulation will be terminated (think power cut). So what I'd like to do is specify the min and max values for each parameter and then let the loop pick the next best point on the grid, regularly saving the current result.
So, given a parameter a in 1-d space ranging from 0 to 1, I'd like the loop to simulate for the values 0, 1, 0.5, 0.75, 0.25, 0.875, 0.625, 0.375, 0.125, ... The exact order does not matter too much, as long as the next point always lies between the previous ones.
I could probably come up with some code that generates this sequence, but I'm wondering whether there are standard formulations for such an algorithm, especially for higher-dimensional spaces.
One way to achieve this in one dimension is to maintain a binary tree, where each node keeps track of an interval, and its midpoint.
The left child of a node contains the left half of its interval, and the right child contains the right half.
Performing a breadth-first search in such a tree and keeping track of the midpoints of all the traversed nodes will yield the sequence you are after.
For several dimensions, depending on your needs, you can e.g. keep track of one such tree for each dimension, and generate your parameters in the order you like.
In practice this can be implemented using lazy initialisation and a queue to perform the BFS.
To demonstrate (in practice you would do it in a more memory-efficient way), I've added a simple binary-tree BFS implementation in JavaScript (since it can be tried in the browser):
class Node {
  constructor(min, max) {
    this.min = min;
    this.max = max;
    this.mid = (min + max) / 2;
  }
  // children are created lazily, only when they are requested
  get left() { return new Node(this.min, this.mid); }
  get right() { return new Node(this.mid, this.max); }
}
function getSequence(start, end, n) {
  const res = [start, end];
  const queue = [new Node(start, end)];
  for (let i = 0; i < n; ++i) {
    const node = queue.shift();   // BFS: take the oldest interval first
    res.push(node.mid);
    queue.push(node.right, node.left);
  }
  return res;
}
getSequence(0, 1, 100);
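The answer suggests one tree per dimension. A different, hypothetical way to extend the same coarse-to-fine idea to several parameters is to refine all of them together, one dyadic level at a time. A minimal Python sketch, assuming each parameter is first mapped to [0, 1]:

```python
from itertools import count, islice, product  # islice: handy for taking the first few points


def refine(dim):
    """Yield points of the unit cube [0, 1]**dim, coarsest grid first.

    Level L covers the grid with spacing 1 / 2**L and yields only the points
    that are not on any coarser grid, so every new point lies between points
    that were already produced."""
    for level in count(0):
        n = 2 ** level
        for ks in product(range(n + 1), repeat=dim):
            # level 0 yields all corners; afterwards a point is new exactly when
            # at least one coordinate is an odd multiple of 1 / 2**level
            if level == 0 or any(k % 2 for k in ks):
                yield tuple(k / n for k in ks)


def scaled(dim, lows, highs):
    """Map refine(dim) onto per-parameter ranges [lows[i], highs[i]]."""
    for p in refine(dim):
        yield tuple(lo + t * (hi - lo) for t, lo, hi in zip(p, lows, highs))
```

For example, list(islice(refine(1), 5)) gives [(0.0,), (1.0,), (0.5,), (0.25,), (0.75,)], the same points as the question's sequence in a slightly different order. Level L tests (2**L + 1)**dim grid points, but nothing has to be stored between levels.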

Finding the binary sub-tree that puts each element of a list into its own lowest order bucket

First, I have a list of numbers 'L', containing elements 'x' such that 0 < 'x' <= 'M' for all elements 'x'.
Second, I have a binary tree constructed in the following manner:
1) Each node has three properties: 'min', 'max', and 'vals' (vals is a list of numbers).
2) The root node has 'max'='M', 'min'=0, and 'vals'='L' (it contains all the numbers)
3) Each left child node has:
max=(parent(max) + parent(min))/2
4) Each right child node has:
min=(parent(max) + parent(min))/2
5) For each node, 'vals' is a list of numbers such that each element 'x' of
'vals' is also an element of 'L' and satisfies
min < x <= max
6) If a node has only one element in 'vals', then it has no children. I.e., we are
only looking for nodes for which 'vals' is non-empty.
I'm looking for an algorithm to find the smallest sub-tree that satisfies the above properties. In other words, I'm trying to get a list of nodes such that each child-less node contains one - and only one - element in 'vals'.
I'm almost able to brute-force it with Perl using insanely baroque data structures, but I keep bumping up against the limits of my mental capacity to keep track of all the temporary variables I've used, so I'm asking for help.
It's even cooler if you know an efficient algorithm to do the above.
If you'd like to know what I'm trying to do, it's this: find the smallest covering for a discrete wavelet packet transform to uniquely separate each frequency of the standard even-tempered musical notes. The trouble is that each iteration of the wavelet transform divides the frequency range it handles in half (hence the .../2 above defining the max and min), and the musical notes have frequencies which go up exponentially, so there's no obvious relationship between the two - not one I'm able to derive analytically (or experimentally, obviously, for that matter), anyway.
Since I'm really trying to find an algorithm so I can write a program, and since the problem is put in general terms, I didn't think it appropriate to put it in DSP. If there were a general "algorithms" group, then I think it would be better there, but this seems to be the right group for algorithms in the absence of such.
Please let me know if I can clarify anything, or if you have any suggestions - even in the absence of a complete answer - any help is appreciated!
After taking a break and two cups of coffee, I answered my own question. Indexing below is done starting at 1, MATLAB-style...
L=[]      // list of numbers to put in bins
sorted=[] // list of "good" nodes
nodes=[]  // array of nodes to construct
nodes[1]={ min = 0, max = 22100, val = -1, lvl = 1, row = 1 }
row=1     // counter for "good" nodes
for(j=1;j<=12;j++) { // 12 is a reasonable guess
  level=j+1          // nodes 2^j .. 2^(j+1)-1 sit one level below their parents
  for(i=2^j;i<2^(j+1);i++) {
    if(i/2 == int(i/2)) { // even nodes are high-pass filters
      nodes[i]={ min = (nodes[i/2].min + nodes[i/2].max)/2, // nodes[i/2] is parent
                 max = nodes[i/2].max,
                 val = -1,
                 lvl = level,
                 row = -1
               }
    } else { // odd nodes are lo-pass
      nodes[i]={ min = nodes[(i-1)/2].min,
                 max = (nodes[(i-1)/2].min+nodes[(i-1)/2].max)/2,
                 val = -1,
                 lvl = level,
                 row = -1
               }
    }
    temp=[] // array to count matching numbers
    for (k=1;k<=size(L);k++) {
      if (nodes[i].min < L[k] && L[k] <= nodes[i].max) {
        temp += L[k]   // remember the match and where it was found
        Lidx = k
      }
    }
    if (size(temp) == 1) { // exactly one number falls in this node: a "good" node
      nodes[i].row = row++
      nodes[i].val = temp[1]
      sorted += nodes[i]
      delete L[Lidx]   // it is now isolated, so deeper nodes no longer need to see it
    }
  }
}
Now array sorted[] contains exactly what I was looking for!
Hopefully this helps somebody else someday...
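For comparison, the same tree can be described recursively: split a node only while it holds two or more values, and drop empty nodes. Below is a minimal Python sketch of that formulation (the function name leaf_nodes is hypothetical, and values are assumed to be distinct, as note frequencies are). Each value is copied once per level it descends, so the work is roughly |L| times the depth, and the depth is about log2(M / smallest gap between two values).

```python
def leaf_nodes(values, lo, hi, max_depth=64):
    """Childless nodes (lo, hi, value) of the tree from the question.

    A node covers the half-open interval (lo, hi]; a node holding two or more
    values is split at its midpoint, a node holding exactly one value is a
    leaf, and empty nodes are dropped."""
    leaves = []

    def split(lo, hi, vals, depth):
        if not vals:
            return                                   # empty node: not part of the sub-tree
        if len(vals) == 1:
            leaves.append((lo, hi, vals[0]))         # exactly one value: childless node
            return
        if depth == max_depth:
            raise ValueError("values too close together (or repeated) to separate")
        mid = (lo + hi) / 2
        split(lo, mid, [v for v in vals if v <= mid], depth + 1)  # lo < v <= mid
        split(mid, hi, [v for v in vals if v > mid], depth + 1)   # mid < v <= hi

    split(lo, hi, [v for v in values if lo < v <= hi], 0)
    return leaves
```

For instance, leaf_nodes([1, 3, 7], 0, 8) returns [(0, 2.0, 1), (2.0, 4.0, 3), (4.0, 8, 7)].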

Fuse tuples to find equivalence classes

Suppose we have a finite domain D={d1,...,dk} containing k elements.
We consider S, a subset of D^n, i.e. a set of tuples of the form < a1,...,an >, with ai in D.
We want to represent it (compactly) using S', a subset of (2^D)^n, i.e. a set of tuples of the form < A1,...,An > with each Ai a subset of D. The implication is that for any tuple s' in S', all elements in the cross product of its Ai exist in S.
For instance, consider D={a,b,c} so k=3, n=2 and the tuples S=< a,b >+< a,c >+< b,b >+< b,c >.
We can use S'=<{a,b},{b,c}> to represent S.
This single-tuple solution is also minimal: S'=<{a},{b,c}>+<{b},{b,c}> is also a solution, but it is larger and therefore less desirable.
Some sizes, in concrete instances, that we need to handle: k ~ 1000 elements in the domain D, n <= 10, relatively small (the main source of complexity), and |S| ranging up to large values > 10^6.
A naïve approach first embeds S into the domain of S', (2^D)^n, and then repeatedly applies the following pairwise test: two tuples s1, s2 in S' can be fused to form a single tuple in S' iff they differ in only one component.
< a,b >+< a,c > -> <{a},{b,c}> (differ on second component)
< b,b >+< b,c > -> <{b},{b,c}> (differ on second component)
<{a},{b,c}> + <{b},{b,c}> -> <{a,b},{b,c}> (differ on first component)
Now, there could be several minimal S'; we are interested in finding any one of them, and approximate minimisation of some kind is also OK, provided it doesn't give wrong results (i.e. it is fine if S' is not as small as it could be, as long as we get results very fast).
The naive algorithm has to deal with the fact that any newly introduced "fused" tuple could match some other tuple, so it scales really badly on large input sets, even with n remaining low. You need |S'|^2 comparisons to ensure convergence, and any time I fuse two elements, I'm currently retesting every pair (how can I improve that?).
A lot of the efficiency depends on iteration order, so sorting the set in some way(s) could be an option, or perhaps indexing using hashes, but I'm not sure how to do it.
Imperative pseudo code would be ideal, or pointers to a reformulation of the problem to something I can run a solver on would really help.
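One way to use hashing here (a heuristic sketch, not a known-optimal algorithm): two tuples that differ only in component i have identical projections onto the other n - 1 components, so grouping tuples by that projection in a hash table finds every fusable pair for position i in one linear pass, with no pairwise comparisons. The helper names fuse and expand are hypothetical:

```python
from itertools import product


def fuse(tuples):
    """Greedy compaction of a set of n-tuples into tuples of frozensets.

    For each position i, tuples that agree on every other position are fused
    by taking the union of their i-th sets. Passes repeat until nothing
    changes. The result covers exactly the input, but it is not guaranteed
    to be minimal, and it can depend on the order in which positions are tried."""
    current = {tuple(frozenset([a]) for a in t) for t in tuples}
    if not current:
        return current
    n = len(next(iter(current)))
    changed = True
    while changed:
        changed = False
        for i in range(n):
            groups = {}
            for t in current:
                key = t[:i] + t[i + 1:]          # everything except position i
                groups.setdefault(key, []).append(t[i])
            fused = {key[:i] + (frozenset().union(*parts),) + key[i:]
                     for key, parts in groups.items()}
            if len(fused) < len(current):        # some tuples were merged
                changed = True
            current = fused
    return current


def expand(rects):
    """The set of plain tuples covered by a set of tuples of sets (for checking)."""
    return {t for r in rects for t in product(*r)}
```

Each pass costs O(|S'|) hash operations on keys of n - 1 sets, and the number of passes is bounded because every productive pass shrinks S'. On the example above, fuse returns the single tuple <{a,b},{b,c}>.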
Here's some pseudocode (untested C#) that demonstrates your S'=<{a},{b,c}>+<{b},{b,c}> representation. Apart from the space requirements, which are negligible when each element is stored as an integer index, adding and testing tuples should be extremely fast. If you want a practical solution, you already have one; you just have to use the right ADTs.
ElementType[] domain = new ElementType[K];   // a simple array of the K domain elements
FillDomain(domain);                          // insert all domain elements
SortArray(domain);                           // sort the domain elements: K log K time
SortedDictionary<int, HashSet<int>> subsets; // ints are indices into domain
subsets = new SortedDictionary<int, HashSet<int>>();
void AddTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    int a = BinarySearch(domain, first);     // log K time (binary search)
    int b = BinarySearch(domain, second);    // log K time (binary search)
    if(tuples.ContainsKey(a)) {              // log N time (lookup in the sorted keys)
        if(!tuples[a].Contains(b)) {         // constant time (hash lookup)
            tuples[a].Add(b);                // constant time (hash add)
        }
    } else {                                 // constant time (instance + hash add)
        tuples[a] = new HashSet<int>();
        tuples[a].Add(b);
    }
}
bool ContainsTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    int a = BinarySearch(domain, first);     // log K time (binary search)
    int b = BinarySearch(domain, second);    // log K time (binary search)
    if(tuples.ContainsKey(a)) {              // log N time (lookup in the sorted keys)
        if(tuples[a].Contains(b)) {          // constant time (hash test)
            return true;
        }
    }
    return false;
}
The space savings from optimizing your tuple subset S' won't outweigh the slowdown of the optimization process itself. For size optimization, if you know your K will be less than 65536, you could use 16-bit unsigned integers (ushort) instead of ints in the SortedDictionary and HashSet. But even 50 million integers only take up 4 bytes per 32-bit integer × 50 million ≈ 200 MB.
Here's another approach: by encoding/mapping your tuples to strings, you can take advantage of binary string comparison and of the fact that UTF-16/UTF-8 encoding is very size-efficient. Again, this still doesn't do the merging optimization you want, but speed and efficiency would be pretty good.
Here's some quick pseudo code in JavaScript.
Array.prototype.binarySearch = function(elm) {
  var l = 0, h = this.length - 1, i;
  while(l <= h) {
    i = (l + h) >> 1;
    if(this[i] < elm) l = ++i;
    else if(this[i] > elm) h = --i;
    else return i;
  }
  return -(++l);  // not found: encodes the insertion point l as -(l + 1)
};
// map your ordered domain elements to characters
// For example JavaScript's UTF-16 should be fine
// UTF-8 would work as well
var domain = {
  "a": String.fromCharCode(1),
  "b": String.fromCharCode(2),
  "c": String.fromCharCode(3),
  "d": String.fromCharCode(4)
};
var tupleStrings = [];
// map your tuple to the string encoding
function map(tuple) {
  var str = "";
  for(var i=0; i<tuple.length; i++) {
    str += domain[tuple[i]];
  }
  return str;
}
function add(tuple) {
  var str = map(tuple);
  // binary search
  var index = tupleStrings.binarySearch(str);
  if(index >= 0) return;  // already present
  index = ~index;         // ~(-(l + 1)) == l, the insertion point
  // insert depends on tupleStrings' type implementation
  tupleStrings.splice(index, 0, str);
}
function contains(tuple) {
  var str = map(tuple);
  // binary search
  return tupleStrings.binarySearch(str) >= 0;
}
add(["a", "b"]); add(["a", "c"]); add(["b", "b"]); add(["b", "c"]);
alert(JSON.stringify(tupleStrings, null, "\n"));

Find unique common element from 3 arrays

Original Problem:
I have 3 boxes, each containing 200 coins. Only one person has made calls from all three boxes, so there is exactly one coin in each box with the same fingerprints, and all the other coins have different fingerprints. You have to find the coin that carries the same fingerprint in all 3 boxes, so that we can find the fingerprint of the person who made calls from all of them.
Converted problem:
You have 3 arrays containing 200 integers each. Given that there is one and only one common element in these 3 arrays, find the common element.
Please consider solutions other than the trivial one with O(1) space and O(n^3) time.
An improvement on Pelkonen's answer:
From the converted problem in the OP:
"Given that there is one and only one common element in these 3 arrays."
We need to sort only 2 arrays and then look up each element of the third array in both of them.
If you sort all the arrays first, in O(n log n), then it is easy to find the common element in less than O(n^3) time. You can, for example, use binary search after sorting them.
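A Python sketch of the "sort only two arrays" idea (the function names are hypothetical; bisect_left from the standard library does the binary search):

```python
from bisect import bisect_left


def contains(sorted_seq, x):
    """Binary search: True if x occurs in sorted_seq."""
    i = bisect_left(sorted_seq, x)
    return i < len(sorted_seq) and sorted_seq[i] == x


def common_element(a, b, c):
    """The element present in all three arrays, or None.

    Only b and c are sorted (O(n log n)); each element of a then costs
    two binary searches, O(log n) each."""
    sb, sc = sorted(b), sorted(c)
    for x in a:
        if contains(sb, x) and contains(sc, x):
            return x
    return None
```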
Let N = 200, k = 3,
Create a hash table H with capacity ≥ Nk.
For each element X in array 1, set H[X] to 1.
For each element Y in array 2, if Y is in H and H[Y] == 1, set H[Y] = 2.
For each element Z in array 3, if Z is in H and H[Z] == 2, return Z.
throw new InvalidDataGivenByInterviewerException();
O(Nk) time, O(Nk) space complexity.
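The steps above translate almost line for line into Python; in this sketch a ValueError stands in for the interviewer exception:

```python
def find_common(arr1, arr2, arr3):
    """Staged hash-table lookup: mark every element of array 1 with stage 1,
    promote to stage 2 those also found in array 2, and return the first
    element of array 3 that reached stage 2."""
    stage = {x: 1 for x in arr1}
    for y in arr2:
        if stage.get(y) == 1:
            stage[y] = 2
    for z in arr3:
        if stage.get(z) == 2:
            return z
    raise ValueError("no element is common to all three arrays")
```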
Use a hash table keyed by the integers and encode the entries so that you know which array each one comes from, then check for the slot that has entries from all 3 arrays. O(n)
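A Python sketch of the encoding idea, assuming a bitmask per value (bit i set when the value occurs in array i). Unlike plain occurrence counts, a value repeated inside one array cannot fake membership in the others:

```python
def common_to_all(*arrays):
    """The values that occur in every array.

    Each hash-table entry is a bitmask recording which arrays the value was
    seen in (bit i for array i), so duplicates inside one array are harmless."""
    seen = {}
    for i, arr in enumerate(arrays):
        for x in arr:
            seen[x] = seen.get(x, 0) | (1 << i)
    full = (1 << len(arrays)) - 1
    return [x for x, mask in seen.items() if mask == full]
```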
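Use a hashtable mapping objects to frequency counts. Iterate through all three lists, incrementing occurrence counts in the hashtable, until you encounter one with an occurrence count of 3. This is O(n), since no sorting is required. It assumes that no single list contains the same value twice; otherwise a value repeated within one list could reach a count of 3. Example in Python: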
def find_duplicates(*lists):
    num_lists = len(lists)
    counts = {}
    for l in lists:
        for i in l:
            counts[i] = counts.get(i, 0) + 1
            if counts[i] == num_lists:
                return i
Or, using sets (this version is also unaffected by repeats within one list):
def find_duplicates(*lists):
    intersection = set(lists[0])
    for l in lists[1:]:
        intersection &= set(l)
    return intersection.pop()
O(N) solution: use a hash table. H[i] = list of all integers in the three arrays that map to i.
For every H[i] with more than one value, check whether three of its values are the same. If so, you have your solution. You can do this check with the naive method and it should still be very fast, or you can sort those H[i], and then it becomes trivial.
If your numbers are relatively small, you can use H[i] = k if i appears k times in the three arrays, then the solution is the i for which H[i] = 3. If your numbers are huge, use a hash table though.
You can extend this to work even if some elements are common to only two arrays, and also if elements can repeat within one of the arrays. It just becomes a bit more complicated, but you should be able to figure it out on your own.
If you want the fastest* answer:
Sort one array--time is N log N.
For each element in the second array, search the first. If you find it, add 1 to a companion array; otherwise add 0--time is N log N, using N space.
For each non-zero count, copy the corresponding entry into the temporary array, compacting it so it's still sorted--time is N.
For each element in the third array, search the temporary array; when you find a hit, stop. Time is less than N log N.
Here's code in Scala that illustrates this:
import java.util.Arrays
val a = Array(1,5,2,3,14,1,7)
val b = Array(3,9,14,4,2,2,4)
val c = Array(1,9,11,6,8,3,1)
Arrays.sort(a)                          // step 1: sort one array
val count = new Array[Int](a.length)
for (i <- 0 until b.length) {           // step 2: look up each element of b in a
  val j = Arrays.binarySearch(a, b(i))
  if (j >= 0) count(j) += 1
}
var n = 0                               // step 3: compact the hits into count(0 until n)
for (i <- 0 until count.length) if (count(i) > 0) { count(n) = a(i); n += 1 }
for (i <- 0 until c.length) {           // step 4: look up each element of c among the hits
  if (Arrays.binarySearch(count, 0, n, c(i)) >= 0) println(c(i))
}
With slightly more complexity, you can either use no extra space at the cost of being even more destructive of your original arrays, or you can avoid touching your original arrays at all at the cost of another N space.
Edit: * As the comments have pointed out, hash tables are faster for non-perverse inputs. This is the "fastest worst case". The worst case may not be so unlikely unless you use a really good hashing algorithm, which may well eat up more time than your sort. For example, if you multiply all your values by 2^16, trivial hashing (i.e. using the bitmasked integer directly as an index) will collide every time on lists shorter than 64k...
// Beginner's code using binary search; it's pretty easy.
// A, B and C must each be sorted in ascending order.
#include <vector>
using namespace std;
bool BS(int arr[], int low, int high, int target)
{
    if (low > high)
        return false;
    int mid = low + (high - low) / 2;
    if (target == arr[mid])
        return true;
    else if (target < arr[mid])
        return BS(arr, low, mid - 1, target);
    else
        return BS(arr, mid + 1, high, target);
}
vector<int> commonElements(int A[], int B[], int C[], int n1, int n2, int n3)
{
    vector<int> ans;
    for (int i = 0; i < n2; i++)
    {
        // skip duplicates in B (B is sorted, so they are adjacent)
        if (i > 0 && B[i - 1] == B[i])
            continue;
        // look for the element of B in both A and C
        if (BS(A, 0, n1 - 1, B[i]) && BS(C, 0, n3 - 1, B[i]))
            ans.push_back(B[i]);
    }
    return ans;
}

Is it possible to rearrange an array in place in O(N)?

If I have a size N array of objects, and I have an array of unique numbers in the range 1...N, is there any algorithm to rearrange the object array in-place in the order specified by the list of numbers, and yet do this in O(N) time?
Context: I am doing a quick-sort-ish algorithm on objects that are fairly large in size, so it would be faster to do the swaps on indices than on the objects themselves, and only move the objects in one final pass. I'd just like to know if I could do this last pass without allocating memory for a separate array.
Edit: I am not asking how to do a sort in O(N) time, but rather how to do the post-sort rearranging in O(N) time with O(1) space. Sorry for not making this clear.
I think this should do:
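static <T> void arrange(T[] data, int[] p) {
    boolean[] done = new boolean[p.length];
    for (int i = 0; i < p.length; i++) {
        if (!done[i]) {
            // follow the cycle that starts at i: data[j] receives data[p[j]]
            T t = data[i];
            for (int j = i;;) {
                done[j] = true;
                if (p[j] != i) {
                    data[j] = data[p[j]];
                    j = p[j];
                } else {
                    data[j] = t;   // close the cycle with the saved first element
                    break;
                }
            }
        }
    }
}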
Note: This is Java. If you do this in a language without garbage collection, be sure to delete done.
If you care about space, you can use a BitSet for done. I assume you can afford an additional bit per element because you seem willing to work with a permutation array, which is several times that size.
This algorithm copies instances of T n + k times, where k is the number of cycles in the permutation. You can reduce this to the optimal number of copies by skipping those i where p[i] = i.
The approach is to follow the "permutation cycles" of the permutation, rather than indexing the array left to right. But since you do have to begin somewhere, every time a new permutation cycle is needed, the search for unpermuted elements proceeds left to right:
// Pseudo-code
// Semantics assumed here: afterwards data[i] holds the object that was at data[permute[i]]
N : integer, N > 0                  // N is the number of elements
placed : integer [0..N]             // positions already in their final place
data[N] : array of object
permute[N] : array of integer [-1..N-1] denoting permutation (used element is -1)
next_scan_start : integer
placed = 0
next_scan_start = 0
while (placed < N)
    // Search for the next index that is not yet permuted.
    for (idx_cycle_search = next_scan_start;
         idx_cycle_search < N;
         ++ idx_cycle_search)
        if (permute[idx_cycle_search] >= 0)
            break
    next_scan_start = idx_cycle_search + 1
    // This is a provable invariant. In short, the number of non-negative
    // elements in permute[] equals (N - placed)
    assert( idx_cycle_search < N )
    // Completely permute one permutation cycle, 'following the
    // permutation cycle's trail'. This is O(cycle length); a cycle of
    // length L needs only L - 1 swaps, because the last step closes it.
    while (permute[idx_cycle_search] >= 0)
        next = permute[idx_cycle_search]
        permute[idx_cycle_search] = -1
        // Also '= -next - 1' could be used rather than '-1'
        // and would allow reversal of these changes to permute[] array
        placed ++
        if (permute[next] >= 0)         // next is not the cycle's starting point
            swap( data[idx_cycle_search], data[next] )
        idx_cycle_search = next
Do you mean that you have an array of objects O[1..N] and then you have an array P[1..N] that contains a permutation of numbers 1..N and in the end you want to get an array O1 of objects such that O1[k] = O[P[k]] for all k=1..N ?
As an example, if your objects are letters A,B,C,...,Y,Z and your array P is [26,25,24,...,2,1], is your desired output Z,Y,...,C,B,A?
If yes, I believe you can do it in linear time using only O(1) additional memory. Reversing elements of an array is a special case of this scenario. In general, I think you would need to consider decomposition of your permutation P into cycles and then use it to move around the elements of your original array O[].
If that's what you are looking for, I can elaborate more.
EDIT: Others already presented excellent solutions while I was sleeping, so no need to repeat it here. ^_^
EDIT: My O(1) additional space claim is indeed not entirely correct. I was thinking only about the "data" elements, but in fact you also need to store one bit per permutation element, so, to be precise, we need O(n) extra bits for that. But most of the time using a sign bit (as suggested by J.F. Sebastian) is fine, so in practice we may not need anything more than we already have.
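A minimal Python sketch of the sign-bit idea, assuming the permutation array may be modified temporarily. It marks visited entries with the bitwise complement ~p rather than -p, because index 0 has no negative counterpart; in Java or C the same trick uses the otherwise idle sign bit of each int index. Python ints have no fixed width, so here it only illustrates the bookkeeping:

```python
def rearrange_in_place(data, perm):
    """Set data[k] = old data[perm[k]] for all k, using perm's sign as the
    'visited' mark instead of a separate boolean array; perm is restored."""
    n = len(perm)
    for i in range(n):
        if perm[i] < 0:              # already placed as part of an earlier cycle
            continue
        first = data[i]
        j = i
        while True:
            p = perm[j]
            perm[j] = ~p             # mark j as visited (~p < 0 for p >= 0)
            if p == i:               # cycle closes: j receives the cycle's first value
                data[j] = first
                break
            data[j] = data[p]
            j = p
    for i in range(n):               # undo the marks so perm is unchanged
        perm[i] = ~perm[i]
```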
If you didn't mind allocating memory for an extra hash of indexes, you could keep a mapping of original location to current location to get a time complexity of near O(n). Here's an example in Ruby, since it's readable and pseudocode-ish. (This could be shorter or more idiomatically Ruby-ish, but I've written it out for clarity.)
objects = ['d', 'e', 'a', 'c', 'b']
order = [2, 4, 3, 0, 1]
cur_locations = {}
order.each_with_index do |orig_location, ordinality|
  # Find the current location of the item.
  cur_location = orig_location
  while not cur_locations[cur_location].nil? do
    cur_location = cur_locations[cur_location]
  end
  # Swap the items and keep track of whatever we swapped forward.
  objects[ordinality], objects[cur_location] = objects[cur_location], objects[ordinality]
  cur_locations[ordinality] = orig_location
end
puts objects.join(' ')
That obviously does involve some extra memory for the hash, but since it's just for indexes and not your "fairly large" objects, hopefully that's acceptable. Since hash lookups are O(1), even though there is a slight bump to the complexity due to the case where an item has been swapped forward more than once and you have to rewrite cur_location multiple times, the algorithm as a whole should be reasonably close to O(n).
If you wanted you could build a full hash of original to current positions ahead of time, or keep a reverse hash of current to original, and modify the algorithm a bit to get it down to strictly O(n). It'd be a little more complicated and take a little more space, so this is the version I wrote out, but the modifications shouldn't be difficult.
EDIT: Actually, I'm fairly certain the time complexity is just O(n), since each ordinality can have at most one hop associated, and thus the maximum number of lookups is limited to n.
#!/usr/bin/env python
def rearrange(objects, permutation):
    """Rearrange `objects` in place according to `permutation`.
    ``result = [objects[p] for p in permutation]``
    """
    seen = [False] * len(permutation)
    for i, already_seen in enumerate(seen):
        if not already_seen: # start permutation cycle
            first_obj, j = objects[i], i
            while True:
                seen[j] = True
                p = permutation[j]
                if p == i: # end permutation cycle
                    objects[j] = first_obj    # [old] p -> j
                    break
                objects[j], j = objects[p], p #       p -> j
The algorithm (as I noticed after I wrote it) is the same as the one from @meriton's answer in Java.
Here's a test function for the code:
def test():
    import itertools
    N = 9
    for perm in itertools.permutations(range(N)):
        L = list(range(N))
        LL = L[:]
        rearrange(L, perm)
        assert L == [LL[i] for i in perm] == list(perm), (L, list(perm), LL)
    # test whether assertions are enabled
    try:
        assert 0
    except AssertionError:
        pass
    else:
        raise RuntimeError("assertions must be enabled for the test")
if __name__ == "__main__":
    test()
There's a histogram sort, though its running time is given as a bit higher than O(N): O(N log log N).
I can do it given O(N) scratch space -- copy to new array and copy back.
EDIT: I am aware that such an algorithm exists. The idea is to perform the swaps on the array of integers 1..N while at the same time mirroring the swaps on your array of large objects. I just cannot find the algorithm right now.
The problem is one of applying a permutation in place with minimal O(1) extra storage: "in-situ permutation".
It is solvable, but an algorithm is not obvious beforehand.
It is described briefly as an exercise in Knuth, and for work I had to decipher it and figure out how it worked. Look at 5.2 #13.
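One well-known way to reach O(1) extra space without any marks (not necessarily the solution the exercise has in mind) is the cycle-leader test: rotate a cycle only from its smallest index. Testing whether i is the smallest index on its cycle means walking the cycle, so the worst case is O(n^2) time. A Python sketch with the same gather semantics as the other answers here:

```python
def rearrange_no_marks(data, perm):
    """Set data[k] = old data[perm[k]] using O(1) extra words and no marks.

    An index i is the 'leader' of its cycle if it is the smallest index on
    that cycle; only leaders rotate their cycle, so each cycle moves once."""
    n = len(perm)
    for i in range(n):
        # walk the cycle from i; reaching a smaller index means i is not the leader
        j = perm[i]
        while j > i:
            j = perm[j]
        if j < i:
            continue
        # i is the leader: rotate its cycle (data[j] receives data[perm[j]])
        first = data[i]
        j = i
        while perm[j] != i:
            data[j] = data[perm[j]]
            j = perm[j]
        data[j] = first
```

Unlike the sign-bit variant, perm is never written to, at the price of the extra cycle walks.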
For some more modern work on this problem, with pseudocode:
I ended up writing a different algorithm for this, which first generates a list of swaps to apply an order and then runs through the swaps to apply it. The advantage is that if you're applying the ordering to multiple lists, you can reuse the swap list, since the swap algorithm is extremely simple.
#include <string>
#include <utility>
#include <vector>
using namespace std;
void make_swaps(vector<int> order, vector<pair<int,int>> &swaps)
{
    // order[0] is the index in the old list of the new list's first value.
    // Invert the mapping: inverse[0] is the index in the new list of the
    // old list's first value.
    vector<int> inverse(order.size());
    for(int i = 0; i < (int)order.size(); ++i)
        inverse[order[i]] = i;
    for(int idx1 = 0; idx1 < (int)order.size(); ++idx1)
    {
        // Swap list[idx1] with list[order[idx1]], and record this swap.
        int idx2 = order[idx1];
        if(idx1 == idx2)
            continue;
        swaps.push_back(make_pair(idx1, idx2));
        // list[idx1] is now in the correct place, but whoever wanted the value we moved out
        // of idx1 now needs to look for it in its new position, idx2.
        int idx1_dep = inverse[idx1];
        order[idx1_dep] = idx2;
        inverse[idx2] = idx1_dep;
    }
}
template<typename T>
void run_swaps(T &data, const vector<pair<int,int>> &swaps)
{
    for(const auto &s: swaps)
    {
        int src = s.first;
        int dst = s.second;
        swap(data[src], data[dst]);
    }
}
void test()
{
    vector<int> order = { 2, 3, 1, 4, 0 };
    vector<pair<int,int>> swaps;
    make_swaps(order, swaps);
    vector<string> data = { "a", "b", "c", "d", "e" };
    run_swaps(data, swaps);
    // data is now { "c", "d", "b", "e", "a" }
}
