Can I check whether a bounded list contains duplicates, in linear time? - algorithm

Suppose I have an Int list where elements are known to be bounded and the list is known to be no longer than their range, so that it is entirely possible for it not to contain duplicates. How can I test most quickly whether it is the case?
I know of nubOrd. It is quite fast. We can pass our list through and see if it becomes shorter. But the efficiency of nubOrd is still not linear.
My idea is that we can trade space for time efficiency. Imperatively, we would allocate a bit field as wide as our range, and then traverse the list, marking the entries corresponding to the list elements' values. As soon as we try to flip a bit that is already 1, we return False. It only takes (read + compare + write) * length of the list. No binary search trees, no nothing.
Is it reasonable to attempt a similar construction in Haskell?

The discrimination package has a linear time nub you can use. Or a linear time group that doesn't require the equivalent elements to be adjacent in order to group them, so you could see if any of the groups are not size 1.
The whole package is based on sidestepping the well-known bounds on comparison-based sorts (and joins, etc.) by using algorithms based on "discrimination" rather than ones based on comparisons. As I understand it, the technique is somewhat like a radix sort, but generalised to ADTs.
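For instance, a quick sketch of the duplicate test using the package's linear-time nub (hasDuplicates is just an illustrative name):
import Data.Discrimination (nub)

-- The list has duplicates exactly when the linear-time nub makes it shorter.
hasDuplicates :: [Int] -> Bool
hasDuplicates xs = length (nub xs) /= length xs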

For integers (and other Ix-like types), you could use a mutable array, for example with the array package.
We can use an STUArray here, for example:
import Control.Monad.ST
import Data.Array.ST
updateDups_ :: [Int] -> STUArray s Int Bool -> ST s Bool
updateDups_ [] _ = return False
updateDups_ (x:xs) arr = do
    contains <- readArray arr x
    if contains then return True
    else writeArray arr x True >> updateDups_ xs arr
withDups_ :: Int -> [Int] -> ST s Bool
withDups_ mx l = newArray (0, mx) False >>= updateDups_ l
withDups :: Int -> [Int] -> Bool
withDups mx ls = runST (withDups_ mx ls)
For example:
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,5]
False
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,1]
True
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,16,2]
True
So here the first parameter is the maximum value that can be added in the list, and the second parameter the list of values we want to check.

So you have a list of size N, and you know that the elements in the list are within the range min .. min+N-1.
There is a simple linear time algorithm that requires O(1) space.
First, scan the list to find the minimum and maximum elements.
If (max - min + 1) < N then you know there's a duplicate. Otherwise ...
Because the range is N, the minimum item can go at a[0], and the max item at a[n-1]. You can map any item to its position in the array simply by subtracting min. You can do an in-place sort in O(n) because you know exactly where every item should go.
Starting at the beginning of the list, take the first element and subtract min to determine where it should go. Go to that position, and replace the item that's there. With the new item, compute where it should go, and replace the item in that position, etc.
If you ever get to a point where you're trying to place an item at a[x], and the value already there is the value that's supposed to be there (i.e. a[x] == x+min), then you've found a duplicate.
The code to do all this is pretty simple:
min, max = findMinMax()
currentIndex = 0
while currentIndex < N
    temp = a[currentIndex]
    targetIndex = temp - min
    // Do this until we wrap around to the current index
    // If the item is already in place, then targetIndex == currentIndex,
    // and we won't enter the loop.
    while targetIndex != currentIndex
        if (a[targetIndex] == temp)
            // the item at a[targetIndex] is the item that's supposed to be there.
            // The only way that can happen is if the item we have in temp is a duplicate.
            found a duplicate
        end if
        save = a[targetIndex]
        a[targetIndex] = temp
        temp = save
        targetIndex = temp - min
    end while
    // At this point, targetIndex == currentIndex.
    // We've wrapped around and need to place the last item.
    // There's no need to check here if a[targetIndex] == temp, because if it did,
    // we would not have entered the loop.
    a[targetIndex] = temp
    ++currentIndex
end while
That's the basic idea.
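For the record, a rough Haskell transcription of that pseudocode, using a mutable unboxed array (a sketch only: it assumes the minimum mn has already been found, and that all values really do lie in [mn, mn + length xs - 1]; hasDupInRange is an illustrative name):
import Control.Monad.ST
import Data.Array.ST

hasDupInRange :: Int -> [Int] -> Bool
hasDupInRange mn xs = runST (newListArray (0, n - 1) xs >>= loop 0)
  where
    n = length xs

    -- Scan left to right, resolving one placement cycle per index.
    loop :: Int -> STUArray s Int Int -> ST s Bool
    loop i arr
      | i >= n    = return False
      | otherwise = do
          temp <- readArray arr i
          dup  <- place arr i temp
          if dup then return True else loop (i + 1) arr

    -- Follow one cycle, dropping each value into slot (value - mn); finding a
    -- slot that already holds its own value means we have seen it twice.
    place :: STUArray s Int Int -> Int -> Int -> ST s Bool
    place arr cur temp
      | target == cur = writeArray arr cur temp >> return False
      | otherwise = do
          other <- readArray arr target
          if other == temp
            then return True
            else writeArray arr target temp >> place arr cur other
      where
        target = temp - mn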

Related

Implementing Memoization efficiently on nonintegral keys

I am new to Haskell and have been practicing by doing some simple programming challenges. The last 2 days, I've been trying to implement the unbounded knapsack problem here. The algorithm I'm using is described on the wikipedia page, though for this problem the word 'weight' is replaced with the word 'length'. Anyways, I started by writing the code without memoization:
maxValue :: [(Int,Int)] -> Int -> Int
maxValue [] len = 0
maxValue ((l, val) : other) len =
    if l > len then
        skipValue
    else
        max skipValue takeValue
  where
    skipValue = maxValue other len
    takeValue = val + maxValue ([(l, val)] ++ other) (len - l)
I had hoped that Haskell would be nice and have some handy syntax like #pragma memoize to help me, but looking around for examples, the solution was explained with this Fibonacci code:
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)
After grasping the concept behind this example, I was very disappointed - the method used is super hacky and only works if 1) the input to the function is a single integer, and 2) the function needs to compute the values recursively in the order f(0), f(1), f(2), ... But what if my parameters are vectors or sets? And if I want to memoize a function like f(n) = f(n/2) + f(n/3), I need to compute the value of f(i) for all i less than n, when I don't need most of those values. (Others have pointed out this claim is false)
I tried implementing what I wanted by passing a memo table that we slowly fill up as an extra parameter:
maxValue :: (Map.Map (Int, Int) Int) -> [(Int,Int)] -> Int -> (Map.Map (Int, Int) Int, Int)
maxValue m [] len = (m, 0)
maxValue m ((l, val) : other) len =
    if l > len then
        (mapWithSkip, skipValue)
    else
        (mapUnion, max skipValue (takeValue+val))
  where
    (skipMap, skipValue) = maxValue m other len
    mapWithSkip = Map.insertWith' max (1 + length other, len) skipValue skipMap
    (takeMap, takeValue) = maxValue m ([(l, val)] ++ other) (len - l)
    mapWithTake = Map.insertWith' max (1 + length other, len) (takeValue+val) mapWithSkip
    mapUnion = Map.union mapWithSkip mapWithTake
But this is too slow, I believe because Map.union takes too long; it's O(n+m) rather than O(min(n,m)). Furthermore, this code seems quite messy for something as simple as memoization. For this specific problem, you might be able to get away with generalizing the hacky approach to 2 dimensions, and computing a bit extra, but I want to know how to do memoization in a more general sense. How can I implement memoization in this more general form while maintaining the same complexity as the code would have in imperative languages?
And if I want to memoize a function like f(n) = f(n/2) + f(n/3), I need to compute the value of f(i) for all i less than n, when I don't need most of those values.
No, laziness means that values that are not used never get computed. You allocate a thunk for them in case they are ever used, so it's a nonzero amount of CPU and RAM dedicated to this unused value, but e.g. evaluating f 6 never causes f 5 to be evaluated. So presuming that the expense of calculating an item is much higher than the expense of allocating a cons cell, and that you end up looking at a large percentage of the total possible values, the wasted work this method uses is small.
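As a tiny sketch of that point (f 0 = 1 is invented here purely as a base case), with the lazy-list trick forcing f 6 walks a few cons cells but only ever evaluates the entries the recursion actually reaches (indices 6, 3, 2, 1, 0):
f :: Int -> Integer
f = (table !!)
  where
    table = map go [0 ..]
    go 0 = 1
    go n = f (n `div` 2) + f (n `div` 3)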
But what if my parameters are vectors or sets?
Use the same technique, but with a different data structure than a list. A map is the most general approach, provided that your keys are Ord and also that you can enumerate all the keys you will ever need to look up.
If you can't enumerate all the keys, or you plan to look up many fewer keys than the total number possible, then you can use State (or ST) to simulate the imperative process of sharing a writable memoization cache between invocations of your function.
I would have liked to show you how this works, but I find your problem statement / links confusing. The exercise you link to does seem to be equivalent to the UKP in the Wikipedia article you link to, but I don't see anything in that article that looks like your implementation. The "Dynamic programming in-advance algorithm" Wikipedia gives is explicitly designed to have the exact same properties as the fib memoization example you gave. The key is a single Int, and the array is built from left to right: starting with len=0 as the base case, and basing all other computations on already-computed values. It also, for some reason I don't understand, seems to assume you will have at least 1 copy of each legal-sized object, rather than at least 0; but that is easily fixed if you have different constraints.
What you've implemented is totally different, starting from the total len, and choosing for each (length, value) step how many pieces of size length to cut up, then recursing with a smaller len and removing the front item from your list of weight-values. It's closer to the traditional "how many ways can you make change for an amount of currency given these denominations" problem. That, too, is amenable to the same left-to-right memoization approach as fib, but in two dimensions (one dimension for amount of currency to make change for, and another for number of denominations remaining to be used).
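To at least sketch the State-based cache mentioned above in isolation (with fib rather than your knapsack, since I'm not sure of the exact recurrence you want):
import Control.Monad.State
import qualified Data.Map.Strict as Map

fibMemo :: Int -> State (Map.Map Int Integer) Integer
fibMemo n = do
    cached <- gets (Map.lookup n)
    case cached of
        Just v  -> return v
        Nothing -> do
            v <- if n < 2
                   then return (fromIntegral n)
                   else (+) <$> fibMemo (n - 1) <*> fibMemo (n - 2)
            -- Write the result into the shared cache before returning it.
            modify' (Map.insert n v)
            return v
Running it with evalState (fibMemo 90) Map.empty threads a single cache through every recursive call; the same shape would work with, say, a (piece index, remaining length) key.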
My go-to way to do memoization in Haskell is usually MemoTrie. It's pretty straightforward, it's pure, and it usually does what I'm looking for.
Without thinking too hard, you could produce:
import Data.MemoTrie (memo2)

maxValue :: [(Int,Int)] -> Int -> Int
maxValue = memo2 go
  where
    go [] len = 0
    go lst@((l, val):other) len =
        if l > len then skipValue else max skipValue takeValue
      where
        skipValue = maxValue other len
        takeValue = val + maxValue lst (len - l)
I don't have your inputs, so I don't know how fast this will go — it's a little strange to memoize the [(Int,Int)] input. I think you recognize this too because in your own attempt, you actually memoize over the length of the list, not the list itself. If you want to do that, it makes sense to convert your list to a constant-time-lookup array and then memoize. This is what I came up with:
import qualified GHC.Arr as Arr

maxValue :: [(Int,Int)] -> Int -> Int
maxValue lst = memoGo 0
  where
    memoGo = memo2 go
    values = Arr.listArray (0, length lst - 1) lst
    go i _ | i >= length lst = 0
    go i len = if l > len then skipValue else max skipValue takeValue
      where
        (l, val) = values Arr.! i
        -- Recurse through the memoized wrapper so subproblems are cached.
        skipValue = memoGo (i+1) len
        takeValue = val + memoGo i (len - l)
General, run-of-the-mill memoization in Haskell can be implemented the same way it is in other languages, by closing a memoized version of the function over a mutable map that caches the values. If you want the convenience of running the function as if it was pure, you'll need to maintain the state in IO and use unsafePerformIO.
The following memoizer will probably be sufficient for most code submission websites, as it depends only on System.IO.Unsafe, Data.IORef, and Data.Map.Strict, which should usually be available.
import qualified Data.Map.Strict as Map
import System.IO.Unsafe
import Data.IORef

memo :: (Ord k) => (k -> v) -> (k -> v)
memo f = unsafePerformIO $ do
    m <- newIORef Map.empty
    return $ \k -> unsafePerformIO $ do
        mv <- Map.lookup k <$> readIORef m
        case mv of
            Just v -> return v
            Nothing -> do
                let v = f k
                v `seq` modifyIORef' m $ Map.insert k v
                return v
From your question and comments, you seem to be the sort of person who's perpetually disappointed (!), so perhaps the use of unsafePerformIO will disappoint you, but if GHC actually provided a memoization pragma, this is probably what it would be doing under the hood.
For an example of straightforward use:
fib :: Int -> Int
fib = memo fib'
  where fib' 0 = 0
        fib' 1 = 1
        fib' n = fib (n-1) + fib (n-2)

main = do
    print $ fib 100000
or more to the point (SPOILERS?!), a version of your maxValue memoized in the length only:
maxValue :: [(Int,Int)] -> Int -> Int
maxValue values = go
  where
    go = memo (go' values)
    go' [] len = 0
    go' ((l, val) : other) len =
        if l > len then
            skipValue
        else
            max skipValue takeValue
      where
        skipValue = go' other len
        takeValue = val + go (len - l)
This does a little more work than necessary, since the takeValue case re-evaluates the full set of marketable pieces, but it was fast enough to pass all the test cases on the linked web page. If it wasn't fast enough, then you'd need a memoizer that memoizes a function with results shared across calls with non-identical arguments (same length, but different marketable pieces, where you know the answer is going to be the same anyway because of special aspects of the problem and the order in which you check different marketable pieces and lengths). This would be a non-standard memoization, but it wouldn't be hard to modify the memo function to handle this case, I don't think, simply by splitting the argument up into a "key" argument and a "non-key" argument, or deriving the key from the argument via an arbitrary function supplied at memoization time.
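For what it's worth, a sketch of that last keyed variant, reusing the imports from memo above (memoOn is an illustrative name; key projects out the part of the argument that identifies the cached result):
memoOn :: Ord k => (a -> k) -> (a -> v) -> (a -> v)
memoOn key f = unsafePerformIO $ do
    m <- newIORef Map.empty
    return $ \x -> unsafePerformIO $ do
        -- Cache on the derived key, but hand the full argument to f.
        let k = key x
        mv <- Map.lookup k <$> readIORef m
        case mv of
            Just v -> return v
            Nothing -> do
                let v = f x
                v `seq` modifyIORef' m $ Map.insert k v
                return v
With something like this you could cache on (length pieces, len) while still passing the full piece list to the worker, which is exactly the key/non-key split described above.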

Product of two numbers in an array

Given an array of n distinct integers, find all pairs x, y in the array such that z (given) = x * y. Do it without sorting and in the most efficient manner.
[edit] The integers are within the range of int, i.e. 0-65536, and the numbers are non-negative, if that helps.
I don't want to sort because it will take a lot of time. Storage space is not an issue.
Here is a linear-time hash-based solution:
Let hash be an array of size 65537, initialized to 0.
foreach element ele in Array
    // Check first: was an earlier element waiting for ele as its partner?
    if hash[ele] != 0 AND ele * hash[ele] == product
        print ele, product/ele
    end-if
    // Record ele's partner, but only when it divides evenly and fits in the table.
    if ele != 0 AND product % ele == 0 AND product/ele <= 65536
        hash[product/ele] = ele
    end-if
end-foreach
There aren't any super efficient ways of doing this. The best I can think of is O(n^2):
Have an auxiliary function that takes a number (a) and a list, and goes through every element (b), checking whether a*b = z and saving the pair if so.
Go through every element of your original list, and if a particular element (x) divides z (i.e. z % x = 0), then send x and the remainder of the list after x to the auxiliary function.
UPDATE:
I'm giving an O(n^2) solution because the question did not specify unique pairs. If only unique pairs are desired, this should be added to the question. Also, my solution assumes the order of pairs doesn't matter, which is another detail that should be clarified.
Iterate through the array... if an element x can divide z (i.e. z % x == 0), check if its other factor y = z/x exists in the HashTable....
If it does, then you found a pair...else just add it to the hashTable and continue...
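A sketch of that single pass in Haskell, with an IntSet standing in for the hash table (productPairs is an illustrative name, and it leans on the problem's guarantee that the elements are distinct):
import qualified Data.IntSet as IntSet

productPairs :: Int -> [Int] -> [(Int, Int)]
productPairs z = go IntSet.empty
  where
    go _ [] = []
    go seen (x:rest)
      | x /= 0
      , z `mod` x == 0
      , (z `div` x) `IntSet.member` seen = (z `div` x, x) : go seen' rest
      | otherwise = go seen' rest
      where
        seen' = IntSet.insert x seen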

F# Efficiently removing n items from the end of a Set

I know I can remove the last element from a set:
s.Remove(s.MaximumElement)
But if I want to remove the n maximum elements... do I just execute the above n times, or is there a faster way to do that?
To be clear, this is an obvious solution:
let rec removeLastN (s : Set<'a>, num : int) : Set<'a> =
    match num with
    | 0 -> s
    | _ -> removeLastN (s.Remove(s.MaximumElement), num-1)
But it involves creating a new set n times. Is there a way to do it and only create a new set once?
But it involves creating a new set n times. Is there a way to do it and only create a new set once?
To the best of my knowledge, no. I'd say what you have is a perfectly fine implementation; each removal runs in O(lg n), and it's concise too :) Most heap implementations give you O(lg n) for delete-min anyway, so what you have is about as good as you can get it.
You might be able to get a little better speed by rolling your own balanced tree and implementing a function to drop a left or right branch for all values greater than a certain value. I don't think an AVL tree or RB tree is appropriate in this context, since you can't really maintain their invariants, but a randomized tree will give you the results you want.
A treap works awesome for this, because it uses randomization rather than tree invariants to keep itself relatively balanced. Unlike an AVL tree or a RB-tree, you can split a treap on a node without worrying about it being unbalanced. Here's a treap implementation I wrote a few months ago:
http://pastebin.com/j0aV3DJQ
I've added a split function, which allows you to take a tree and return two trees containing all values less than and all values greater than a given value. split runs in O(lg n) using a single pass through the tree, so you can prune entire branches of your tree in one shot -- provided that you know which value to split on.
But if I want to remove the n maximum elements... do I just execute the above n times, or is there a faster way to do that?
Using my Treap class:
open Treap

let nthLargest n t = Seq.nth n (Treap.toSeqBack t)

let removeTopN n t =
    let largest = nthLargest n t
    let smallerValues, wasFound, largerValues = t.Split(largest)
    smallerValues

let e = Treap.empty(fun (x : int) (y : int) -> x.CompareTo(y))
let t = [1 .. 100] |> Seq.fold (fun (acc : Treap<_>) x -> acc.Insert(x)) e
let t' = removeTopN 10 t
removeTopN runs in O(n + lg m) time, where n is the index into the tree sequence and m is the number of items in the tree.
I make no guarantees about the accuracy of my code, use at your own peril ;)
In F#, you can use Set.partition or Set.filter to create sub sets:
let s = Set([1;4;6;9;100;77])
let a, b = Set.partition (fun x -> x <= 10) s
let smallThan10 = Set.filter (fun x -> x < 10) s
In your question, maybe you don't know the value of the ith number of your set, so here is a handy function for that:
let nth (n:int) (s:'a Set) =
    s |> Set.toSeq |> Seq.nth n
Now, we can write the remove-top-n function:
let removeTopN n (s:'a Set) =
    let size = s.Count
    let m = size - n
    let mvalue = nth m s
    Set.filter (fun x -> x < mvalue) s
and test it:
removeTopN 3 s
and we get:
val it : Set<int> = set [1; 4; 6]
Notice that removeTopN does not work for a collection containing duplicate values.
That is already a pretty good solution. OCaml has a split function that can split a Set so you can find the right element then you can split the Set to remove a bunch of elements at a time. Alternatively, you can use Set.difference to extract another Set of elements.
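For comparison, a sketch of that single-split idea with Haskell's Data.Set, which also exposes split (removeTopN here just mirrors the F# function above):
import qualified Data.Set as Set

removeTopN :: Ord a => Int -> Set.Set a -> Set.Set a
removeTopN n s
  | n <= 0          = s
  | n >= Set.size s = Set.empty
  | otherwise       = smaller
  where
    pivot        = Set.elemAt (Set.size s - n) s  -- smallest of the n largest elements
    (smaller, _) = Set.split pivot s              -- everything strictly below the pivot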

Find unique common element from 3 arrays

Original Problem:
I have 3 boxes, each containing 200 coins. Only one person has made calls from all three boxes, so there is exactly one coin in each box carrying the same fingerprints, and all the other coins carry different fingerprints. You have to find the coin that has the same fingerprint in all of the 3 boxes, so that we can identify the person who made calls from all three.
Converted problem:
You have 3 arrays containing 200 integers each. Given that there is one and only one common element in these 3 arrays. Find the common element.
Please consider solutions other than the trivial O(1) space, O(n^3) time one.
Some improvement on Pelkonen's answer:
From converted problem in OP:
"Given that there is one and only one common element in these 3 arrays."
We need to sort only 2 arrays and find common element.
If you sort all the arrays first O(n log n) then it will be pretty easy to find the common element in less than O(n^3) time. You can for example use binary search after sorting them.
Let N = 200, k = 3,
Create a hash table H with capacity ≥ Nk.
For each element X in array 1, set H[X] to 1.
For each element Y in array 2, if Y is in H and H[Y] == 1, set H[Y] = 2.
For each element Z in array 3, if Z is in H and H[Z] == 2, return Z.
throw new InvalidDataGivenByInterviewerException();
O(Nk) time, O(Nk) space complexity.
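A quick sketch of those steps in Haskell, using an IntMap as the hash table (findCommon is an illustrative name):
import qualified Data.IntMap.Strict as IntMap
import Data.List (find)

findCommon :: [Int] -> [Int] -> [Int] -> Maybe Int
findCommon a b c = find (\z -> IntMap.lookup z h2 == Just 2) c
  where
    -- Pass 1: mark everything in the first array.
    h1 = IntMap.fromList [ (x, 1 :: Int) | x <- a ]
    -- Pass 2: promote marks that also occur in the second array.
    h2 = foldr promote h1 b
    promote y m
      | IntMap.lookup y m == Just 1 = IntMap.insert y 2 m
      | otherwise                   = m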
Use a hash table keyed by the integers and encode the entries so that you know which array each one came from - then check for the slot that has entries from all 3 arrays. O(n)
Use a hashtable mapping objects to frequency counts. Iterate through all three lists, incrementing occurrence counts in the hashtable, until you encounter one with an occurrence count of 3. This is O(n), since no sorting is required. Example in Python:
def find_duplicates(*lists):
    num_lists = len(lists)
    counts = {}
    for l in lists:
        for i in l:
            counts[i] = counts.get(i, 0) + 1
            if counts[i] == num_lists:
                return i
Or an equivalent, using sets:
def find_duplicates(*lists):
    intersection = set(lists[0])
    for l in lists[1:]:
        intersection = intersection.intersection(set(l))
    return intersection.pop()
O(N) solution: use a hash table. H[i] = list of all integers in the three arrays that map to i.
For every bucket H[i] with more than one entry, check whether three of its values are the same. If yes, you have your solution. You can do this check naively and it should still be very fast, or you can sort those H[i] and then it becomes trivial.
If your numbers are relatively small, you can use H[i] = k if i appears k times in the three arrays, then the solution is the i for which H[i] = 3. If your numbers are huge, use a hash table though.
You can extend this to work even if you can have elements that are common to only two arrays, and also if you can have repeated elements within one of the arrays. It just becomes a bit more complicated, but you should be able to figure it out on your own.
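A sketch of the counting variant mentioned above, assuming the values fit in a small range (0-65535 here) and that no value repeats within a single array (commonSmall is an illustrative name):
import Data.Array (accumArray, assocs)

commonSmall :: [Int] -> [Int] -> [Int] -> [Int]
commonSmall a b c = [ i | (i, k) <- assocs counts, k == (3 :: Int) ]
  where
    -- Count occurrences of each value across all three arrays.
    counts = accumArray (+) 0 (0, 65535) [ (x, 1) | x <- a ++ b ++ c ]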
If you want the fastest* answer:
Sort one array--time is N log N.
For each element in the second array, search the first. If you find it, add 1 to a companion array; otherwise add 0--time is N log N, using N space.
For each non-zero count, copy the corresponding entry into the temporary array, compacting it so it's still sorted--time is N.
For each element in the third array, search the temporary array; when you find a hit, stop. Time is less than N log N.
Here's code in Scala that illustrates this:
import java.util.Arrays

val a = Array(1,5,2,3,14,1,7)
val b = Array(3,9,14,4,2,2,4)
val c = Array(1,9,11,6,8,3,1)

Arrays.sort(a)
val count = new Array[Int](a.length)
for (i <- 0 until b.length) {
  val j = Arrays.binarySearch(a, b(i))
  if (j >= 0) count(j) += 1
}
var n = 0
for (i <- 0 until count.length) if (count(i) > 0) { count(n) = a(i); n += 1 }
for (i <- 0 until c.length) {
  if (Arrays.binarySearch(count, 0, n, c(i)) >= 0) println(c(i))
}
With slightly more complexity, you can either use no extra space at the cost of being even more destructive of your original arrays, or you can avoid touching your original arrays at all at the cost of another N space.
Edit: * as the comments have pointed out, hash tables are faster for non-perverse inputs. This is "fastest worst case". The worst case may not be so unlikely unless you use a really good hashing algorithm, which may well eat up more time than your sort. For example, if you multiply all your values by 2^16, the trivial hashing (i.e. just use the bitmasked integer as an index) will collide every time on lists shorter than 64k....
// Beginner's code using binary search; pretty easy.
// Note: binary search requires the arrays to be sorted.
#include <vector>
using std::vector;

bool BS(int arr[], int low, int high, int target)
{
    if (low > high)
        return false;
    int mid = low + (high - low) / 2;
    if (target == arr[mid])
        return true;
    else if (target < arr[mid])
        return BS(arr, low, mid - 1, target);
    else
        return BS(arr, mid + 1, high, target);
}

vector<int> commonElements(int A[], int B[], int C[], int n1, int n2, int n3)
{
    vector<int> ans;
    for (int i = 0; i < n2; i++)
    {
        // Skip duplicates in B.
        if (i > 0 && B[i-1] == B[i])
            continue;
        // Search for the element from array B in both arrays A and C.
        if (BS(A, 0, n1 - 1, B[i]) && BS(C, 0, n3 - 1, B[i]))
        {
            ans.push_back(B[i]);
        }
    }
    return ans;
}

Is it possible to rearrange an array in place in O(N)?

If I have a size N array of objects, and I have an array of unique numbers in the range 1...N, is there any algorithm to rearrange the object array in-place in the order specified by the list of numbers, and yet do this in O(N) time?
Context: I am doing a quick-sort-ish algorithm on objects that are fairly large in size, so it would be faster to do the swaps on indices than on the objects themselves, and only move the objects in one final pass. I'd just like to know if I could do this last pass without allocating memory for a separate array.
Edit: I am not asking how to do a sort in O(N) time, but rather how to do the post-sort rearranging in O(N) time with O(1) space. Sorry for not making this clear.
I think this should do:
static <T> void arrange(T[] data, int[] p) {
    boolean[] done = new boolean[p.length];
    for (int i = 0; i < p.length; i++) {
        if (!done[i]) {
            T t = data[i];
            for (int j = i;;) {
                done[j] = true;
                if (p[j] != i) {
                    data[j] = data[p[j]];
                    j = p[j];
                } else {
                    data[j] = t;
                    break;
                }
            }
        }
    }
}
Note: This is Java. If you do this in a language without garbage collection, be sure to delete done.
If you care about space, you can use a BitSet for done. I assume you can afford an additional bit per element because you seem willing to work with a permutation array, which is several times that size.
This algorithm copies instances of T n + k times, where k is the number of cycles in the permutation. You can reduce this to the optimal number of copies by skipping those i where p[i] = i.
The approach is to follow the "permutation cycles" of the permutation, rather than indexing the array left-to-right. But since you do have to begin somewhere, every time a new permutation cycle is needed, the search for unpermuted elements is left-to-right:
// Pseudo-code
N : integer, N > 0   // N is the number of elements
swaps : integer [0..N]
data[N] : array of object
permute[N] : array of integer [-1..N] denoting permutation (used element is -1)
next_scan_start : integer;

swaps = 0;
next_scan_start = 0;

while (swaps < N)
{
    // Search for the next index that is not-yet-permuted.
    for (idx_cycle_search = next_scan_start;
         idx_cycle_search < N;
         ++idx_cycle_search)
        if (permute[idx_cycle_search] >= 0)
            break;

    next_scan_start = idx_cycle_search + 1;

    // This is a provable invariant. In short, the number of non-negative
    // elements in permute[] equals (N - swaps).
    assert( idx_cycle_search < N );

    // Completely permute one permutation cycle, 'following the
    // permutation cycle's trail'. This is O(N) overall.
    while (permute[idx_cycle_search] >= 0)
    {
        swap( data[idx_cycle_search], data[permute[idx_cycle_search]] );
        swaps++;
        old_idx = idx_cycle_search;
        idx_cycle_search = permute[idx_cycle_search];
        permute[old_idx] = -1;
        // Also '= -idx_cycle_search - 1' could be used rather than '-1'
        // and would allow reversal of these changes to the permute[] array.
    }
}
Do you mean that you have an array of objects O[1..N] and then you have an array P[1..N] that contains a permutation of numbers 1..N and in the end you want to get an array O1 of objects such that O1[k] = O[P[k]] for all k=1..N ?
As an example, if your objects are letters A,B,C...,Y,Z and your array P is [26,25,24,..,2,1] is your desired output Z,Y,...C,B,A ?
If yes, I believe you can do it in linear time using only O(1) additional memory. Reversing elements of an array is a special case of this scenario. In general, I think you would need to consider decomposition of your permutation P into cycles and then use it to move around the elements of your original array O[].
If that's what you are looking for, I can elaborate more.
EDIT: Others already presented excellent solutions while I was sleeping, so no need to repeat it here. ^_^
EDIT: My O(1) additional space is indeed not entirely correct. I was thinking only about "data" elements, but in fact you also need to store one bit per permutation element, so if we are precise, we need O(log n) extra bits for that. But most of the time using a sign bit (as suggested by J.F. Sebastian) is fine, so in practice we may not need anything more than we already have.
If you didn't mind allocating memory for an extra hash of indexes, you could keep a mapping of original location to current location to get a time complexity of near O(n). Here's an example in Ruby, since it's readable and pseudocode-ish. (This could be shorter or more idiomatically Ruby-ish, but I've written it out for clarity.)
#!/usr/bin/ruby

objects = ['d', 'e', 'a', 'c', 'b']
order = [2, 4, 3, 0, 1]

cur_locations = {}

order.each_with_index do |orig_location, ordinality|
  # Find the current location of the item.
  cur_location = orig_location
  while not cur_locations[cur_location].nil? do
    cur_location = cur_locations[cur_location]
  end

  # Swap the items and keep track of whatever we swapped forward.
  objects[ordinality], objects[cur_location] = objects[cur_location], objects[ordinality]
  cur_locations[ordinality] = orig_location
end

puts objects.join(' ')
That obviously does involve some extra memory for the hash, but since it's just for indexes and not your "fairly large" objects, hopefully that's acceptable. Since hash lookups are O(1), even though there is a slight bump to the complexity due to the case where an item has been swapped forward more than once and you have to rewrite cur_location multiple times, the algorithm as a whole should be reasonably close to O(n).
If you wanted you could build a full hash of original to current positions ahead of time, or keep a reverse hash of current to original, and modify the algorithm a bit to get it down to strictly O(n). It'd be a little more complicated and take a little more space, so this is the version I wrote out, but the modifications shouldn't be difficult.
EDIT: Actually, I'm fairly certain the time complexity is just O(n), since each ordinality can have at most one hop associated, and thus the maximum number of lookups is limited to n.
#!/usr/bin/env python

def rearrange(objects, permutation):
    """Rearrange `objects` inplace according to `permutation`.

    ``result = [objects[p] for p in permutation]``
    """
    seen = [False] * len(permutation)
    for i, already_seen in enumerate(seen):
        if not already_seen:  # start permutation cycle
            first_obj, j = objects[i], i
            while True:
                seen[j] = True
                p = permutation[j]
                if p == i:  # end permutation cycle
                    objects[j] = first_obj  # [old] p -> j
                    break
                objects[j], j = objects[p], p  # p -> j
The algorithm (as I noticed after I wrote it) is the same as the one from @meriton's answer in Java.
Here's a test function for the code:
def test():
    import itertools
    N = 9
    for perm in itertools.permutations(range(N)):
        L = list(range(N))
        LL = L[:]
        rearrange(L, perm)
        assert L == [LL[i] for i in perm] == list(perm), (L, list(perm), LL)

    # test whether assertions are enabled
    try:
        assert 0
    except AssertionError:
        pass
    else:
        raise RuntimeError("assertions must be enabled for the test")

if __name__ == "__main__":
    test()
There's a histogram sort, though the running time is given as a bit higher than O(N) (N log log n).
I can do it given O(N) scratch space -- copy to new array and copy back.
EDIT: I am aware of the existence of an algorithm that will do this. The idea is to perform the swaps on the array of integers 1..N while at the same time mirroring the swaps on your array of large objects. I just cannot find the algorithm right now.
The problem is one of applying a permutation in place with minimal O(1) extra storage: "in-situ permutation".
It is solvable, but an algorithm is not obvious beforehand.
It is described briefly as an exercise in Knuth, and for work I had to decipher it and figure out how it worked. Look at 5.2 #13.
For some more modern work on this problem, with pseudocode:
http://www.fernuni-hagen.de/imperia/md/content/fakultaetfuermathematikundinformatik/forschung/berichte/bericht_273.pdf
I ended up writing a different algorithm for this, which first generates a list of swaps to apply an order and then runs through the swaps to apply it. The advantage is that if you're applying the ordering to multiple lists, you can reuse the swap list, since the swap algorithm is extremely simple.
#include <string>
#include <utility>
#include <vector>
using namespace std;

void make_swaps(vector<int> order, vector<pair<int,int>> &swaps)
{
    // order[0] is the index in the old list of the new list's first value.
    // Invert the mapping: inverse[0] is the index in the new list of the
    // old list's first value.
    vector<int> inverse(order.size());
    for(int i = 0; i < order.size(); ++i)
        inverse[order[i]] = i;

    swaps.resize(0);
    for(int idx1 = 0; idx1 < order.size(); ++idx1)
    {
        // Swap list[idx1] with list[order[idx1]], and record this swap.
        int idx2 = order[idx1];
        if(idx1 == idx2)
            continue;

        swaps.push_back(make_pair(idx1, idx2));

        // list[idx1] is now in the correct place, but whoever wanted the value we moved out
        // of idx2 now needs to look in its new position.
        int idx1_dep = inverse[idx1];
        order[idx1_dep] = idx2;
        inverse[idx2] = idx1_dep;
    }
}

template<typename T>
void run_swaps(T &data, const vector<pair<int,int>> &swaps)
{
    for(const auto &s: swaps)
    {
        int src = s.first;
        int dst = s.second;
        swap(data[src], data[dst]);
    }
}

void test()
{
    vector<int> order = { 2, 3, 1, 4, 0 };
    vector<pair<int,int>> swaps;
    make_swaps(order, swaps);

    vector<string> data = { "a", "b", "c", "d", "e" };
    run_swaps(data, swaps);
}
