Same number of 0s and 1s algorithm [duplicate]

I'm trying to solve the following problem:
Given a binary array containing only 0s and 1s, find the largest subarray which contains an equal number of 0s and 1s.
Examples:
Input: arr[] = {1, 0, 1, 1, 1, 0, 0, 0, 1}
Output: 1 to 8 (starting and ending indexes of the output subarray)
I could only think of an O(n^2) solution (i.e. the obvious way of starting a subarray at each position and then checking all remaining elements for having the same number of 0s and 1s).
Can somebody figure out a better solution for this problem?

One teensy tiny note on wording: you say find the largest subarray, which implies uniqueness, but even in your example there is more than one (0 to 7 or 1 to 8). It would be worded better as "find a subarray of maximal length" or similar. But that's a non-issue.
As for a faster algorithm, first define a new array swap by replacing each instance of a 0 with a -1. This can be done in O(n) time. For your example, we would have
1, -1, 1, 1, 1, -1, -1, -1, 1
Now define another array sum such that sum[i] is the sum of all values swap[0], swap[1], ..., swap[i]. Equivalently,
sum[0] = swap[0];
for i >= 1, sum[i] = sum[i-1] + swap[i]
Which again is in O(n) time. So your example becomes
1, 0, 1, 2, 3, 2, 1, 0, 1
Now for an observation. If the number of 1s is equal to the number of 0s in the subarray (arr[i], ..., arr[j]) then in the first new array the 1s will cancel with the corresponding -1, so the sum of all values (swap[i], ..., swap[j]) will be equal to 0. But this is then equal to
swap[0] + swap[1] + ... + swap[j] - (swap[0] + swap[1] + ... + swap[i-1]),
which in turn is equal to
sum[j] - sum[i-1].
Although note we have to be careful when i is equal to 0, because then sum[i-1] would be out of the array's bounds. This is an easy check to implement.
Now we have reduced the problem to finding when sum[j] - sum[i-1] is equal to 0. But this is equivalent to finding values j and i such that sum[j] = sum[i-1].
Since all values in sum lie between -n and n (where n is the size of the initial array), you can now create a pair of arrays min and max of size 2n+1. Here, the indices of min and max correspond to potential values of sum: min[0] will hold the smallest index i for which sum[i] = -n, min[1] will hold the smallest index i for which sum[i] = -n+1, and so on. Similarly, max will hold the largest such index. This can also be achieved in linear time. After this step, max[i] and min[i] will correspond to values for which sum[min[i]] = i = sum[max[i]].
Now all you have to do is find the largest value of max[k] - min[k], and this will give you from above i = min[k] + 1 and j = max[k], the indices of a maximal subarray containing an equal number of 0s and 1s. This is also O(n).
I sorta sketched this out a bit roughly, so you have to be careful when i = 0, but that's easily accounted for. Each step is however O(n), so there's your more efficient algorithm.
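A sketch of the above in Python, using a dictionary in place of the min/max arrays (the function name is mine, not from the question); an empty prefix of sum 0 takes care of the i = 0 edge case, and the function returns 0-based (start, end) indices:
def longest_equal_01(arr):
    # prefix sums over the 0 -> -1 transformed array
    first = {0: 0}   # prefix-sum value -> smallest prefix length where it occurs
    last = {0: 0}    # prefix-sum value -> largest prefix length where it occurs
    s = 0
    for i, x in enumerate(arr, start=1):
        s += 1 if x == 1 else -1
        if s not in first:
            first[s] = i
        last[s] = i
    best_len, best = 0, None
    for v, i in first.items():
        if last[v] - i > best_len:
            best_len = last[v] - i
            best = (i, last[v] - 1)   # 0-based (start, end) of the subarray
    return best

print(longest_equal_01([1, 0, 1, 1, 1, 0, 0, 0, 1]))  # (0, 7), one maximal answer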

I believe this can be solved in O(n) using a weight-balanced binary tree structure.

Related

Find the longest sequence length whose sum is divisible by 3

I have an exercise that needs to be done with O(n) time complexity, however, I can only solve it with an O(n^2) solution.
You have an array and you need to find the length of the longest contiguous sequence such that its sum is divisible by 3 without any remainder. For example, for the array {1, 2, 3, -4, -1}, the function will return 4, because the longest sequence whose sum (0) is divisible by 3 is {2, 3, -4, -1}.
My solution O(n^2) is based on arithmetic progression. Is there any way to do it with O(n) complexity?
Please, I only want a clue or a theoretical explanation. Please don't write the full solution :)
Let's take a look at prefix sums. A [L, R] subarray is divisible by 3 if and only if prefixSum[L - 1] mod 3 = prefixSum[R] mod 3. This observation gives a very simple linear solution (because there are only 3 possible values of a prefix sum mod 3, we can simply find the first and the last occurrence of each).
For example, if the input array is {1, 2, 3, -4, -1}, the prefix sums are {0, 1, 0, 0, 2, 1}. (there are n + 1 prefix sums because of an empty prefix). Now you can just take a look at the first and last occurrence of 0, 1 and 2.
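A short Python sketch of this observation (the function name is mine); it returns the length of the longest such sequence:
def longest_div3(arr):
    # first/last prefix length at which each residue mod 3 occurs
    first = [None] * 3
    last = [None] * 3
    first[0] = last[0] = 0          # the empty prefix has sum 0
    s = 0
    for i, x in enumerate(arr, start=1):
        s = (s + x) % 3             # Python's % is non-negative here
        if first[s] is None:
            first[s] = i
        last[s] = i
    return max(last[r] - first[r] for r in range(3) if first[r] is not None)

print(longest_div3([1, 2, 3, -4, -1]))  # 4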
As a non-CS person, this is interesting. My first approach was simply to calculate the running sum mod 3. You'll get a sequence over {0,1,2}. Now look for the first and the last 0, the first and the last 1, and the first and the last 2, and compare their respective distances...
Iterate through the array, summing the total as you go. Record the first position where the modulo sum is 0. Also, record the first position where the modulo sum is 1. And, finally, record the first position where the modulo sum is 2.
Do the same thing backwards also, recording the last position where the modulo sum is 0, 1, and 2. That gives three possibilities for the longest sequence - you just check which pair are farthest apart.
You apply dynamic programming.
For every position you compute 3 values:
The longest sequence ending in that position which has sum s = 0 mod 3
The longest sequence ending in that position which has sum s = 1 mod 3
The longest sequence ending in that position which has sum s = 2 mod 3
So given these values for position i, you can easily compute the new ones for position i+1.
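A sketch of that DP in Python (my own naming; dp[r] is None when no sequence ending at the current position has sum r mod 3):
def longest_div3_dp(arr):
    dp = [None, None, None]   # dp[r] = longest sequence ending here with sum = r (mod 3)
    best = 0
    for x in arr:
        ndp = [None, None, None]
        for r in range(3):
            if dp[r] is not None:         # extend sequences ending at the previous position
                nr = (r + x) % 3
                if ndp[nr] is None or dp[r] + 1 > ndp[nr]:
                    ndp[nr] = dp[r] + 1
        r1 = x % 3                        # the sequence consisting of x alone
        if ndp[r1] is None:               # an extension is always longer than 1
            ndp[r1] = 1
        dp = ndp
        if dp[0] is not None:
            best = max(best, dp[0])
    return best

print(longest_div3_dp([1, 2, 3, -4, -1]))  # 4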

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int array K[].
The task at hand is to generate a uniformly random number between 0 and N-1 which doesn't exist in K.
N is strictly an integer >= 0.
And K.length < N-1. And 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1, and assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal
probability.
I could get an O(N) solution for this: first generate a random number between 0 and N - K.length - 1, and then map the generated random number to a number not in K. The second step takes the complexity to O(N). Can it be done better, maybe in O(log N)?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This will give you the number of numbers missing from the array in the range 0 to i.
For example K[2] = 5. So 3 elements are missing from K[0] to K[2] (2,3,4)
Hence you can decide whether you have to conduct the remaining search in the first part of array K or the next part. This is because you know r.
This search will give you a complexity of O(log K.length).
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
Random number generated between 0 and N - K.length - 1 = random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing up to i = 1. Hence we search on the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th element out of which we have already discarded 2 elements. Hence we will choose the second element which is 6.
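A Python sketch of the whole idea (random.randrange stands in for the question's uniformRand; the function name is mine):
import random

def rand_not_in(N, K):
    m = N - len(K)                   # how many values in 0..N-1 are missing from K
    r = random.randrange(m)          # we want the (r+1)-th missing number, r in 0..m-1
    # binary search for the smallest index i with K[i] - i > r,
    # i.e. with more than r values missing before K[i]
    lo, hi = 0, len(K)
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid > r:
            hi = mid
        else:
            lo = mid + 1
    return r + lo                    # exactly lo elements of K precede the answer

print(rand_not_in(7, [0, 1, 5]))     # one of 2, 3, 4, 6, with equal probability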
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N (which would give us a point where all values to the left are < N, even if N could not be found) (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
We also define K[-1] = -1 (so the claim below also covers numbers missing before K[0]).
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], we need to find the rightmost occurrence of r' in D (which occurs at some position j), where r' is the largest number in D which is < r. Such an r' exists, because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r - r' + K[j].
Proof: Basically, the definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and these are at least r-r' in number), and the number we seek is among them, given by K[j] + r - r'.
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log |K|) algorithm.
If you are running this many times, it probably pays to speed up your generation operation: O(log N) time just isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K add it to G. If it is in K don't add it and progress your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random

class PRNG:
    def __init__(self, K, N):
        # G holds every value in 0..N-1 that is not in K (K must be sorted)
        self.G = []
        kptr = 0
        for i in range(N):
            if kptr < len(K) and K[kptr] == i:
                kptr += 1
            else:
                self.G.append(i)

    def getRand(self):
        # uniform choice among the acceptable numbers
        rn = random.randint(0, len(self.G) - 1)
        return self.G[rn]

prng = PRNG([0, 1, 5], 7)
for i in range(20):
    print(prng.getRand())

Minimum sum that can't be obtained from a set

Given a set S of positive integers, whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example: if S = {1, 1, 3, 7}, we can get 0 (S' = {}), 1 (S' = {1}), 2 (S' = {1, 1}), 3 (S' = {3}), 4 (S' = {1, 3}), 5 (S' = {1, 1, 3}), but we can't get 6.
Now we are given an array A of N positive integers. There are M queries, each consisting of two integers Li and Ri describing the i-th query: we need to find this sum that can't be obtained from the array elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it by a brute-force approach in O(2^n), but given 1 ≤ N, M ≤ 100,000 this can't be done.
So is there any efficient approach to do it?
Concept
Suppose we had an array of bool representing which numbers have been found so far (by way of summing).
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
Refinement
Obviously, we can't have a solution like this because the array would have to be infinite in order to work for all sets of numbers.
The concept could be improved by making a few observations. With an input of 1, 1, 3, the set of found numbers becomes (in sequence):
after 1: {1}; after 1, 1: {1, 2}; after 1, 1, 3: {1, 2, 3, 4, 5}
An important observation can be made:
(3) Each next number is added to all the numbers already found. This implies that if there were no gaps before a number was processed, there will be no gaps after it has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, there will be no number less than 7
(5) If there is no number less than 7, then 6 cannot be obtained
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array, we only need a single value, max to represent the maximum number found so far.
This way, if the next number n is greater than max + 1, then a gap would have been made, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
static int Calculate(int[] S)   // assumes S is sorted in increasing order
{
    int max = 0;
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1;
    }
    return max + 1;
}
Should run pretty fast, since it's obviously linear time (O(n)). Since the input to the function should be sorted, with quicksort this becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
With numbers upper limit of 10^9, a radix sort could be used to approximate O(n) time for the sorting, however this would still be way over 2 seconds because of the sheer amount of sorts required.
But we can use the statistical improbability of a 1 appearing to eliminate subsets before sorting. At the start, check whether 1 exists in S; if not, then every query's result is 1, because it cannot be obtained.
Statistically, if we draw 10^5 random numbers from a range of 10^9, we have a 99.9% chance of not getting a single 1.
Before each sort, check whether that subset contains 1; if not, its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code: http://pastebin.com/rF6VddTx
This is a variation of the subset-sum problem, which is NP-Complete, but there is a pseudo-polynomial Dynamic Programming solution you can adopt here, based on the recursive formula:
f(S, i) = f(S - arr[i], i-1) OR f(S, i-1)
f(s, i) = false for all s < 0
f(s, -1) = false for all s > 0
f(0, i) = true
The recursive formula is basically an exhaustive search, each sum can be achieved if you can get it with element i OR without element i.
The dynamic programming is achieved by building a SUM+1 x n+1 table (where SUM is the sum of all elements, and n is the number of elements), and building it bottom-up.
Something like:
table <- (SUM+1) x (n+1) table
// init:
for each j from 0 to n:
    table[0][j] = true      // sum 0 is achievable with the empty subset
for each i from 1 to SUM:
    table[i][0] = false     // a positive sum needs at least one element
// fill the table:
for each i from 1 to SUM:
    for each j from 1 to n:
        if i < arr[j]:
            table[i][j] = table[i][j-1]
        else:
            table[i][j] = table[i - arr[j]][j-1] OR table[i][j-1]
Once you have the table, you need the smallest i such that table[i][n] = false.
Complexity of the solution is O(n*SUM), where SUM is the sum of all elements, but note that the algorithm can actually be stopped as soon as the required number is found, without filling the remaining rows, which are unneeded for the solution.
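For reference, a space-optimized Python sketch of the same table (one boolean row instead of the full (SUM+1) x (n+1) table; the function name is mine):
def min_unobtainable(arr):
    total = sum(arr)
    reachable = [False] * (total + 2)   # reachable[s]: some subset sums to s
    reachable[0] = True                 # the empty subset
    for x in arr:
        # iterate downward so each element is used at most once (0/1 knapsack)
        for s in range(total, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    s = 0
    while reachable[s]:
        s += 1
    return s

print(min_unobtainable([1, 1, 3, 7]))   # 6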

Find longest substring in binary string with not less ones than zeros

How to find, in a binary string, the longest substring where the balance, i.e. the difference between the number of ones and zeros, is >= 0?
Example:
01110000010 -> 6: 011100
1110000011110000111 -> 19: entire string
While this problem looks very similar to the Maximum Value Contiguous Subsequence (Maximum Contiguous Sum) problem, a dynamic programming solution doesn't seem to be obvious. In a divide-and-conquer approach, how to do the merging? Is an "efficient" algorithm possible after all? (A trivial O(n^2) algorithm will just iterate over all substrings for all possible starting points.)
This is a modified variant of Finding a substring, with some additional conditions. The difference is that in the linked question, only such substrings are allowed where balance never falls below zero (looking at the string in either forward or backward direction). In the given problem, balance is allowed to fall below zero, provided it recovers at some later stage.
I have a solution that requires O(n) additional memory and O(n) time.
Let's denote the 'height' of an index h(i) as
h(i) = <number of 1s in the substring 1..i> - <number of 0s in the same substring>
The problem can now be reformulated as: find i and j such that h(i) <= h(j) and j - i is maximal.
Obviously, h(0) = 0, and if h(n) = 0, then the solution is the entire string.
Now let's compute the array B so that B[x] = min{i: h(i) = -x}. In other words, let B[x] be the leftmost index i at which h(i)= -x.
The array B[x] has a length of at most n, and is computed in one linear pass.
Now we can iterate over the original string and for each index i compute the length of the longest sequence with non-negative balance that ends on i as follows:
Lmax(i) = i - B[max{0, -h(i)}]
The largest Lmax(i) across all i will give you the desired length.
I leave the proof as an exercise :) Contact me if you can't figure it out.
Also, my algorithm needs 2 passes of the original string, but you can collapse them into one.
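A one-pass Python sketch of this (I key B by the height itself rather than by -x, using a dictionary; the function name is mine):
def longest_balanced(s):
    first = {0: 0}    # height value -> leftmost prefix length where it occurs (the B array)
    h = 0
    best = 0
    for i, c in enumerate(s, start=1):
        h += 1 if c == '1' else -1
        if h not in first:
            first[h] = i
        # the longest feasible substring ending at i starts right after the
        # first prefix whose height is min(0, h(i))
        best = max(best, i - first[min(0, h)])
    return best

print(longest_balanced("01110000010"))            # 6
print(longest_balanced("1110000011110000111"))    # 19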
This can be answered quite easily in O(n) using "height array", representing the number of 1's relative to the number of 0's. Like my answer in the linked question.
Now, instead of focusing on the original array, we focus on two arrays indexed by the heights: one will contain the smallest index at which each height is found, and the other will contain the largest index at which it is found. Since we don't want a negative index, we can shift everything up, such that the minimum height is 0.
So for the sample cases (I added two more 1's at the end to show my point):
1110000011010000011111
Array height visualization
/\
/ \
/ \
\ /\/\ /
\/ \ /
\ /
\ /
\/
(lowest height = -5)
Shifted height array:
[5, 6, 7, 8, 7, 6, 5, 4, 3, 4, 5, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]
Height: 0 1 2 3 4 5 6 7 8
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view = [17,18,19,20,21,22, 5, 4, 3]
note that we have 22 numbers and 23 distinct indices, 0-22, representing the 23 spaces between and padding the numbers
We can build the first_view and last_view array in O(n).
Now, for each height in first_view, we only need to check every larger height in last_view, and take the index with the maximum difference from the first_view index. For example, from height 0, the maximum index among larger heights is 22. So the longest substring starting at index 17+1 will end at index 22.
To find the maximum index on the last_view array, you can convert it to a maximum to the right in O(n):
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
And so finding the answer is simply a matter of subtracting first_view from last_view_max,
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
result = [ 5, 6, 7,14,15,22, 4, 2, 0]
and taking the maximum (again in O(n)), which is 22, achieved from starting index 0 to ending index 22, i.e., the whole string. =D
Proof of correctness:
Suppose that the maximum substring starts at index i, ends at index j.
If the height at index i is the same as the height at index k<i, then k..j would be a longer substring still satisfying the requirement. Therefore it suffices to consider the first index of each height. Analogously for the last index.
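A Python sketch of the first_view / last_view approach (since the heights move in steps of ±1, every height between the minimum and maximum occurs at least once; all names are mine):
def longest_balanced_views(s):
    heights = [0]
    for c in s:
        heights.append(heights[-1] + (1 if c == '1' else -1))
    shift = -min(heights)
    H = max(heights) + shift
    first_view = [None] * (H + 1)
    last_view = [None] * (H + 1)
    for i, h in enumerate(heights):
        h += shift
        if first_view[h] is None:
            first_view[h] = i
        last_view[h] = i
    # last_view_max[h]: largest index at any height >= h (suffix maximum)
    last_view_max = last_view[:]
    for h in range(H - 1, -1, -1):
        last_view_max[h] = max(last_view_max[h], last_view_max[h + 1])
    return max(last_view_max[h] - first_view[h] for h in range(H + 1))

print(longest_balanced_views("1110000011010000011111"))   # 22, the whole string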
Compressed quadratic runtime
We will be looking for (locally) longest substrings with balance zero, starting at the beginning. We will ignore strings of zeros. (Corner cases: All zeros -> empty string, balance never reaches zero again -> entire string.) Of these substrings with balance zero, all trailing zeros will be removed.
Denote by B a substring with balance > 0 and by Z a substring with only zeros. Each input string can be decomposed as follows (pseudo-regex notation):
B? (Z B)* Z?
Each of the Bs is a maximum feasible solution, meaning that it cannot be extended in either direction without reducing balance. However, it might be possible to collapse sequences of BZB or ZBZ if the balance is still larger than zero after collapsing.
Note that it is always possible to collapse sequences of BZBZB to a single B if the ZBZ part has balance >= 0. (Can be done in one pass in linear time.) Once all such sequences have been collapsed, the balance of each ZBZ part is below zero. Still, it is possible that there exist BZB parts with balance above zero -- even that in a BZBZB sequence with balance below zero both the leading and trailing BZB parts have balance over zero. At this point, it seems to be difficult to decide which BZB to collapse.
Still quadratic...
Anyway, with this simplified data structure one can try all Bs as starting points (possibly extending to the left if there's still balance left). Run time is still quadratic, but (in practice) with a much smaller n.
Divide and conquer
Another classic. Should run in O(n log n), but rather difficult to implement.
Idea
The longest feasible substring is either in the left half, in the right half, or it passes over the boundary. Call the algorithm for both halves. For the boundary:
Assume problem size n. For the longest feasible substring that crosses the boundary, we are going to compute the balance of the left-half part of the substring.
Determine, for each possible balance between -n/2 and n/2, in the left half, the length of the longest string that ends at the boundary and has this (or a larger) balance. (Linear time!) Do the same for the right half and the longest string that starts at the boundary. The result is two arrays of size n + 1; we reverse one of them, add them element-wise and find the maximum. (Again, linear.)
Why does it work?
A substring with balance >= 0 that crosses the boundary can have balance < 0 in either the left or the right part, if the other part compensates this. ("Borrowing" balance.) The crucial question is how much to borrow; we iterate over all potential "balance credits" and find the best trade-off.
Why is this O(n log n)?
Because merging (looking at boundary-crossing string) takes only linear time.
Why is merging O(n)?
Exercise left to the reader.
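Since the merge is left as an exercise, here is one possible Python sketch of the whole divide and conquer, under my reading of the merge step (geL[b] is the longest left suffix with balance >= b, geR likewise for right prefixes; all names are mine):
def longest_balanced_dc(s):
    vals = [1 if c == '1' else -1 for c in s]
    NEG = float('-inf')

    def rec(lo, hi):                      # half-open range [lo, hi)
        n = hi - lo
        if n <= 1:
            return n if n == 1 and vals[lo] == 1 else 0
        mid = (lo + hi) // 2
        best = max(rec(lo, mid), rec(mid, hi))
        nl, nr = mid - lo, hi - mid
        # longest left suffix ending at the boundary with balance exactly b
        exact = [NEG] * (2 * nl + 1)      # index b + nl
        exact[nl] = 0                     # the empty suffix has balance 0
        bal = 0
        for k in range(mid - 1, lo - 1, -1):
            bal += vals[k]
            exact[bal + nl] = max(exact[bal + nl], mid - k)
        geL = exact[:]                    # longest left suffix with balance >= b
        for b in range(2 * nl - 1, -1, -1):
            geL[b] = max(geL[b], geL[b + 1])
        # same for right prefixes starting at the boundary
        exact = [NEG] * (2 * nr + 1)
        exact[nr] = 0
        bal = 0
        for k in range(mid, hi):
            bal += vals[k]
            exact[bal + nr] = max(exact[bal + nr], k - mid + 1)
        geR = exact[:]
        for b in range(2 * nr - 1, -1, -1):
            geR[b] = max(geR[b], geR[b + 1])
        # combine: a left part with balance >= b can cross the boundary with a
        # right part with balance >= -b ("borrowing" balance from the other side)
        for b in range(-nl, nl + 1):
            if -b > nr:                   # the right half cannot compensate
                continue
            rb = max(-b, -nr)             # demands weaker than -nr are all equivalent
            best = max(best, geL[b + nl] + geR[rb + nr])
        return best

    return rec(0, len(s))

print(longest_balanced_dc("01110000010"))   # 6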
Dynamic programming -- linear run time (finally!)
Inspired by this blog post. Simple and efficient, a one-pass online algorithm, but it takes some time to explain.
Idea
The link above shows a different problem: maximum subsequence sum. It cannot be mapped 1:1 to the given problem; here a "state" of size O(n) is needed, in contrast to O(1) for the original problem. Still, the state can be updated in O(1).
Let's rephrase the problem. We are looking for the longest substring in the input where the balance, i.e. the difference between the number of 1's and 0's, is not negative.
The state is similar to my other divide-and-conquer solution: We compute, for each position i and for each possible balance b the starting position s(i, b) of the longest string with balance b or greater that ends at position i. That is, the string that starts at index s(i, b) + 1 and ends at i has balance b or greater, and there is no longer such string that ends at i.
We find the result by maximizing i - s(i, 0).
Algorithm
Of course, we do not keep all s(i, b) in memory, just those for the current i (which we iterate over the input). We start with s(0, b) := 0 for b <= 0 and := undefined for b > 0. For each i, we update with the following rule:
If 1 is read: s(i, b) := s(i - 1, b - 1).
If 0 is read: s(i, b) := s(i - 1, b + 1) if defined, s(i, 0) := i if s(i - 1, 1) undefined.
The function s (for current i) can be implemented as a pointer into an array of length 2n + 1; this pointer is moved forward or backward depending on the input. At each iteration, we note the value of s(i, 0).
How does it work?
The state function s becomes effective especially if the balance from the start to i is negative. It records the earliest start point where zero balance is reached, for all possible numbers of 1s that have not been read yet.
Why does it work?
Because the recursive definition of the state function is equivalent to its direct definition -- the starting position of the longest string with balance b or greater that ends at position i.
Why is the recursive definition correct?
Proof by induction.
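A Python sketch of this one-pass algorithm, with s represented as an array plus a moving offset as described, so that both update rules are O(1) (all names are mine):
def longest_balanced_dp(s):
    n = len(s)
    A = [None] * (2 * n + 2)   # A[b + off] holds s(i, b); moving off shifts all of s at once
    off = n
    for j in range(off + 1):
        A[j] = 0               # s(0, b) = 0 for b <= 0
    best = 0
    for i, c in enumerate(s, start=1):
        if c == '1':
            off -= 1           # s(i, b) := s(i-1, b-1)
        else:
            off += 1           # s(i, b) := s(i-1, b+1)
            if A[off] is None: # s(i-1, 1) was undefined
                A[off] = i     # s(i, 0) := i
        best = max(best, i - A[off])   # A[off] is s(i, 0)
    return best

print(longest_balanced_dp("01110000010"))          # 6
print(longest_balanced_dp("1110000011110000111"))  # 19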

Sorting a permutation with minimum cost

I am given a permutation of elements {1, 2, 3, ..., N} and I have to sort it using a swap operation. An operation which swaps elements x, y has cost min(x,y).
I need to find out the minimum cost of sorting the permutation. I thought about a greedy going from N down to 1 and putting each element in its position using a swap operation, but this is not a good idea.
Would this be optimal:
Find element 2
If it is not at correct place already:
    Find element at position 2
    If swapping that with 2 puts both to right place:
        Swap them
        Cost = Cost + min(2, other swapped element)
repeat:
    Find element 1
    If element 1 is at position 1:
        Find first element that is in wrong place
        If no element found:
            set sorted = true
        else:
            Swap found element with element 1
            Cost = Cost + 1
    else:
        Find element that should go to the position where 1 is
        Swap found element with element 1
        Cost = Cost + 1
until sorted is true
If seeks are trivial, then the minimum number of swaps will be determined by the number of cycles. It would follow a principle similar to Cuckoo Hashing. You take the first value in the permutation, and look at the value stored at the index given by that value. If those match, then swap for a single operation.
[3 2 1] : Value 3 is at index one, so look at the value at index 3.
[3 2 1] : Value 1 is at index 3, so a two index cycle exists. Swap these values.
If not, push the first index onto a stack and seek the index for the value of the second index. There will eventually be a cycle. At that point, start swapping by popping values off the stack. This will take a number of swaps equal to n-1, where n is the length of the cycle.
[3 1 2] : Value 3 is at index one, so look at the value at index 3.
[3 1 2] : Value 2 is at index 3, so add 3 to the stack and seek to index 2. Also store 3 as the beginning value of the cycle.
[3 1 2] : Value 1 is at index 2, so add 2 to the stack and seek to index 1.
[3 1 2] : Value 3 is the beginning of the cycle, so pop 2 off the stack and swap the values at indices 1 and 2.
[1 3 2] : Pop 3 off the stack and swap the values at indices 2 and 3, resulting in a sorted list after 2 swaps.
[1 2 3]
With this algorithm, the maximum number of swaps will be N-1, where N is the total number of values. This occurs when there is an N length cycle.
EDIT : This algorithm gives the minimum number of swaps, but not necessarily the minimum value using the min(x, y) function. I haven't done the math, but I believe that the only time when swap(x, y) = {swap(1, x), swap(1, y), swap(1, x)} shouldn't be used is when x in {2,3} and n < 2; Should be easy enough to write that as a special case. It may be better to check and place 2 and 3 explicitly, then follow the algorithm mentioned in the comments to achieve sorting in two operations.
EDIT 2 : Pretty sure this will catch all cases.
while (unsorted) {
    while (1 != index(1))
        swap(1, index(1))
    if (index(2) == value#(2))
        swap(2, value#(2))
    else
        swap(1, highest value out of place)
}
If you have permutation of the numbers 1, 2, ..., N, then the sorted collection will be precisely 1, 2, ..., N. So you know the answer with complexity O(0) (i.e. you don't need an algorithm at all).
If you actually want to sort the range by repeated swapping, you can repeatedly "advance and cycle": Advance over the already sorted range (where a[i] == i), and then swap a[i] with a[a[i]] until you complete the cycle. Repeat until you reach the end. That needs at most N − 1 swaps, and it basically performs a cycle decomposition of the permutation.
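A small Python sketch of that advance-and-cycle pass, counting swaps (note it minimizes the number of swaps, not the min(x, y) cost the question asks about; the naming is mine):
def cycle_sort_swaps(a):
    # a is a permutation of 1..N; sorts in place with at most N-1 swaps
    swaps = 0
    for i in range(len(a)):
        while a[i] != i + 1:         # follow the cycle until position i is fixed
            j = a[i] - 1             # index where the value a[i] belongs
            a[i], a[j] = a[j], a[i]
            swaps += 1
    return swaps

a = [3, 1, 2]
print(cycle_sort_swaps(a), a)        # 2 [1, 2, 3]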
Hmm. An interesting question. A quick algorithm that came to my mind is to use elements as indices. We first find the index of the element that has 1 as its value, and swap 1 with the element whose value equals that index. Eventually 1 ends up in the first position, which means you have to swap 1 with some element that isn't yet in its position, and continue. This tops out at 2*N-2 swaps, and has a lower limit of N-1 for the permutation (2, 3, ..., N, 1), but the exact cost will vary.
Okay, given the above algorithm and examples, I think the most optimal will be to keep exchanging 1 with anything until it first hits first place, then exchange 2 with the element at second place if it's not in place already, then continue swapping 1 with anything not yet in place, until sorted.
set sorted = false
while (!sorted) {
    if (element 1 is in place) {
        if (element 2 is in place) {
            find any element NOT in place
            if (no element found) sorted = true
            else {
                swap 1 with element found
                cost++
            }
        } else {
            swap 2 with element at second place
            cost += 2
        }
    } else {
        find element with number equal to position of element 1
        swap 1 with element found
        cost++
    }
}
Use a bucket sort with bucket size of 1.
The cost is zero, since no swaps occur.
Now make a pass through the bucket array, and swap each value back to its corresponding position in the original array.
That is N swaps.
The sum of the first N integers is N(N+1)/2, giving you an exact fixed cost.
A different interpretation is that you just store from the bucket array, back into the original array. That is no swaps, hence the cost is zero, which is a reasonable minimum.
