Find if there is an element repeating itself n/k times - performance

You have an array of size n and a constant k (whatever).
You can assume the array is of int type (although it could be of any type).
Describe an algorithm that finds whether there is an element (or elements) that repeats itself at least n/k times, and if there is, returns one. Do so in linear time (O(n)).
The catch: do this algorithm (or even pseudo-code) using constant memory and running over the array only twice.

I'm not 100% sure, but it sounds like you want to solve the Britney Spears problem—finding an item that makes up a certain fraction of a sample using constant memory.
Here is a statement of the problem in English, with a sketch of the solution:
… from a 2002 article by Erik
D. Demaine of MIT and Alejandro
López-Ortiz and J. Ian Munro of the
University of Waterloo in Canada.
Demaine and his colleagues have
extended the algorithm to cover a
more-general problem: Given a stream
of length n, identify a set of size m
that includes all the elements
occurring with a frequency greater
than n/(m+1). (In the case of m = 1,
this reduces to the majority problem.)
The extended algorithm requires m
registers for the candidate elements
as well as m counters. The basic
scheme of operation is analogous to
that of the majority algorithm. When a
stream element matches one of the
candidates, the corresponding counter
is incremented; when there is no match
to any candidate, all of the counters
are decremented; if a counter is at 0,
the associated candidate is replaced
by a new element from the stream.
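For concreteness, here is a small Python sketch of that extended scheme (often called the Misra-Gries frequent-elements algorithm); the function name and the dict-based bookkeeping here are my own, not from the article:

def frequent_candidates(stream, m):
    # One pass with m counters: returns a superset of every element that
    # occurs more than n/(m+1) times; a second pass must verify the counts.
    counters = {}                     # candidate -> counter
    for x in stream:
        if x in counters:
            counters[x] += 1          # element matches a candidate
        elif len(counters) < m:
            counters[x] = 1           # free register: adopt as a candidate
        else:
            for c in list(counters):  # no match: decrement every counter
                counters[c] -= 1
                if counters[c] == 0:
                    del counters[c]   # freed register for a later element
    return list(counters)

print(frequent_candidates([3, 1, 2, 2, 2, 1, 4, 3, 3], 3))  # -> [1, 2, 3]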

Create a temporary array of size (k-1) to store elements and their counts (the output elements are going to be among these k-1 elements).
Traverse through the input array and update temp[] (add/remove an element or increase/decrease its count) for every traversed element. The array temp[] stores the (k-1) potential candidates at every step. This step takes O(nk) time.
Iterate through the final (k-1) potential candidates (stored in temp[]). For every element, check if it actually has a count of more than n/k. This step takes O(nk) time.
The main question is step 2: how do we maintain (k-1) potential candidates at every point? The steps used in step 2 are like the famous game Tetris. We treat each number as a piece that falls down into our temporary array temp[]. Our task is to keep pieces of the same number stacked on the same column (the count in the temporary array is incremented).
Consider k = 4, n = 9
Given array: 3 1 2 2 2 1 4 3 3
i = 0
3 _ _
temp[] has one element, 3 with count 1
i = 1
3 1 _
temp[] has two elements, 3 and 1 with
counts 1 and 1 respectively
i = 2
3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 1 respectively.
i = 3
- - 2
3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.
i = 4
- - 2
- - 2
3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 3 respectively.
i = 5
- - 2
- 1 2
3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 2 and 3 respectively.
Now the question arises: what do we do when temp[] is full and we see a new element? We remove the bottom row from the stacks, i.e., we decrease the count of every element in temp[] by 1, and ignore the current element.
i = 6
- - 2
- 1 2
temp[] has two elements, 1 and 2 with
counts as 1 and 2 respectively.
i = 7
- - 2
3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.
i = 8
3 - 2
3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 2, 1 and 2 respectively.
Finally, we have at most k-1 numbers in temp[]. The elements in temp[] are {3, 1, 2}. Note that the counts in temp[] are useless now; they were needed only in step 2. Now we need to check whether the actual counts of the elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4, so we print 3 and 2.
Note that the algorithm doesn't miss any output element. There are two possibilities: either the occurrences of an element are mostly together, or they are spread across the array. If the occurrences are together, its count will be high and won't become 0. If they are spread out, the element will come back into temp[] again. Following is a C++ implementation of the above algorithm.
// A C++ program to print elements with count more than n/k
#include <iostream>
#include <vector>
using namespace std;

// A structure to store an element and its current count
struct eleCount
{
    int e; // Element
    int c; // Count
};

// Prints elements with more than n/k occurrences in arr[] of
// size n. If there are no such elements, then it prints nothing.
void moreThanNdK(int arr[], int n, int k)
{
    // k must be greater than 1 to get some output
    if (k < 2)
        return;

    /* Step 1: Create a temporary array (contains element
       and count) of size k-1. Initialize count of all
       elements as 0 */
    vector<eleCount> temp(k - 1);
    for (int i = 0; i < k - 1; i++)
        temp[i].c = 0;

    /* Step 2: Process all elements of input array */
    for (int i = 0; i < n; i++)
    {
        int j;

        /* If arr[i] is already present in
           the element count array, then increment its count */
        for (j = 0; j < k - 1; j++)
        {
            if (temp[j].c > 0 && temp[j].e == arr[i])
            {
                temp[j].c += 1;
                break;
            }
        }

        /* If arr[i] is not present in temp[] */
        if (j == k - 1)
        {
            int l;

            /* If there is a position available in temp[], then place
               arr[i] in the first available position and set count as 1 */
            for (l = 0; l < k - 1; l++)
            {
                if (temp[l].c == 0)
                {
                    temp[l].e = arr[i];
                    temp[l].c = 1;
                    break;
                }
            }

            /* If all the positions in temp[] are filled, then
               decrease the count of every element by 1
               (note the bound k-1: temp[] has only k-1 entries) */
            if (l == k - 1)
                for (l = 0; l < k - 1; l++)
                    temp[l].c -= 1;
        }
    }

    /* Step 3: Check actual counts of potential candidates in temp[] */
    for (int i = 0; i < k - 1; i++)
    {
        if (temp[i].c == 0) // unused slot, no candidate here
            continue;

        // Calculate actual count of elements
        int ac = 0; // actual count
        for (int j = 0; j < n; j++)
            if (arr[j] == temp[i].e)
                ac++;

        // If actual count is more than n/k, then print it
        if (ac > n / k)
            cout << "Number:" << temp[i].e
                 << " Count:" << ac << endl;
    }
}
/* Driver program to test above function */
int main()
{
    cout << "First Test\n";
    int arr1[] = {4, 5, 6, 7, 8, 4, 4};
    int size = sizeof(arr1) / sizeof(arr1[0]);
    int k = 3;
    moreThanNdK(arr1, size, k);

    cout << "\nSecond Test\n";
    int arr2[] = {4, 2, 2, 7};
    size = sizeof(arr2) / sizeof(arr2[0]);
    k = 3;
    moreThanNdK(arr2, size, k);

    cout << "\nThird Test\n";
    int arr3[] = {2, 7, 2};
    size = sizeof(arr3) / sizeof(arr3[0]);
    k = 2;
    moreThanNdK(arr3, size, k);

    cout << "\nFourth Test\n";
    int arr4[] = {2, 3, 3, 2};
    size = sizeof(arr4) / sizeof(arr4[0]);
    k = 3;
    moreThanNdK(arr4, size, k);

    return 0;
}

There are two common (theoretical) approaches to this problem in O(n)
I) The first idea is the simplest
Step 1) While there are more than k distinct elements, select k distinct elements and erase them all.
Step 2) Test all of the (at most k) distinct remaining elements for their frequency.
Proof of correctness:
Note that the While step will be executed at most n/k - 1 times.
Suppose there is an element that repeats itself at least n/k times. In the worst case it could be chosen in all n/k - 1 iterations, and it will still be in the final array afterwards; once tested, it will be found.
Implementation:
Step 1 can be implemented by keeping an associative array (mapping a key to a value) of size k-1 (constant). Sweep from left to right over the array: if you find an element that is already in the map, increase its counter by 1; if the element is not in the map and the map is not full yet (fewer than k-1 elements), add the new element with an initial count of 1; if the map is full, subtract 1 from the counter of every element, and remove any element whose counter reaches 0. In the end, the elements in this map are the remaining elements you need to test. If, in the last iteration, your map becomes empty, you need to test all the elements before erasing, to cover the case where the frequency is exactly n/k.
Complexity: considering the worst case for this map, O(n * k) = O(n), as k is constant.
Step 2 can be implemented by counting the frequency of all (at most) k-1 remaining elements.
Complexity: O(k*n) = O(n)
Overall complexity: O(n) + O(n) = O(n).
(There is a small detail that differs from the implementation, a difference of 1 element: this happens because in the pseudocode we also want to cover the case of a frequency of exactly n/k repetitions; if not, we could allow one more iteration to be possible when there are exactly k different elements, not necessarily more than k.)
II) The second algorithm uses the linear-time selection algorithm (http://en.wikipedia.org/wiki/Selection_algorithm) and the partition algorithm, which also runs in linear time.
Using them, you break your array into k-1 buckets, with the invariant that any element in the ith bucket is smaller than or equal to any element in the jth bucket for j > i, in O(n). Note, however, that the elements are not sorted inside each bucket.
Now, use the fact that each bucket has n/(k-1) elements and that you're looking for an element that repeats itself at least n/k times, where n/k > n/(2*(k-1)). This suffices to apply the majority theorem, which states that if an element is a majority (more frequent than the number of elements divided by 2), then it is also the median of the array. You can get each median, again using the selection algorithm.
So, you just test all the medians and all the pivots of the partitions; you need to test the pivots because they may split equal values into two different buckets. There are (k-1) + k values to test, so the complexity is O((2*k-1)*n) = O(n).
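A rough Python sketch of this second approach, under the stated assumptions (the helper names are mine, and a real implementation would partition in place rather than copy):

import random

def quickselect(a, k):
    # k-th smallest element (0-based) in expected O(n), standing in for the
    # linear-time selection algorithm
    a = list(a)
    lo, hi = 0, len(a) - 1
    while True:
        pivot = a[random.randint(lo, hi)]
        lt = [x for x in a[lo:hi + 1] if x < pivot]
        eq = [x for x in a[lo:hi + 1] if x == pivot]
        gt = [x for x in a[lo:hi + 1] if x > pivot]
        a[lo:hi + 1] = lt + eq + gt
        if k < lo + len(lt):
            hi = lo + len(lt) - 1
        elif k < lo + len(lt) + len(eq):
            return pivot
        else:
            lo = lo + len(lt) + len(eq)

def frequent_by_selection(a, k):
    n = len(a)
    bucket = max(1, n // (k - 1))      # k-1 buckets of about n/(k-1) elements
    candidates = set()
    for start in range(0, n, bucket):
        end = min(start + bucket, n) - 1
        candidates.add(quickselect(a, start))               # bucket pivot
        candidates.add(quickselect(a, (start + end) // 2))  # bucket median
    return [c for c in candidates if a.count(c) >= n / float(k)]  # O(k*n) check

print(frequent_by_selection([3, 1, 2, 2, 2, 1, 4, 3, 3], 4))  # -> [2, 3]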

A simple O(n) algorithm would be to keep a hash map from each number found to the number of instances found. Using a hash map is important for maintaining O(n). Then, a final pass over the map will reveal the answers. This pass is also O(n), since in the worst case every element appeared only once and the map is the same size as the original array.
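As an illustration, a minimal sketch of this using Python's collections.Counter (the function name is mine):

from collections import Counter

def elements_at_least_n_over_k(a, k):
    n = len(a)
    counts = Counter(a)  # one O(n) pass to build the map
    return [x for x, c in counts.items() if c >= n / float(k)]  # one more O(n) pass

print(elements_at_least_n_over_k([3, 1, 2, 2, 2, 1, 4, 3, 3], 4))  # -> [3, 2] (order may vary)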

I don't know if you are restricted in which additional data structures you can use.
What about creating a hash map with an 'element' <--> count mapping? Insertion and lookup are O(1) on average. For each element, look it up in the hash map; insert it with a count of 1 if it does not exist, and if it exists, increment its count and check it against n/k. The whole thing stays O(n).
EDIT:
I forgot the constant memory restriction. Is preallocating a hash map with N entries permitted?

This is my implementation of Jerky's algorithm described above:
#include <map>
#include <vector>
#include <iostream>
#include <algorithm>

std::vector<int> repeatingElements(const std::vector<int>& a, int k)
{
    if (a.empty())
        return std::vector<int>();

    std::map<int, int> candidateMap; // value, count
    for (size_t i = 0; i < a.size(); i++)
    {
        if (candidateMap.find(a[i]) != candidateMap.end())
        {
            candidateMap[a[i]]++;
        }
        else
        {
            if (candidateMap.size() < static_cast<size_t>(k - 1))
            {
                candidateMap[a[i]] = 1;
            }
            else
            {
                for (std::map<int, int>::iterator iter = candidateMap.begin();
                     iter != candidateMap.end();)
                {
                    (iter->second)--;
                    if (iter->second == 0)
                    {
                        iter = candidateMap.erase(iter);
                    }
                    else
                    {
                        iter++;
                    }
                }
            }
        }
    }

    std::vector<int> ret;
    for (std::map<int, int>::iterator iter = candidateMap.begin();
         iter != candidateMap.end(); iter++)
    {
        int candidate = iter->first;
        if (std::count(a.begin(), a.end(), candidate) >
            static_cast<int>(a.size()) / k)
        {
            ret.push_back(candidate);
        }
    }
    return ret;
}

int main()
{
    std::vector<int> a = { 1, 1, 4, 2, 2, 3, 3 };
    int k = 4;
    std::vector<int> repeating_elements = repeatingElements(a, k);
    for (int elem : repeating_elements)
    {
        std::cout << "Repeating more than n/" << k << " : " << elem << std::endl;
    }
    return 0;
}
And the output is:
Repeating more than n/4 : 1
Repeating more than n/4 : 2
Repeating more than n/4 : 3

Related

Find the kth largest element in an array after inserting the absolute difference back in the array

I recently found this question in a contest, though I can't remember which one. The problem statement goes like this:
Given an unsorted positive integer array like [2,4,9], you can apply an operation to the array to give it a new form. Find the kth largest element once you can no longer apply the operation.
The operation is defined as follows: the absolute difference of any two elements can be re-inserted into the array, but duplicates can't be inserted. For example, the above array could become [2,4,9,5,7]; abs-diff(2,4) is 2, but 2 is already part of the array.
Can anybody figure out the approach?
The answer is equal to m - k + 1 multiplied by the greatest common divisor (GCD) of the elements in the input array, where m is the number of elements in the final array, as long as m is at least k.
To show this, we need to show that the array after applying the operation as many times as possible will always result in an array of the form [d, 2*d, 3*d, ..., m*d] in some order, where d is the GCD and m is some positive integer. There are three parts to the proof:
We need to show that d is constructible by some sequence of applying the operation. This is true because the operation allows us to do any subtractions we like where the smaller number is the one subtracted, and this is sufficient to perform Euclid's algorithm.
We need to show that all of the numbers in the claimed result are constructible. This is true because the largest number in the input array has d as a divisor by definition, so it must be m*d for some m; the smaller multiples can be constructed by repeatedly subtracting d.
We need to show that no other numbers are constructible. This is true because the result of a subtraction always shares common divisors with the two operands, and because larger numbers cannot be constructed by subtraction.
So the algorithm works as follows:
Find the GCD of the input array (e.g. by repeatedly applying Euclid's algorithm). Call the result d.
Find the maximum element of the input array, and divide it by d. Call the result m.
If m >= k, then return (m - k + 1)*d, otherwise raise an error.
The m - k + 1 term is to get the kth largest element in the result; if the kth smallest element is required, this will be k*d.
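A short Python sketch of these steps (the function name is mine):

from math import gcd
from functools import reduce

def kth_largest_after_closure(arr, k):
    d = reduce(gcd, arr)     # step 1: GCD of the input array
    m = max(arr) // d        # step 2: the final array is [d, 2d, ..., m*d]
    if m < k:
        raise ValueError("the final array has fewer than k elements")
    return (m - k + 1) * d   # the k-th largest of [d, 2d, ..., m*d]

print(kth_largest_after_closure([2, 4, 9], 2))  # gcd=1, m=9 -> 8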
Not a complete answer, just some thoughts that might help.
We have streams of numbers. For example, given [2, 4, 9], we know all numbers with a difference of 2 will be generated downward from each number m higher than 2, as well as m mod 2, which starts another cycle.
9-2=7
7-2=5
5-2=3
etc.
We get [2, 3, 4, 5, 7, 9] and the remainder 1. But 1, using the same procedure, will generate all numbers in the range (1, max).
I would start with considering how to obtain the smallest such remainder (greater than zero) we can have. But we may also need to consider the full range of differences that each generate such a "stream."
import java.util.*;

class TestClass {
    // GCD via Euclid's algorithm
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String args[]) throws Exception {
        Scanner s = new Scanner(System.in);
        int t = s.nextInt();
        while (t-- != 0) {
            int n = s.nextInt();
            int arr[] = new int[n];
            int max = Integer.MIN_VALUE;
            int g = 0;
            for (int i = 0; i < n; i++) {
                arr[i] = s.nextInt();
                if (arr[i] > max)
                    max = arr[i];
                g = gcd(g, arr[i]); // running GCD of all elements
            }
            int k = s.nextInt();
            // the final array is [g, 2g, ..., max], so the kth largest
            // element is max - g*(k-1); it must stay positive to exist
            int ans = max - g * (k - 1);
            if (ans <= 0)
                System.out.println("-1");
            else
                System.out.println(ans);
        }
    }
}
e.g. 1: 4 7 9
here min = 4, max = 9
7 is not divisible by 4, so the differences drive the smallest value below the min, all the way down to the GCD (here 1):
iteration 1: 4 5 7 9 - for pair (4,9) we insert 5
iteration 2: 1 4 5 7 9 - for pair (4,5) we insert 1
At this point, having 1, we can easily go up to
1 2 3 4 5 6 7 8 9
and easily find the kth largest element.
So the point is: the array always collapses to all multiples of its GCD up to max; if some number is not divisible by the min, the GCD (and hence the step) is smaller than the min, here 1.
e.g. 2: 3 9
min = 3, max = 9
Here every number is divisible by 3, so the GCD is 3 and the final array looks like:
3 6 9
The kth largest element will be: max - 3*(k-1),
e.g. 9 - 3*(2-1) = 6.

Counting bounded slices (Codility)

I recently attended a programming test on Codility, and the question was to find the number of bounded slices in an array.
I am just giving you a brief explanation of the question.
A slice of an array is said to be a bounded slice if max(slice) - min(slice) <= K.
If the array [3,5,6,7,3] and K=2 are provided, the number of bounded slices is 9:
the first slice (0,0) in the array: min=3, max=3, max-min<=K, 0<=2, so it is a bounded slice.
the second slice (0,1) in the array: min=3, max=5, max-min<=K, 2<=2, so it is a bounded slice.
the third slice (0,2) in the array: min=3, max=6, max-min<=K, 3<=2 is false, so it is not a bounded slice.
In this way you can find that there are nine bounded slices:
(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3), (4, 4).
Following is the solution I provided:
private int FindBoundSlice(int K, int[] A)
{
    int BoundSlice = 0;
    Stack<int> MinStack = new Stack<int>();
    Stack<int> MaxStack = new Stack<int>();

    for (int p = 0; p < A.Length; p++)
    {
        MinStack.Push(A[p]);
        MaxStack.Push(A[p]);

        for (int q = p; q < A.Length; q++)
        {
            if (IsPairBoundedSlice(K, A[p], A[q], MinStack, MaxStack))
                BoundSlice++;
            else
                break;
        }
    }

    return BoundSlice;
}

private bool IsPairBoundedSlice(int K, int P, int Q, Stack<int> Min, Stack<int> Max)
{
    if (Min.Peek() > P)
    {
        Min.Pop();
        Min.Push(P);
    }

    if (Min.Peek() > Q)
    {
        Min.Pop();
        Min.Push(Q);
    }

    if (Max.Peek() < P)
    {
        Max.Pop();
        Max.Push(P);
    }

    if (Max.Peek() < Q)
    {
        Max.Pop();
        Max.Push(Q);
    }

    if (Max.Peek() - Min.Peek() <= K)
        return true;
    else
        return false;
}
But as per the Codility review, the above-mentioned solution runs in O(N^2). Can anybody help me find a solution that runs in O(N)?
Maximum time complexity allowed: O(N).
Maximum space complexity allowed: O(N).
Disclaimer
It is possible (and I demonstrate it here) to write an algorithm that solves the problem you described in linear time in the worst case, visiting each element of the input sequence at most two times.
This answer is an attempt to deduce and describe the only algorithm I could find, and then gives a quick tour through an implementation written in Clojure. I will probably write a Java implementation as well and update this answer, but as of now that task is left as an exercise to the reader.
EDIT: I have now added a working Java implementation. Please scroll down to the end.
EDIT: Notice that PeterDeRivaz provided a sequence ([0 1 2 3 4], k=2) making the algorithm visit certain elements three times and probably falsifying it. I will update the answer at a later time regarding that issue.
Unless I have overlooked something trivial, I can hardly imagine significant further simplification. Feedback is highly welcome.
(I found your question here when googling for Codility-like exercises as preparation for a job test there myself. I set aside half an hour to solve it and didn't come up with a solution, so I was unhappy and spent some dedicated hammock time - now that I have taken the test I must say I found the presented exercises significantly less difficult than this problem.)
Observations
For any valid bounded slice of size S we can say that it is divisible into the triangular number of S bounded sub-slices, with their individual bounds lying within the slice's bounds (including the slice itself).
Ex. 1: [3 1 2] is a bounded slice for k=2; it has a size of 3 and thus can be divided into (3*4)/2 = 6 sub-slices:
[3 1 2] ;; slice 1
[3 1] [1 2] ;; slices 2-3
[3] [1] [2] ;; slices 4-6
Naturally, all those slices are bounded slices for k.
When you have two overlapping slices that are both bounded slices for k but differ in their bounds, the amount of possible bounded sub-slices in the array can be calculated as the sum of the triangular numbers of those slices minus the triangular number of the count of elements they share.
Ex. 2: The bounded slices [4 3 1] and [3 1 2] for k=2 differ in bounds and overlap in the array [4 3 1 2]. They share the bounded slice [3 1] (notice that overlapping bounded slices always share a bounded slice, otherwise they could not overlap). For both slices the triangular number is 6, the triangular number of the shared slice is (2*3)/2=3. Thus the array can be divided into 6+6-3=9 slices:
[4 3 1] [3 1 2] ;; 1-2 the overlapping slices
[4 3] [3 1] [1 2] ;; 3-5 two slices and the overlapping slice
[4] [3] [1] [2] ;; 6-9 single-element slices
As can be observed, the triangle of the overlapping bounded slice is part of both triangles' element counts, which is why it must be subtracted from the sum of the two triangles, as it would otherwise be counted twice. Again, all counted slices are bounded slices for k=2.
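As a quick sanity check of that counting rule in Python (the numbers follow Ex. 2):

def triangular(n):
    return n * (n + 1) // 2

# two overlapping bounded slices of size 3 sharing 2 elements
print(triangular(3) + triangular(3) - triangular(2))  # -> 9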
Approach
The approach is to find the largest possible bounded slices within the input sequence until all elements have been visited, then to sum them up using the technique described above.
A slice qualifies as one of the largest possible bounded slices (in the following text often referred to as "one largest possible bounded slice", which shall not mean the largest one, only one of them) if the following conditions are fulfilled:
It is bounded
It may share elements with two other slices to its left and right
It cannot grow to the left or to the right without becoming unbounded - meaning: if it is possible, it has to contain so many elements that its maximum - minimum = k
By implication a bounded slice does not qualify as one of the largest possible bounded slices if there is a bounded slice with more elements that entirely encloses this slice
As a goal, our algorithm must be capable of starting at any element in the array and determining one largest possible bounded slice that contains that element, being the only such slice to contain it. It is then guaranteed that the next slice constructed from a starting point outside of it will not share the starting element of the previous slice, because otherwise it would form one largest possible bounded slice together with the previously found slice (which now, by definition, is impossible). Once that algorithm has been found, it can be applied sequentially from the beginning, building such largest possible slices until no more elements are left. This guarantees that each element is traversed at most two times in the worst case.
Algorithm
Start at the first element and find the largest possible bounded slice that includes said first element. Add the triangular number of its size to the counter.
Continue exactly one element after found slice and repeat. Subtract the triangular number of the count of elements shared with the previous slice (found searching backwards), add the triangular number of its total size (found searching forwards and backwards) until the sequence has been traversed. Repeat until no more elements can be found after a found slice, return the result.
Ex. 3: For the input sequence [4 3 1 2 0] with k=2 find the count of bounded slices.
Start at the first element, find the largest possible bounded slice:
[4 3], size=2, overlap=0, result=3
Continue after that slice, find the largest possible bounded slice:
[3 1 2], size=3, overlap=1, result=3-1+6=8
...
[1 2 0], size=3, overlap=2, result=8-3+6=11
result=11
Process behavior
In the worst case the process grows linearly in time and space. As shown above, elements are traversed at most two times, and per search for a largest possible bounded slice only a few locals need to be stored.
However, the process becomes dramatically faster as the array contains less largest possible bounded slices. For example, the array [4 4 4 4] with k>=0 has only one largest possible bounded slice (the array itself). The array will be traversed once and the triangular number of the count of its elements is returned as the correct result. Notice how this is complementary to solutions of worst case growth O((n * (n+1)) / 2). While they reach their worst case with only one largest possible bounded slice, for this algorithm such input would mean the best case (one visit per element in one pass from start to end).
Implementation
The most difficult part of the implementation is to find a largest bounded slice from one element scanning in two directions. When we search in one direction, we track the minimum and maximum bounds of our search and see how they compare to k. Once an element has been found that stretches the bounds so that maximum-minimum <= k does not hold anymore, we are done in that direction. Then we search into the other direction but use the last valid bounds of the backwards scan as starting bounds.
Ex. 4: We start in the array [4 3 1 2 0] at the third element (1) after we have successfully found the largest bounded slice [4 3]. At this point we only know that our starting value 1 is the minimum, the maximum (of the searched largest bounded slice), or between those two. We scan backwards (exclusive) and stop after the second element (as 4 - 1 > k=2). The last valid bounds were 1 and 3. When we now scan forwards, we use the same algorithm but use 1 and 3 as bounds. Notice that even though in this example our starting element is one of the bounds, that is not always the case: consider the same scenario with a 2 instead of the 3: neither the 2 nor the 1 would be determined to be a bound, as we could find a 0 but also a 3 while scanning forwards - only then could it be decided which of 2 or 3 is a lower or upper bound.
To solve that problem, here is a special counting algorithm. Don't worry if you don't understand Clojure yet; it does just what it says.
(defn scan-while-around
  "Count numbers in `coll` until a number doesn't pass an (inclusive)
   interval filter where said interval is guaranteed to contain
   `around` and grows with each number to a maximum size of `size`.
   Return count and the lower and upper bounds (inclusive) that were not
   passed as [count lower upper]."
  ([around size coll]
   (scan-while-around around around size coll))
  ([lower upper size coll]
   (letfn [(step [[count lower upper :as result] elem]
             (let [lower (min lower elem)
                   upper (max upper elem)]
               (if (<= (- upper lower) size)
                 [(inc count) lower upper]
                 (reduced result))))]
     (reduce step [0 lower upper] coll))))
Using this function we can search backwards from before the starting element, passing it our starting element as around and using k as the size.
Then we start a forward scan from the starting element with the same function, by passing it the previously returned bounds lower and upper.
We add their returned counts to the total count of the found largest possible slice and use the count of the backwards scan as the length of the overlap, subtracting its triangular number.
Notice that in any case the forward scan is guaranteed to return a count of at least one. This is important for the algorithm for two reasons:
We use the resulting count of the forward scan to determine the starting point of the next search (and would loop infinitely with it being 0)
The algorithm would otherwise not be correct, since for any starting element the smallest possible largest possible bounded slice always exists as an array of size 1 containing the starting element.
Assuming that triangular is a function returning the triangular number, here is the final algorithm:
(defn bounded-slice-linear
  "Linear implementation"
  [s k]
  (loop [start-index 0
         acc 0]
    (if (< start-index (count s))
      (let [start-elem (nth s start-index)
            [backw lower upper] (scan-while-around start-elem
                                                   k
                                                   (rseq (subvec s 0 start-index)))
            [forw _ _] (scan-while-around lower upper k
                                          (subvec s start-index))]
        (recur (+ start-index forw)
               (-> acc
                   (+ (triangular (+ forw backw)))
                   (- (triangular backw)))))
      acc)))
(Notice that the creation of subvectors and their reverse sequences happens in constant time and that the resulting vectors share structure with the input vector so no "rest-size" depending allocation is happening (although it may look like it). This is one of the beautiful aspects of Clojure, that you can avoid tons of index-fiddling and usually work with elements directly.)
Here is a triangular implementation for comparison:
(defn bounded-slice-triangular
  "O(n*(n+1)/2) implementation for testing."
  [s k]
  (reduce (fn [c [elem :as elems]]
            (+ c (first (scan-while-around elem k elems))))
          0
          (take-while seq
                      (iterate #(subvec % 1) s))))
Both functions only accept vectors as input.
I have extensively tested their behavior for correctness using various strategies. Please try to prove them wrong anyway. Here is a link to a full file to hack on: https://www.refheap.com/32229
Here is the algorithm implemented in Java (not tested as extensively, but it seems to work; Java is not my first language and I'd be happy about feedback to learn):
public class BoundedSlices {
    private static int triangular(int i) {
        return ((i * (i + 1)) / 2);
    }

    public static int solve(int[] a, int k) {
        int i = 0;
        int result = 0;

        while (i < a.length) {
            int lower = a[i];
            int upper = a[i];
            int countBackw = 0;
            int countForw = 0;

            for (int j = (i - 1); j >= 0; --j) {
                if (a[j] < lower) {
                    if (upper - a[j] > k)
                        break;
                    else
                        lower = a[j];
                }
                else if (a[j] > upper) {
                    if (a[j] - lower > k)
                        break;
                    else
                        upper = a[j];
                }
                countBackw++;
            }

            for (int j = i; j < a.length; j++) {
                if (a[j] < lower) {
                    if (upper - a[j] > k)
                        break;
                    else
                        lower = a[j];
                }
                else if (a[j] > upper) {
                    if (a[j] - lower > k)
                        break;
                    else
                        upper = a[j];
                }
                countForw++;
            }

            result -= triangular(countBackw);
            result += triangular(countForw + countBackw);
            i += countForw;
        }
        return result;
    }
}
Now Codility has released their golden solution with O(N) time and space:
https://codility.com/media/train/solution-count-bounded-slices.pdf
If you are still confused after reading the PDF, like me, here is a very nice explanation.
The solution from the PDF:
def boundedSlicesGolden(K, A):
    N = len(A)

    maxQ = [0] * (N + 1)
    posmaxQ = [0] * (N + 1)
    minQ = [0] * (N + 1)
    posminQ = [0] * (N + 1)

    firstMax, lastMax = 0, -1
    firstMin, lastMin = 0, -1
    j, result = 0, 0

    for i in xrange(N):
        while (j < N):
            # added new maximum element
            while (lastMax >= firstMax and maxQ[lastMax] <= A[j]):
                lastMax -= 1
            lastMax += 1
            maxQ[lastMax] = A[j]
            posmaxQ[lastMax] = j

            # added new minimum element
            while (lastMin >= firstMin and minQ[lastMin] >= A[j]):
                lastMin -= 1
            lastMin += 1
            minQ[lastMin] = A[j]
            posminQ[lastMin] = j

            if (maxQ[firstMax] - minQ[firstMin] <= K):
                j += 1
            else:
                break
        result += (j - i)
        if result >= maxINT:  # maxINT is the result cap defined elsewhere in the PDF
            return maxINT
        if posminQ[firstMin] == i:
            firstMin += 1
        if posmaxQ[firstMax] == i:
            firstMax += 1
    return result
HINTS
Others have explained the basic algorithm which is to keep 2 pointers and advance the start or the end depending on the current difference between maximum and minimum.
It is easy to update the maximum and minimum when moving the end.
However, the main challenge of this problem is how to update when moving the start. Most heap or balanced tree structures will cost O(logn) to update, and will result in an overall O(nlogn) complexity which is too high.
To do this in time O(n):
Advance the end until you exceed the allowed threshold
Then loop backwards from this critical position storing a cumulative value in an array for the minimum and maximum at every location between the current end and the current start
You can now advance the start pointer and immediately lookup from the arrays the updated min/max values
You can carry on using these arrays to update start until start reaches the critical position. At this point return to step 1 and generate a new set of lookup values.
Overall this procedure will work backwards over every element exactly once, and so the total complexity is O(n).
EXAMPLE
For the sequence with K of 4:
4,1,2,3,4,5,6,10,12
Step 1 advances the end until we exceed the bound
start,4,1,2,3,4,5,end,6,10,12
Step 2 works backwards from end to start computing array MAX and MIN.
MAX[i] is maximum of all elements from i to end
Data = start,4,1,2,3,4,5,end,6,10,12
MAX = start,5,5,5,5,5,5,critical point=end -
MIN = start,1,1,2,3,4,5,critical point=end -
Step 3 can now advance start and immediately lookup the smallest values of max and min in the range start to critical point.
These can be combined with the max/min in the range critical point to end to find the overall max/min for the range start to end.
PYTHON CODE
def count_bounded_slices(A, k):
    if len(A) == 0:
        return 0
    t = 0
    inf = max(abs(a) for a in A)
    left = 0
    right = 0
    left_lows = [inf] * len(A)
    left_highs = [-inf] * len(A)
    critical = 0
    right_low = inf
    right_high = -inf
    # Loop invariant
    # t counts number of bounded slices A[a:b] with a < left
    # left_lows[i] is defined for values in range(left, critical)
    #   and contains the min of A[i:critical]
    # left_highs[i] contains the max of A[i:critical]
    # right_low is the minimum of A[critical:right]
    # right_high is the maximum of A[critical:right]
    while left < len(A):
        # Extend right as far as possible
        while right < len(A) and max(left_highs[left], max(right_high, A[right])) - min(left_lows[left], min(right_low, A[right])) <= k:
            right_low = min(right_low, A[right])
            right_high = max(right_high, A[right])
            right += 1
        # Now we know that any slice starting at left and ending before right will satisfy the constraints
        t += right - left
        # If we are at the critical position we need to extend our left arrays
        if left == critical:
            critical = right
            left_low = inf
            left_high = -inf
            for x in range(critical - 1, left, -1):
                left_low = min(left_low, A[x])
                left_high = max(left_high, A[x])
                left_lows[x] = left_low
                left_highs[x] = left_high
            right_low = inf
            right_high = -inf
        left += 1
    return t

A = [3, 5, 6, 7, 3]
print count_bounded_slices(A, 2)
Here is my attempt at solving this problem:
- you start with p and q at position 0, min = max = A[0];
- loop until p = q = N-1;
- as long as max-min <= k, advance q and increment the number of bounded slices;
- if max-min > k, advance p;
- you need to keep track of 2x min/max values, because when you advance p you might remove one or both of the current min/max values;
- each time you advance p or q, update min/max.
I can write the code if you want, but I think the idea is explicit enough...
Hope it helps.
Finally, here is code that works according to the idea mentioned below. This outputs 9.
(The code is in C++. You can change it to Java.)
#include <iostream>
using namespace std;

int main()
{
    int A[] = {3, 5, 6, 7, 3};
    int K = 2;
    int i = 0;
    int j = 0;
    int minValue = A[0];
    int maxValue = A[0];
    int minIndex = 0;
    int maxIndex = 0;
    int length = sizeof(A) / sizeof(int);
    int count = 0;
    bool stop = false;
    int prevJ = 0;

    while ((i < length || j < length) && !stop) {
        if (maxValue - minValue <= K) {
            if (j < length - 1) {
                j++;
                if (A[j] > maxValue) {
                    maxValue = A[j];
                    maxIndex = j;
                }
                if (A[j] < minValue) {
                    minValue = A[j];
                    minIndex = j;
                }
            } else {
                count += j - i + 1;
                stop = true;
            }
        } else {
            if (j > 0) {
                int range = j - i;
                int count1 = range * (range + 1) / 2; // choose 2 from range with repetition
                int rangeRep = prevJ - i; // we have to subtract the already counted ones
                int count2 = rangeRep * (rangeRep + 1) / 2;
                count += count1 - count2;
                prevJ = j;
            }
            if (A[j] == minValue) {
                // first reach the first maxima
                while (A[i] - minValue <= K)
                    i++;
                // then come down to the correct level
                while (A[i] - minValue > K)
                    i++;
                maxValue = A[i];
            } else { // if (A[j] == maxValue)
                while (maxValue - A[i] <= K)
                    i++;
                while (maxValue - A[i] > K)
                    i++;
                minValue = A[i];
            }
        }
    }
    cout << count << endl;
    return 0;
}
Algorithm (minor tweaking done in code):
Keep two pointers i & j and maintain two values minValue and maxValue.
1. Initialize i = 0, j = 0, and minValue = maxValue = A[0].
2. If maxValue - minValue <= K:
- Increment count.
- Increment j.
- If the new A[j] > maxValue, set maxValue = A[j].
- If the new A[j] < minValue, set minValue = A[j].
3. If maxValue - minValue > K, this can only happen if
- the new A[j] is either maxValue or minValue.
- Hence keep incrementing i until abs(A[j] - A[i]) <= K.
- Then update minValue and maxValue and proceed accordingly.
4. Go to step 2 if (i < length-1 || j < length-1).
I have provided an answer for the same question in a different SO question.
(1) For an input A[n], you will for sure have n single-element slices, so add n first.
For example, for {3,5,4,7,6,3} you will for sure have (0,0), (1,1), (2,2), (3,3), (4,4), (5,5).
(2) Then find P and Q based on the min/max comparison.
(3) Apply the arithmetic series formula to the distance X = Q - P to find the number of combinations: it would be X(X+1)/2, but we have already counted the single-element slices, so the formula becomes X(X+1)/2 - X, which is X(X-1)/2 after basic arithmetic.
For example, in the above array, if P is 0 (3) and Q is 3 (7), then Q - P is 3. Applying the formula gives 3(3-1)/2 = 3. Now add the 6 (the length) and the 3. Then take care of the Q-min or Q-max records.
Then check the min and max indexes. In this case min is at 0 and max is at 3 (obviously one of them will match the current index, whichever is used to loop). Here we took care of (0,1), (0,2), (1,2), but we have not taken care of (1,3), (2,3). Rather than starting the whole process from index 1, save this number (positions 2,3 = 2), then start the same process from the current index (assuming min and max are A[currentIndex], as we did when starting). Finally, multiply by the preserved number: in our case 2 * 2 (A[7], A[6]).
It runs in O(N) time with O(N) space.
I came up with a solution in Scala:
package test

import scala.collection.mutable.Queue

object BoundedSlice {
  def apply(k: Int, a: Array[Int]): Int = {
    var c = 0
    var q: Queue[Int] = Queue()
    a.map(i => {
      if (!q.isEmpty && Math.abs(i - q.last) > k)
        q.clear
      else
        q = q.dropWhile(j => (Math.abs(i - j) > k)).toQueue
      q += i
      c += q.length
    })
    c
  }

  def main(args: Array[String]): Unit = {
    val a = Array[Int](3, 5, 6, 7, 3)
    println(BoundedSlice(2, a))
  }
}

Finding the missing number in an array

An array a[] contains all of the integers from 0 to N, except one. However, you cannot access an element with a single operation. Instead, you can call get(i, k) which returns the kth bit of a[i] or you can call swap(i, j) which swaps the ith and jth elements of a[]. Design a O(N) algorithm to find the missing integer.
(For simplicity, assume N is a power of 2.)
If N is a power of 2, it can be done in O(N) using divide and conquer.
Note that there are log N bits in the numbers. Using this information, you can use a combination of a partition-based selection algorithm and radix sort:
Iterate over the numbers by the first bit, and divide the array into two halves - the first half has this bit as 0, the other half has it as 1 (use swap() for partitioning the array).
Note that one half has ceil(N/2) elements, and the other has floor(N/2) elements.
Repeat the process on the deficient half (the one with fewer elements than expected), until you find the missing number.
The complexity of this approach is N + N/2 + N/4 + ... + 1 < 2N, so it is O(N).
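Here is a hedged Python sketch of that divide and conquer, written against hypothetical a_get/a_swap accessors and simplified to the range 0..n-1 with n a power of two (array length n-1); the idea for 0..N is identical:

def find_missing(a_get, a_swap, n):
    # The array is touched only through a_get(i, bit) and a_swap(i, j).
    n_bits = n.bit_length() - 1        # log2(n)
    lo, hi = 0, n - 1                  # current segment of the array
    expected = n                       # values the segment would hold if complete
    missing = 0
    for b in range(n_bits - 1, -1, -1):    # most significant bit first
        mid = lo
        for i in range(lo, hi):            # partition: bit b == 0 first
            if a_get(i, b) == 0:
                a_swap(i, mid)
                mid += 1
        expected //= 2                     # each half should hold this many
        if mid - lo < expected:            # the zero half is short one element
            hi = mid
        else:                              # the one half is short one element
            missing |= 1 << b
            lo = mid
    return missing

a = [0, 1, 2, 3, 4, 5, 7]  # 6 is missing from 0..7
get = lambda i, b: (a[i] >> b) & 1
def swp(i, j): a[i], a[j] = a[j], a[i]
print(find_missing(get, swp, 8))  # -> 6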
O(N*M), where M is the number of bits:
N is a power of 2 and only one number is missing, so if you check each bit, and count the numbers where that bit is 0 and where it is 1, the deficient count (for example 2^(M-1)-1 against 2^(M-1)) tells you that bit of the missing number. With this, you can recover all the bits of the missing number.
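A small Python rendition of this bit-counting idea, generalized so it also handles the top bit of the range 0..N correctly (the helper names are mine):

def expected_ones(N, b):
    # how many of the numbers 0..N have bit b set
    block = 1 << (b + 1)
    full, rem = divmod(N + 1, block)
    return full * (1 << b) + max(0, rem - (1 << b))

def find_missing_by_counting(a, N):
    missing = 0
    for b in range(N.bit_length()):
        actual = sum((x >> b) & 1 for x in a)  # one pass per bit: O(N*M) total
        if actual < expected_ones(N, b):       # deficient side -> bit b is set
            missing |= 1 << b
    return missing

print(find_missing_by_counting([0, 1, 2, 3, 4, 5, 7, 8], 8))  # -> 6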
There is really no need to even use the swap operation!
Use XOR!
Okay, first you can calculate the binary XOR of all numbers from 0 to N.
So first:
long nxor = 0;
for (long i = 0; i <= N; i++)
    nxor ^= i;
Then we can calculate the XOR of all numbers in the array; it's also simple. Let K be the maximal number of bits across all the numbers.
long axor = 0;
long K = 0;
long H = N;
while (H > 0)
{
    H >>= 1;
    K++;
}
// the array has N elements (the numbers 0..N with one missing), indices 0..N-1
for (long i = 0; i < N; i++)
    for (long j = 0; j < K; j++)
        axor ^= ((long) get(i, j)) << j;
Finally you can calculate the XOR of the two results:
long result = nxor ^ axor;
And by the way, if N is a power of 2 (N >= 4), then the nxor value will be equal to N ;-)!
Suppose that the input is a[]=0,1,2,3,4,5,7,8, so that 6 is missing. The numbers are sorted for convenience only, because they don't have to be sorted for the solution to work.
Since N is 8 then the numbers are represented using 4 bits.
From 0000 to 1000.
First partition the array using the most significant bit.
You get 0,1,2,3,4,5,7 and 8. Since 8 is present, continue with the left partition.
Partition the sub array using the 2nd most significant bit.
You get 0,1,2,3 and 4,5,7. Now continue with the partition that has odd number of elements, which is 4,5,7.
Partition the sub array using the 3rd most significant bit.
You get 4,5 and 7. Again continue with the partition that has odd number of elements, which is 7.
Partition the sub array using the 4th most significant bit you get nothing and 7.
So the missing number is 6.
Another example:
a[]=0,1,3,4,5,6,7,8, so that 2 is missing.
1st bit partition: 0,1,3,4,5,6,7 and 8, continue with 0,1,3,4,5,6,7.
2nd bit partition: 0,1,3 and 4,5,6,7, continue with 0,1,3 (odd number of elements).
3rd bit partition: 0,1 and 3, continue with 3 (odd number of elements).
4th bit partition: nothing and 3, so 2 is missing.
Another example:
a[]=1,2,3,4,5,6,7,8, so that 0 is missing.
1st bit partition: 1,2,3,4,5,6,7 and 8, continue with 1,2,3,4,5,6,7.
2nd bit partition: 1,2,3 and 4,5,6,7, continue with 1,2,3 (odd number of elements).
3rd bit partition: 1 and 2,3, continue with 1 (odd number of elements).
4th bit partition: nothing and 1, so 0 is missing.
The 1st partition takes N operations.
The 2nd partition takes N operations.
The 3rd partition takes N/2 operations.
The 4th partition takes N/4 operations.
And so on.
So the running time is O(N+N+N/2+N/4+...)=O(N).
And here is another answer, where we use the sum operation instead of the XOR operation.
Just below, please find the code.
long allsum = N * (N + 1) / 2;
long sum = 0;
long K = 0;
long H = N;
while (H > 0)
{
    H >>= 1;
    K++;
}
for (long i = 0; i < N; i++)
    for (long j = 0; j < K; j++)
        sum += ((long) get(i, j)) << j;
long result = allsum - sum;
Without the XOR operation, we can answer this question in the following way:
package missingnumberinarray;

public class MissingNumber
{
    public static void main(String args[])
    {
        int array1[] = {1, 2, 3, 4, 6, 7, 8, 9, 10}; // the array must be sorted first
        System.out.println(array1[array1.length - 1]);
        int n = array1[array1.length - 1];
        int total = (n * (n + 1)) / 2;
        System.out.println(total);
        int arraysum = 0;
        for (int i = 0; i < array1.length; i++)
        {
            arraysum += array1[i];
        }
        System.out.println(arraysum);
        int mis = total - arraysum;
        System.out.println("The missing number in the array is " + mis);
    }
}

array median transformation minimum steps

Given an array A with n integers. In one turn, one can apply the following operation to any consecutive subarray A[l..r]: assign to all A[i] (l <= i <= r) the median of the subarray A[l..r].
Let max be the maximum integer of A. We want to know the minimum number of operations needed to change A into an array of n integers, each with value max.
For example, let A = [1, 2, 3]. We want to change it to [3, 3, 3]. We can do this in two operations: first for subarray A[2..3] (after that A equals [1, 3, 3]), then the operation on A[1..3].
Also, the median is defined for some array A as follows. Let B be the same array A, but sorted in non-decreasing order. The median of A is B[m] (1-based indexing), where m equals (n div 2) + 1. Here 'div' is integer division. So, for a sorted array with 5 elements the median is the 3rd element, and for a sorted array with 6 elements it is the 4th element.
Since the maximum value of N is 30, I thought of brute-forcing the result. Could there be a better solution?
You can double the size of the subarray containing the maximum element in each iteration. After the first iteration, there is a subarray of size 2 containing the maximum. Then apply your operation to a subarray of size 4 containing those 2 elements, giving you a subarray of size 4 containing the maximum. Then apply it to a size-8 subarray, and so on. You fill the array in ceil(log2(N)) operations, which is optimal. If N is 30, five operations are enough.
This is optimal in the worst case (i.e. when only one element is the maximum), since it sets the highest possible number of elements in each iteration.
Update 1: I noticed I messed up the 4s and 8s a bit. Corrected.
Update 2: here's an example. Array size 10, start state:
[6 1 5 9 3 2 0 7 4 8]
To get two nines, run op on subarray of size two containing the nine. For instance A[4…5] gets you:
[6 1 5 9 9 2 0 7 4 8]
Now run on size four subarray that contains 4…5, for instance on A[2…5] to get:
[6 9 9 9 9 2 0 7 4 8]
Now on subarray of size 8, for instance A[1…8], get:
[9 9 9 9 9 9 9 9 4 8]
Doubling now would get us 16 nines, but we have only 10 positions, so round off with A[1…10] to get:
[9 9 9 9 9 9 9 9 9 9]
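To make the doubling concrete, here is a small Python simulation of the operation on that example (0-based indices; the 0-based median index n//2 matches the (n div 2)+1 definition used in the question):

def assign_median(A, l, r):
    # apply the operation to A[l..r] (inclusive)
    m = sorted(A[l:r + 1])[(r - l + 1) // 2]
    A[l:r + 1] = [m] * (r - l + 1)

A = [6, 1, 5, 9, 3, 2, 0, 7, 4, 8]
assign_median(A, 3, 4)  # size-2 window around the 9 -> two 9s
assign_median(A, 1, 4)  # size-4 window -> four 9s
assign_median(A, 0, 7)  # size-8 window -> eight 9s
assign_median(A, 0, 9)  # whole array -> all 9s
print(A)  # -> [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]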
Update 3: since this is only optimal in the worst case, it is actually not an answer to the original question, which asks for a way of finding the minimal number of operations for all inputs. I misinterpreted the sentence about brute forcing to be about brute forcing with the median operations, rather than in finding the minimum sequence of operations.
This is a problem from a CodeChef long contest. Since the contest is already over, I am pasting the problem setter's approach (source: CC contest editorial page):
"Any state of the array can be represented as a binary mask where each 1-bit means that the corresponding number is equal to the max, and 0 otherwise. You can run a DP with state R[mask] and O(n) transitions. You can prove (or just believe) that the number of states will not be big, of course if you run a good DP. The state of our DP will be the mask of numbers that are equal to max. Of course, it only makes sense to use the operation on a subarray [l; r] in which the number of 1-bits is at least as large as the number of 0-bits in the submask [l; r], because otherwise nothing will change. Also, you should notice that if the left bound of your operation is l, it is good to make the operation only with the maximal possible r (this gives a number of transitions equal to O(n)). It was also useful for C++ coders to use a map structure to represent all states."
The C/C++ code is:
#include <cstdio>
#include <iostream>
using namespace std;

int bc[1<<15];
const int M = (1<<15) - 1;

void setMin(int& ret, int c)
{
    if (c < ret) ret = c;
}

void doit(int n, int mask, int currentSteps, int& currentBest)
{
    int numMax = bc[mask>>15] + bc[mask&M];
    if (numMax == n) {
        setMin(currentBest, currentSteps);
        return;
    }
    if (currentSteps + 1 >= currentBest)
        return;
    if (currentSteps + 2 >= currentBest)
    {
        if (numMax * 2 >= n) {
            setMin(currentBest, 1 + currentSteps);
        }
        return;
    }
    if (numMax < (1<<currentSteps)) return;

    for (int i = 0; i < n; i++)
    {
        int a = 0, b = 0;
        int c = mask;
        for (int j = i; j < n; j++)
        {
            c |= (1<<j);
            if (mask & (1<<j)) b++;
            else a++;
            if (b >= a) {
                doit(n, c, currentSteps + 1, currentBest);
            }
        }
    }
}

int v[32];

void solveCase() {
    int n;
    scanf(" %d", &n);
    int maxElement = 0;
    for (int i = 0; i < n; i++) {
        scanf(" %d", v + i);
        if (v[i] > maxElement) maxElement = v[i];
    }
    int mask = 0;
    for (int i = 0; i < n; i++) if (v[i] == maxElement) mask |= (1<<i);
    int ret = 0, p = 1;
    while (p < n) {
        ret++;
        p *= 2;
    }
    doit(n, mask, 0, ret);
    printf("%d\n", ret);
}

int main() {
    for (int i = 0; i < (1<<15); i++) {
        bc[i] = bc[i>>1] + (i&1);
    }
    int cases;
    scanf(" %d", &cases);
    while (cases--) solveCase();
}
The problem setter's approach has exponential complexity. It is pretty good for N=30, but not for larger sizes. I think it's more interesting to find a polynomial-time solution. And I found one, with O(N^4) complexity.
This approach uses the fact that the optimal solution starts with some group of consecutive maximal elements and extends only this single group until the whole array is filled with maximal values.
To prove this fact, take 2 starting groups of consecutive maximal elements and extend each of them in optimal way until they merge into one group. Suppose that group 1 needs X turns to grow to size M, group 2 needs Y turns to grow to the same size M, and on turn X + Y + 1 these groups merge. The result is a group of size at least M * 4. Now instead of turn Y for group 2, make an additional turn X + 1 for group 1. In this case group sizes are at least M * 2 and at most M / 2 (even if we count initially maximal elements, that might be included in step Y). After this change, on turn X + Y + 1 the merged group size is at least M * 4 only as a result of the first group extension, add to this at least one element from second group. So extending a single group here produces larger group in same number of steps (and if Y > 1, it even requires less steps). Since this works for equal group sizes (M), it will work even better for non-equal groups. This proof may be extended to the case of several groups (more than two).
To work with single group of consecutive maximal elements, we need to keep track of only two values: starting and ending positions of the group. Which means it is possible to use a triangular matrix to store all possible groups, allowing to use a dynamic programming algorithm.
Pseudo-code:
For each group of consecutive maximal elements in original array:
Mark corresponding element in the matrix and clear other elements
For each matrix diagonal, starting with one, containing this element:
For each marked element in this diagonal:
Retrieve current number of turns from this matrix element
(use indexes of this matrix element to initialize p1 and p2)
p2 = end of the group
p1 = start of the group
Decrease p1 while it is possible to keep median at maximum value
(now all values between p1 and p2 are assumed as maximal)
While p2 < N:
Check if number of maximal elements in the array is >= N/2
If this is true, compare current number of turns with the best result \
and update it if necessary
(additional matrix with number of maximal values between each pair of
points may be used to count elements to the left of p1 and to the
right of p2)
Look at position [p1, p2] in the matrix. Mark it and if it contains \
larger number of turns, update it
Repeat:
Increase p1 while it points to maximal value
Increment p1 (to skip one non-maximum value)
Increase p2 while it is possible to keep median at maximum value
while median is not at maximum value
To keep algorithm simple, I didn't mention special cases when group starts at position 0 or ends at position N, skipped initialization and didn't make any optimizations.

Minimum number of swaps needed to change Array 1 to Array 2?

For example, input is
Array 1 = [2, 3, 4, 5]
Array 2 = [3, 2, 5, 4]
The minimum number of swaps needed is 2.
The swaps need not be with adjacent cells, any two elements can be swapped.
https://www.spoj.com/problems/YODANESS/
As @IVlad noted in the comment to your question, the Yodaness problem asks you to count the number of inversions, not the minimal number of swaps.
For example:
L1 = [2,3,4,5]
L2 = [2,5,4,3]
The minimal number of swaps is one (swap 5 and 3 in L2 to get L1), but number of inversions is three: (5 4), (5 3), and (4 3) pairs are in the wrong order.
The simplest way to count number of inversions follows from the definition:
A pair of elements (p_i, p_j) is called an inversion in a permutation p if i < j and p_i > p_j.
In Python:
def count_inversions_brute_force(permutation):
    """Count number of inversions in the permutation in O(N**2)."""
    return sum(pi > permutation[j]
               for i, pi in enumerate(permutation)
               for j in xrange(i + 1, len(permutation)))
You could count inversion in O(N*log(N)) using divide & conquer strategy (similar to how a merge sort algorithm works). Here's pseudo-code from Counting Inversions translated to Python code:
def merge_and_count(a, b):
    assert a == sorted(a) and b == sorted(b)
    c = []
    count = 0
    i, j = 0, 0
    while i < len(a) and j < len(b):
        c.append(min(b[j], a[i]))
        if b[j] < a[i]:
            count += len(a) - i # number of elements remaining in `a`
            j += 1
        else:
            i += 1
    # now we reached the end of one of the lists
    c += a[i:] + b[j:] # append the remainder of the list to C
    return count, c

def sort_and_count(L):
    if len(L) == 1: return 0, L
    n = len(L) // 2
    a, b = L[:n], L[n:]
    ra, a = sort_and_count(a)
    rb, b = sort_and_count(b)
    r, L = merge_and_count(a, b)
    return ra + rb + r, L
Example:
>>> sort_and_count([5, 4, 2, 3])
(5, [2, 3, 4, 5])
Here's solution in Python for the example from the problem:
yoda_words = "in the force strong you are".split()
normal_words = "you are strong in the force".split()
perm = get_permutation(normal_words, yoda_words)
print "number of inversions:", sort_and_count(perm)[0]
print "number of swaps:", number_of_swaps(perm)
Output:
number of inversions: 11
number of swaps: 5
Definitions of get_permutation() and number_of_swaps() are:
def get_permutation(L1, L2):
    """Find permutation that converts L1 into L2.

    See http://en.wikipedia.org/wiki/Cycle_representation#Notation
    """
    if sorted(L1) != sorted(L2):
        raise ValueError("L2 must be permutation of L1 (%s, %s)" % (L1, L2))
    permutation = map(dict((v, i) for i, v in enumerate(L1)).get, L2)
    assert [L1[p] for p in permutation] == L2
    return permutation

def number_of_swaps(permutation):
    """Find number of swaps required to convert the permutation into
    identity one.
    """
    # decompose the permutation into disjoint cycles
    nswaps = 0
    seen = set()
    for i in xrange(len(permutation)):
        if i not in seen:
            j = i # begin new cycle that starts with `i`
            while permutation[j] != i: # (i σ(i) σ(σ(i)) ...)
                j = permutation[j]
                seen.add(j)
                nswaps += 1
    return nswaps
As implied by Sebastian's solution, the algorithm you are looking for can be based on inspecting the permutation's cycles.
We should consider array #2 to be a permutation transformation on array #1. In your example, the permutation can be represented as P = [2,1,4,3].
Every permutation can be expressed as a set of disjoint cycles, representing cyclic position changes of the items. The permutation P for example has 2 cycles: (2,1) and (4,3). Therefore two swaps are enough. In the general case, you should simply subtract the number of cycles from the permutation length, and you get the minimum number of required swaps. This follows from the observation that in order to "fix" a cycle of N elements, N-1 swaps are enough.
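A compact Python sketch of this cycle counting (the function name is mine; the permutation is given 0-based):

def min_swaps(perm):
    # minimum swaps to sort a permutation = n minus its number of cycles
    n, seen, cycles = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:  # walk one whole cycle
                seen.add(j)
                j = perm[j]
    return n - cycles

print(min_swaps([1, 0, 3, 2]))  # P = [2,1,4,3] written 0-based -> 4 - 2 = 2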
This problem has a clean, greedy, trivial solution:
Find any swap operation which gets both swapped elements in Array1 closer to their destination in Array2. Perform the swap operation on Array1 if one exists.
Repeat step1 until no more such swap operations exist.
Find any swap operation which gets one swapped element in Array1 closer to its destination in Array2. If such an operation exists, perform it on Array1.
Go back to step1 until Array1 == Array2.
The correctness of the algorithm can be proved by defining a potential for the problem as the sum of distances of all elements in array1 from their destination in array2.
This can be easily converted to another type of problem, which can be solved more efficiently. All that is needed is to convert the arrays into permutations, i.e. change the values to their ids. So your arrays:
L1 = [2,3,4,5]
L2 = [2,5,4,3]
would become
P1 = [0,1,2,3]
P2 = [0,3,2,1]
with the assignment 2->0, 3->1, 4->2, 5->3. This can only be done if there are no repeated items though. If there are, then this becomes harder to solve.
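A tiny Python sketch of this relabeling (the function name is mine; it assumes no repeated items):

def to_permutation(l1, l2):
    # relabel each value by its index in l1
    ids = {v: i for i, v in enumerate(l1)}
    return [ids[v] for v in l1], [ids[v] for v in l2]

print(to_permutation([2, 3, 4, 5], [2, 5, 4, 3]))  # -> ([0, 1, 2, 3], [0, 3, 2, 1])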
Converting permutation from one to another can be converted to a similar problem (Number of swaps in a permutation) by inverting the target permutation in O(n), composing the permutations in O(n) and then finding the number of swaps from there to an identity permutation in O(m).
Given:
int P1[] = {0, 1, 2, 3}; // 2345
int P2[] = {0, 3, 2, 1}; // 2543
// we can follow a simple algebraic modification
// (see http://en.wikipedia.org/wiki/Permutation#Product_and_inverse):
// P1 * P = P2 | premultiply P1^-1 *
// P1^-1 * P1 * P = P1^-1 * P2
// I * P = P1^-1 * P2
// P = P1^-1 * P2
// where P is a permutation that makes P1 into P2.
// also, the number of steps from P to identity equals
// the number of steps from P1 to P2.
int P1_inv[4];
for(int i = 0; i < 4; ++ i)
P1_inv[P1[i]] = i;
// invert the first permutation in O(n)
int P[4];
for(int i = 0; i < 4; ++ i)
P[i] = P2[P1_inv[i]];
// chain the permutations in O(n)
int num_steps = NumSteps(P, 4); // will return 2
// now we just need to count the steps in O(num_steps)
To count the steps, a simple algorithm can be devised, such as:
int NumSteps(int *P, int n)
{
    int count = 0;
    for(int i = 0; i < n; ++ i) {
        for(; P[i] != i; ++ count) // could be permuted multiple times
            swap(P[P[i]], P[i]); // look where the number at hand should be
    }
    // count number of permutations
    return count;
}
This always swaps an item for a place where it should be in the identity permutation, therefore at every step it undoes and counts one swap. Now, provided that the number of swaps it returns is indeed minimum, the runtime of the algorithm is bounded by it and is guaranteed to finish (instead of getting stuck in an infinite loop). It will run in O(m) swaps or O(m + n) loop iterations where m is number of swaps (the count returned) and n is number of items in the sequence (4). Note that m < n is always true. Therefore, this should be superior to O(n log n) solutions, as the upper bound is O(n - 1) of swaps or O(n + n - 1) of loop iterations here, which is both practically O(n) (constant factor of 2 omitted in the latter case).
The algorithm will only work for valid permutations; it will loop infinitely for sequences with duplicate values and will do out-of-bounds array access (and crash) for sequences with values other than [0, n). A complete test case can be found here (builds with Visual Studio 2008; the algorithm itself should be fairly portable). It generates all possible permutations of lengths 1 to 32 and checks against solutions generated with breadth-first search (BFS); it seems to work for all permutations of lengths 1 to 12, then it becomes fairly slow, but I assume it will just continue working.
Algorithm:
Check if the elements of the lists at the same position are equal. If yes, no swap is required; if no, swap the element in the second list with the position where the matching element occurs.
Iterate this process over the entire list.
Code:
def nswaps(l1, l2):
    cnt = 0
    for i in range(len(l1)):
        if l1[i] != l2[i]:
            ind = l2.index(l1[i])
            l2[i], l2[ind] = l2[ind], l2[i]
            cnt += 1
    return cnt
Since we already know that arr2 has the correct index for each element present in arr1, we can simply compare arr1's elements with arr2's and swap them with the correct indexes when they are at the wrong index.
def minimum_swaps(arr1, arr2):
    swaps = 0
    for i in range(len(arr1)):
        if arr1[i] != arr2[i]:
            swaps += 1
            element = arr1[i]
            index = arr1.index(arr2[i]) # find index of correct element
            arr1[index] = element # swap
            arr1[i] = arr2[i]
    return swaps
@J.F. Sebastian's and @Eyal Schneider's answers are pretty cool.
I got inspired to solve a similar problem: calculate the minimum swaps needed to sort an array, e.g., to sort {2,1,3,0} you need a minimum of 2 swaps.
Here is the Java code:
// 0 1 2 3
// 3 2 1 0 (0,3) (1,2)
public static int sortWithSwap(int[] a) {
    Integer[] A = new Integer[a.length];
    for (int i = 0; i < a.length; i++) A[i] = a[i];
    Integer[] B = Arrays.copyOf(mapping(A), A.length, Integer[].class);

    int cycles = 0;
    HashSet<Integer> set = new HashSet<>();
    boolean newCycle = true;
    for (int i = 0; i < B.length; ) {
        if (!set.contains(B[i])) {
            if (newCycle) {
                newCycle = false;
                cycles++;
            }
            set.add(B[i]);
            i = B[i];
        }
        else if (set.contains(B[i])) { // duplicate in existing cycles
            newCycle = true;
            i++;
        }
    }
    // suppose the sequence has n cycles; each cycle needs len(cycle)-1 swaps,
    // and the sum of the lengths of all cycles is the length of the sequence, so
    // swaps = sequence length - cycles
    return a.length - cycles;
}

// a b b c
// c a b b
// 3 0 1 1
private static Object[] mapping(Object[] A) {
    Object[] B = new Object[A.length];
    Object[] ret = new Object[A.length];
    System.arraycopy(A, 0, B, 0, A.length);
    Arrays.sort(A);
    HashMap<Object, Integer> map = new HashMap<>();
    for (int i = 0; i < A.length; i++) {
        map.put(A[i], i);
    }
    for (int i = 0; i < B.length; i++) {
        ret[i] = map.get(B[i]);
    }
    return ret;
}
This seems like an edit distance problem, except that only transpositions are allowed.
Check out Damerau–Levenshtein distance pseudo code. I believe you can adjust it to count only the transpositions.
