puzzle using arrays - data-structures

My first array is of size M + N and my second array is of size N.
Let us say m = 4, n = 5:
a[ ] = 1, 3, 5, 7, 0, 0, 0, 0, 0
b[ ] = 2, 4, 6, 8, 10
Now, how can I merge these two arrays in place (without using an external sorting algorithm or any temporary array) with O(n) complexity? The resultant array must be in sorted order.

Provided a is exactly the right size and both arrays are already sorted (as seems to be the case), the following pseudo-code should help:
#    0 1 2 3 4 5 6 7 8
a = [1,3,5,7,0,0,0,0,0]
b = [2,4,6,8,10]
afrom = 3
bfrom = 4
ato = 8
while bfrom >= 0:
    if afrom == -1:
        a[ato] = b[bfrom]
        ato = ato - 1
        bfrom = bfrom - 1
    else:
        if b[bfrom] > a[afrom]:
            a[ato] = b[bfrom]
            ato = ato - 1
            bfrom = bfrom - 1
        else:
            a[ato] = a[afrom]
            ato = ato - 1
            afrom = afrom - 1
print a
It's basically a merge of the two lists into one, starting at the ends. Once bfrom hits -1, there are no more elements in b, so the remaining elements in a were all less than the lowest in b; the rest of a can therefore remain unchanged.
If a runs out first, it's a matter of transferring the rest of b, since all the a elements have already been transferred above ato.
This is O(n) as requested and would result in something like:
[1, 2, 3, 4, 5, 6, 7, 8, 10]
Understanding that pseudo-code and translating it to your specific language is a job for you, now that you've declared it homework :-)

for (i = 0; i < n; i++) {
    a[m + i] = b[i];
}
This will do an in-place concatenation, not an ordered merge.
If you're asking for an ordered merge, note that the two input arrays are already sorted, so it is possible in O(N): the back-to-front merge in the first answer does exactly that. The familiar O(N log N) lower bound applies to general comparison sorting, not to merging two already-sorted arrays.
I've got to ask, though, looking at your last few questions: are you just asking us for homework help? You do know that it's OK to say "this is homework", and nobody will laugh at you, right? We'll even still do our best to help you learn.

Do you want a sorted array? If not, this should do:
for (int i = a.length - 1, j = 0; j < b.length; i--, j++)
{
    a[i] = b[j];  // fills the trailing zeros of a with b's elements, in reverse order
}

You can take a look at counting sort, which works provided you know the input range. It is effectively O(n).
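For illustration, here is a minimal Python sketch of counting sort; note the classic form keeps an O(range) count array rather than being strictly in-place, so it fits when the value range is small and known:

def counting_sort(a, max_value):
    # assumes non-negative integers no larger than max_value
    counts = [0] * (max_value + 1)
    for v in a:
        counts[v] += 1          # tally each value
    i = 0
    for v, c in enumerate(counts):
        for _ in range(c):      # write each value back, in order
            a[i] = v
            i += 1
    return a

print(counting_sort([1, 3, 5, 7, 2, 4, 6, 8, 10], 10))
# [1, 2, 3, 4, 5, 6, 7, 8, 10]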

Count subsets of array which qualify min(subset)+max(subset) < k

I was asked this question in an interview and didn't have a better answer than generating all possible subsets.
Example:
a = [4,2,5,7], k = 8
output = 4
[2], [4,2], [2,5], [4,2,5]
The interviewer tried implying that sorting the array should help, but I still couldn't figure out a better-than-brute-force solution. I will appreciate your input.
The interviewer implied that sorting the array would help and it does help. I'll try to explain.
Taking the array and k values you stated:
a = [4,2,5,7]
k = 8
Sorting the array will yield:
a_sort = [2,4,5,7]
Now we can consider the following procedure:
1. Set ii = 0, jj = ii + 1.
2. Choose a_sort[ii] as a part of your subset.
   2.1. If 2 * a_sort[ii] >= k, you are done. Else, the subset [a_sort[ii]] holds the condition and is a part of the solution.
3. Add a_sort[jj] to your subset.
   3.1. If a_sort[ii] + a_sort[jj] < k, then:
      3.1.1. The subset [a_sort[ii], a_sort[jj]] holds the condition and is a part of the solution, as is any subset extended by any number of additional elements a_sort[kk] where ii < kk < jj.
      3.1.2. Set jj += 1 and go back to step 3.
   3.2. Else, set ii += 1, jj = ii + 1, and go back to step 2.
With your input this procedure should return:
[[2], [2,4],[2,5],[2,4,5]]
# [2,7] results in 9 > 8 and therefore you move to [4]
# Note that for [4] subset you get 8 = 8 which is not smaller than 8, we are done
Explanation
If you have a subset [a_sort[ii]] which does not satisfy 2 * a_sort[ii] < k, adding more numbers to the subset will only yield min(subset) + max(subset) >= 2 * a_sort[ii] >= k, so no additional subsets hold the wanted condition. Moreover, choosing the subset [a_sort[ii+1]] will result in 2 * a_sort[ii+1] >= 2 * a_sort[ii] >= k since a_sort is sorted. Therefore you will not find any additional subsets.
For jj > ii, if a_sort[ii] + a_sort[jj] < k then you can push any number of members from a_sort into the subset, as long as their index kk is bigger than ii and lower than jj. Since a_sort is sorted, adding these members will not change the value of min(subset) + max(subset), which remains a_sort[ii] + a_sort[jj], and we already know that this value is smaller than k.
Getting the count
In case you simply want to count the possible subsets, this can be done more easily than generating the subsets themselves.
Assume that for jj > ii the condition holds, i.e. a_sort[ii] + a_sort[jj] < k. If jj = ii + 1 there is one additional possible subset. If jj > ii + 1 there are jj - ii - 1 additional elements which can each be either present or not without changing the value of a_sort[ii] + a_sort[jj]. Therefore there are a total of 2**(jj-ii-1) additional subsets available to add to the solution group (jj - ii - 1 elements, each independently present or not). This also holds for jj = ii + 1, since in that case 2**(jj-ii-1) = 2**0 = 1.
Looking at the example above:
[2] adds 1 count
[2,4] adds 1 count (ii = 0, jj = 1 --> 2**(1-0-1) = 2**0 = 1)
[2,5] adds 2 counts (ii = 0, jj = 2 --> 2**(2-0-1) = 2**1 = 2)
A total count of 4
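For reference, here is a sketch of the same idea in Python in a two-pointer form: for each minimum a[lo], find the largest a[hi] with a[lo] + a[hi] < k, then count a[lo] plus any subset of the elements strictly between lo and hi:

def count_subsets(a, k):
    a = sorted(a)
    lo, hi = 0, len(a) - 1
    count = 0
    while lo <= hi:
        if a[lo] + a[hi] < k:
            # a[lo] together with any subset of a[lo+1..hi]
            count += 2 ** (hi - lo)
            lo += 1
        else:
            hi -= 1
    return count

print(count_subsets([4, 2, 5, 7], 8))  # 4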
Sort the array.
For an element x at index l, do a binary search on the array to get the index of the maximum integer in the array which is < k-x. Let that index be r.
For all subsets where min(subset) = x, we can include any element with index in the range (l,r]. The number of subsets with min(subset) = x is therefore the number of possible subsets of (r-l) elements, so count = 2^(r-l) (or 0 if r < l).
(Note: in all such subsets we are fixing x; that's why the range (l,r] isn't inclusive of l.)
Iterate over the array, using the above process for each element/index to get the count of subsets where the current element is the minimum and the subset satisfies the given constraint. If you find an element with count = 0, break the iteration; see the sketch after the example.
This works with O(N*log(N)) complexity, good enough for an interview question imo.
For the given example, sorted array = [2,4,5,7].
For element 2, l=0 and r=2. Count = 2^(2-0) = 4 (covers [2],[4,2],[2,5],[4,2,5])
For element 4, l=1 and r=0. Count = 0, and we break the iteration.
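A minimal Python sketch of this approach, using the standard-library bisect module for the binary search (the helper name count_subsets is mine):

from bisect import bisect_left

def count_subsets(a, k):
    a = sorted(a)
    count = 0
    for l, x in enumerate(a):
        # index of the largest element strictly less than k - x
        r = bisect_left(a, k - x) - 1
        if r < l:
            break                # count = 0 here, so stop iterating
        count += 2 ** (r - l)    # x plus any subset of a[l+1..r]
    return count

print(count_subsets([4, 2, 5, 7], 8))  # 4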

Unable to understand algorithm

Here is the link to the problem:
https://www.hackerrank.com/challenges/equal
I read its editorial but am unable to understand it. If you don't have a HackerRank account you won't be able to see the editorial, so here are some lines from it:
This is equivalent to saying, christy can take away the chocolates of
one coworker by 1, 2 or 5 while keeping others' chocolate untouched.
Let's consider decreasing a coworker's chocolate as an operation. To minimize the number of operations, we should try to make the number of chocolates of every coworker equal to the minimum one in the group(min). We have to decrease the number of chocolates the ith person A[i] by (A[i] - min). Let this value be x.
This can be done in k operations.
k = x/5 +(x%5)/2 + (x%5)%2
and from here on I am unable to understand:
Let f(min) be sum of operations performed over all coworkers to reduce
each of their chocolates to min. However, sometimes f(min) might not
always give the correct answer. It can also be a case when
f(min) > f(min-1)
f(min) < f(min-5)
as f(min-5) takes N operations more than f(min) where N is the number
of coworkers. Therefore, if
A = {min,min-1,min-2,min-3,min-4}
then f(A) <= f(min) < f(min-5)
Can someone help me understand why it is necessary to check f(min), f(min-1), ..., f(min-4)?
Consider the case A = [1,5,5].
As the editorial said, it is intuitive to think it optimal to change A to [1,1,1] with 4 operations (two "minus 2" operations on each 5), but it is better to change it to [0,0,0] with 3 operations (one "minus 1", two "minus 5").
Hence if min = the minimum element in the array, changing all elements to min may not be optimal.
The part you do not understand caters to this situation: we know min may not be optimal, as min - x may be better, but how large is x? Well, it is at most 4. The editorial is saying that if we know x is at most 4, we can simply brute-force min, min-1, ..., min-4 to see which one gives the minimum, without thinking too much.
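To see this concretely, here is a small Python check of the editorial's operation-count formula on A = [1,5,5] (my own sketch, not from the editorial):

def ops(x):
    # k = x/5 + (x%5)/2 + (x%5)%2 from the editorial
    return x // 5 + x % 5 // 2 + x % 5 % 2

a = [1, 5, 5]
print(sum(ops(v - 1) for v in a))  # f(min)     = f(1) = 4
print(sum(ops(v - 0) for v in a))  # f(min - 1) = f(0) = 3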
Reasoning (not proof!) for x <= 4
If x >= 5, then you have to use at least N extra type-3 ("minus 5") operations across all elements, which is definitely not worth it.
Basically it is not a matter of the type of operation: since you need to apply the same operation to ALL elements, after you do that the problem is not reduced. The relative differences between the elements are still the same, while your aim is to make the relative differences 0, so you spend N operations for nothing.
In other words, if x >= 5, then x - 5 must be a more optimal choice of goal; indeed x % 5 must be the best goal.
(Below is the TL;DR part: Version 2. Jump to the last section if you are not interested in the proof.)
In the process of writing the original solution I suspected x <= 2, and I tried submitting code on HackerRank which only checks the minimum of f(min-x) for x <= 2, and it got ACed.
More formally, I claim:
If 5 > (z-min)%5 >= 3 and (z-min')%5 == 0, then F(min') < F(min)
where min' = min - x for x <= 2, and F(k) = the minimum number of operations for element z to become k.
(Beware the notation: I use F(), which has a different meaning from f() in the question.)
Here is the proof:
If (z-min)%5 = 1 or 2, then it needs at least (z-min)/5 + 1 operations, while (z-min')%5 == 0 needs (z-min')/5 = (z-min)/5 + 1 operations, which means F(min') = F(min).
If (z-min)%5 = 3 or 4, then it needs at least (z-min)/5 + 2 operations, while (z-min')%5 == 0 needs (z-min')/5 = (z-min)/5 + 1 operations, which means F(min') < F(min) (in fact F(min) = F(min') + 1).
So we have proved:
If 5 > (z-min)%5 >= 3 and (z-min')%5 == 0, then F(min') < F(min)
where min' = min - x.
Now let's prove the range of x.
As we assume (z-min)%5 >= 3 and (z-min')%5 == 0, we have
(z-min')%5 = (z-min+x)%5 = ((z-min)%5 + x%5)%5 == 0
Now, if x >= 3, then (z-min)%5 can never be >= 3 while keeping ((z-min)%5 + x%5)%5 == 0.
If x = 2, then (z-min)%5 can be 3; if x = 1, then (z-min)%5 can be 4, meeting both conditions: 5 > (z-min)%5 >= 3 and (z-min')%5 == 0.
Thus together we have shown:
If 5 > (z-min)%5 >= 3 and (z-min')%5 == 0, then F(min') < F(min)
where min' = min - x for x <= 2.
Note that one can always generate an array P such that f(min') < f(min): you can keep repeating the integers which can be improved by this method until they outnumber those which cannot. This is because the elements that cannot be improved will each always need exactly 1 more operation.
E.g., let P = [2,2,2,10]: f(min) = 0+3 = 3, f(min-2) = 3+2 = 5.
Here 10 is the element which can be improved, while 2 cannot, so we just add more 10s to the array. Each 2 will use 1 more operation to get to min' = min-2, while each 10 will save 1 operation. So we only have to add more 10s until they outnumber (compensate for) the "waste" of the 2s:
P = [2,2,2,10,10,10,10,10], then f(min) = 0+15 = 15, f(min-2) = 3+10 = 13
or simply just
P = [2,10,10], f(min) = 6, f(min-2) = 5
(End of TL;DR part!)
EDITED
OMG, THE TEST CASES ON HACKERRANK ARE WEAK!
The story: when I arrived at my office this morning I kept thinking about this problem a bit more, and realized there might be a problem in my code (which got ACed!):
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

int T, n, a[10005], m = 1<<28;

int f(int m){
    m = max(0, m);
    int cnt = 0;
    for(int i = 0; i < n; i++){
        cnt += (a[i]-m)/5 + (a[i]-m)%5/2 + (a[i]-m)%5%2;
    }
    return cnt;
}

int main() {
    cin >> T;
    while(T--){
        m = 1<<28;
        cin >> n;
        for(int i = 0; i < n; i++) cin >> a[i], m = min(m, a[i]);
        cout << min(min(f(m), f(m-1)), f(m-2)) << endl;
    }
    return 0;
}
Can you see the problem?
The problem is m = max(0, m);!
It ensures that min - x is at least 0. But wait, my proof above did not say anything about the range of min - x! It can indeed be negative!
Remember, the original question is about "adding", so there is no maximum value for the goal; when we model the question as "subtracting", there is no minimum value for the goal either (but I clamped it at 0!).
Try this test case with the code above:
1
3
0 3 3
It forces min - x = 0, so it gives 4 as output, but the answer should be 3.
(If we use the "adding" model, the goal should be 10: +5 on a[0],a[2]; +5 on a[0],a[1]; +2 on a[1],a[2].)
So everything finally came right (I think...) when I removed the line m = max(0, m);. It allows min - x to go negative and gives 3 as the correct output, and of course the new code got ACed as well...
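For completeness, here is a Python sketch of the fixed solution (no clamping, so goals may go below zero; I check down to min - 4 as the editorial suggests, though min - 2 suffices per the proof above):

def ops(x):
    return x // 5 + x % 5 // 2 + x % 5 % 2

def solve(a):
    lo = min(a)
    # no max(0, ...) here: the goal is allowed to be negative
    return min(sum(ops(v - goal) for v in a) for goal in range(lo - 4, lo + 1))

print(solve([0, 3, 3]))  # 3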

How to analyze the complexity of this code?

I am solving a problem from Codeforces. According to the editorial, the complexity of the following code should be O(n).
for(int i = n - 1; i >= 0; --i) {
    r[i] = i + 1;
    while (r[i] < n && height[i] > height[r[i]])
        r[i] = r[r[i]];
    if (r[i] < n && height[i] == height[r[i]])
        r[i] = r[r[i]];
}
Here, height[i] is the height of the i-th hill and r[i] is the position of the first hill to the right that is higher than height[i]; height[0] is always the greatest value in the height array.
My question is: how can we guarantee the complexity of this code is O(n), given the inner while loop?
In the inner while loop, the code keeps updating r[i] while height[i] > height[r[i]], and the number of updates depends on the height array. For example, the number of updates for a height array sorted in non-decreasing order will differ from that for one sorted in non-increasing order (in both cases we sort the array except height[0], because height[0] should always be the maximum in this problem).
Is there a method to analyze an algorithm whose cost varies with the input data like this? Would amortized analysis be one of the answers?
PS. To clarify my question: we are setting the array r[] in the loop. Consider height = {5,4,1,2,3} and i = 1 (r[2] = 3 and r[3] = 4, because 2 is the first value greater than 1, and 3 is the first value greater than 2). We compare 4 with 1, and because 4 > 1 we keep comparing: 4 with 2 (= height[r[2]]), then 4 with 3 (= height[r[3]]). In this case we have to compare 4 times to set r[1]. The number of comparisons differs from the case height = {5,1,2,3,4}. Can we still guarantee the complexity of the code is O(n)? If I missed something, please let me know. Thank you.
I tried the mentioned algorithm on a simple example, but it seems nothing changed. Did I miss something?
Example:
n = 5
height = { 2, 4, 6, 8, 10 }
r = { 1, 2, 3, 4, 5 }
---- i:4 ----
r[4] < 5 ?
---- i:3 ----
8 > 10 ?
8 = 10 ?
---- i:2 ----
6 > 8 ?
6 = 8 ?
---- i:1 ----
4 > 6 ?
4 = 6 ?
---- i:0 ----
2 > 4 ?
2 = 4 ?
-------------
height = { 2, 4, 6, 8, 10 }
r = { 1, 2, 3, 4, 5 }
Your algo (I do not know whether it will solve your problem) is actually O(n). Even though there is an inner loop, in most cases the inner loop will not execute because of the given conditions, so in the worst case it runs about 2n times, which is O(n).
You can test this assumption with a method like the following, where yourMethod returns the number of times its inner loop executed:
int arr[] = {1, 2, 3, 4, 5};
do {
    int count = yourMethod(arr, 5);
} while (next_permutation(arr, arr + 5));
With this you will be able to check the worst case, average case, etc.
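For instance, here is a Python sketch of that experiment, instrumenting the loop from the question to count inner-loop iterations over all permutations of a small array:

from itertools import permutations

def inner_iterations(height):
    n = len(height)
    r = [0] * n
    count = 0
    for i in range(n - 1, -1, -1):
        r[i] = i + 1
        while r[i] < n and height[i] > height[r[i]]:
            r[i] = r[r[i]]
            count += 1           # one pointer-jump in the inner loop
        if r[i] < n and height[i] == height[r[i]]:
            r[i] = r[r[i]]
    return count

# worst-case inner-loop work over all permutations of 5 elements
print(max(inner_iterations(list(p)) for p in permutations(range(5))))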

Find all possible combinations from 4 input numbers which can add up to 24

Actually, this question can be generalized as below:
Find all possible combinations from a given set of elements, which meets
a certain criteria.
So, any good algorithms?
There are only 16 possibilities (and one of those is to add together "none of them", which ain't gonna give you 24), so the old-fashioned "brute force" algorithm looks pretty good to me:
for (unsigned int choice = 1; choice < 16; ++choice) {
    int sum = 0;
    if (choice & 1) sum += elements[0];
    if (choice & 2) sum += elements[1];
    if (choice & 4) sum += elements[2];
    if (choice & 8) sum += elements[3];
    if (sum == 24) {
        // we have a winner
    }
}
In the completely general form of your problem, the only way to tell whether a combination meets "certain criteria" is to evaluate those criteria for every single combination. Given more information about the criteria, maybe you could work out some ways to avoid testing every combination and build an algorithm accordingly, but not without those details. So again, brute force is king.
There are two interesting explanations about the subset sum problem, in both Wikipedia and MathWorld.
In the case of the first question you asked, the first answer is good for a limited number of elements. You should realize that the reason Mr. Jessop used 16 as the boundary for his loop is that this is 2^4, where 4 is the number of elements in your set. If you had 100 elements, the loop limit would become 2^100 and your algorithm would literally take forever to finish.
In the case of a bounded sum, you should consider a depth-first search, because when the sum of elements exceeds the sum you are looking for, you can prune your branch and backtrack.
In the case of the generic question, finding the subset of elements that satisfies certain criteria, this is related to the knapsack problem, which is known to be NP-complete. Given that, there is no known algorithm that solves it in polynomial time.
Nevertheless, there are several heuristics that bring good results to the table, including (but not limited to) genetic algorithms (one I personally like, for I wrote a book on them) and dynamic programming. A simple search on Google will show many scientific papers that describe different solutions for this problem.
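For the bounded-sum case with non-negative integers, a pseudo-polynomial dynamic-programming sketch in Python (the example numbers are made up):

def subset_sum_exists(values, target):
    # reachable[s] is True when some subset of the values seen so far sums to s
    reachable = [False] * (target + 1)
    reachable[0] = True
    for v in values:
        for s in range(target, v - 1, -1):  # descend so each value is used at most once
            if reachable[s - v]:
                reachable[s] = True
    return reachable[target]

print(subset_sum_exists([8, 7, 9, 5], 24))  # True: 8 + 7 + 9 = 24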
Find all possible combinations from a given set of elements, which
meets a certain criteria
If I understood you right, this code will be helpful for you:
>>> from itertools import combinations as combi
>>> combi.__doc__
'combinations(iterable, r) --> combinations object\n\nReturn successive r-length
combinations of elements in the iterable.\n\ncombinations(range(4), 3) --> (0,1
,2), (0,1,3), (0,2,3), (1,2,3)'
>>> set = range(4)
>>> set
[0, 1, 2, 3]
>>> criteria = range(3)
>>> criteria
[0, 1, 2]
>>> for tuple in list(combi(set, len(criteria))):
... if cmp(list(tuple), criteria) == 0:
... print 'criteria exists in tuple: ', tuple
...
criteria exists in tuple: (0, 1, 2)
>>> list(combi(set, len(criteria)))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
Generally for a problem like this you have to try all possibilities; the thing you should do is have the code abort building a combination as soon as you know it cannot satisfy the criteria (if your criterion is that you may not have more than two blue balls, then abort any computation that has more than two). This is backtracking:
def perm(set, permutation):
    if length(set) == length(permutation):
        print permutation
    else:
        for element in set:
            if permutation.add(element) meets criteria:
                perm(set, permutation)
            else:
                permutation.pop() // remove the element added in the if
The set of input numbers matters, as you can tell as soon as you allow e.g. negative numbers, imaginary numbers, rational numbers etc. in your start set. You could also restrict to e.g. all even numbers, all odd numbers, etc.
That means it's hard to build something deductive. You need brute force, a.k.a. trying every combination.
In this particular problem you could build an algorithm that recurses, e.g. find every combination of 3 ints in (1,22) that adds up to 23, then add 1; every combination that adds up to 22, then add 2; etc. This can again be broken into every combination of 2 that adds up to 21, etc. You need to decide whether you can count the same number twice.
Once you have that you have a recursive function to call -
combinations( 24 , 4 ) = combinations( 23, 3 ) + combinations( 22, 3 ) + ... combinations( 4, 3 );
combinations( 23 , 3 ) = combinations( 22, 2 ) + ... combinations( 3, 2 );
etc
This works well except you have to be careful around repeating numbers in the recursion.
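A sketch of that recursion in Python, assuming distinct whole numbers drawn from 1 to 22 as in the example above (loosen v + 1 to v below to allow repeats):

def combos(target, count, lo=1, hi=22):
    # all increasing count-tuples from lo..hi that sum to target
    if count == 0:
        return [()] if target == 0 else []
    result = []
    for v in range(lo, hi + 1):
        if v > target:
            break                      # later values are even bigger
        for rest in combos(target - v, count - 1, v + 1, hi):
            result.append((v,) + rest)
    return result

print(len(combos(24, 4)))  # number of 4-number combinations summing to 24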
private int[][] work()
{
    const int target = 24;
    List<int[]> combos = new List<int[]>();
    for (int i = 0; i < 9; i++)
        for (int x = 0; x < 9; x++)
            for (int y = 0; y < 9; y++)
                for (int z = 0; z < 9; z++)
                {
                    int res = x + y + z + i;
                    if (res == target)
                    {
                        combos.Add(new int[] { x, y, z, i });
                    }
                }
    return combos.ToArray();
}
It works instantly, but there are probably better methods than 'guess and check'. All I am doing is looping through every possibility, adding them all together, and seeing if the sum comes out to the target value.
If I understand your question correctly, what you are asking for is called "permutations", i.e. the number N of possible ways to arrange X numbers taken from a set of Y numbers:
N = Y! / (Y - X)!
For example, arranging X = 2 numbers out of Y = 4 gives N = 4!/2! = 12 ordered pairs.
I don't know if this will help, but this is a solution I came up with for an assignment on permutations.
You have an input of '123' (a string); using the substr functions:
1) Put each digit of the input into an array:
array[N1, N2, N3, ...]
2) Create a swap function:
function swap(Number A, Number B)
{
    temp = Number B
    Number B = Number A
    Number A = temp
}
3) This algorithm uses the swap function to move the numbers around until all permutations are done:
original_string = '123'
temp_string = ''
while (temp_string != original_string)
{
    swap(array element[i], array element[i+1])
    if (i == 1)
        i = 0
    temp_string = array.toString
    i++
}
Hopefully you can follow my pseudo-code, but this works at least for 3-digit permutations.
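Along the same lines, here is a compact recursive swap-based generator in Python (a sketch: it swaps each candidate into the current position, recurses, then swaps back):

def permute(arr, k=0):
    if k == len(arr) - 1:
        print(''.join(arr))
        return
    for i in range(k, len(arr)):
        arr[k], arr[i] = arr[i], arr[k]   # place element i at position k
        permute(arr, k + 1)
        arr[k], arr[i] = arr[i], arr[k]   # swap back (backtrack)

permute(list('123'))  # prints all 6 permutations of 123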
(n x n)
Build up a square matrix of n x n and print all of its corresponding crossed values together, e.g.:
     1   2   3   4
1   11  12  13  14
2   ..  ..  ..  ..
3   ..
4   ..  ..

How can you compare to what extent two lists are in the same order?

I have two arrays containing the same elements, but in different orders, and I want to know the extent to which their orders differ.
The method I tried didn't work. It was as follows:
For each list I built a matrix which recorded, for each pair of elements, whether they were above or below each other in the list. I then calculated a Pearson correlation coefficient of these two matrices. This worked extremely badly. Here's a trivial example:
list 1:
1
2
3
4
list 2:
1
3
2
4
The method I described above produced matrices like this (where 1 means the row element comes before the column element, and 0 vice versa):
list 1:
    1  2  3  4
1      1  1  1
2         1  1
3            1
4

list 2:
    1  2  3  4
1      1  1  1
2         0  1
3            1
4
Since the only difference is the order of elements 2 and 3, these should be deemed very similar. The Pearson correlation coefficient for those two matrices is 0, suggesting they are not correlated at all. I guess the problem is that what I'm looking for is not really a correlation coefficient, but some other kind of similarity measure. Edit distance, perhaps?
Can anyone suggest anything better?
Mean square of differences of indices of each element.
List 1: A B C D E
List 2: A D C B E
Indices of each element of List 1 in List 2 (zero-based):
A B C D E
0 3 2 1 4
Indices of each element of List 1 in List 1 (zero-based):
A B C D E
0 1 2 3 4
Differences:
A  B  C  D  E
0 -2  0  2  0
Squares of differences:
A B C D E
0 4 0 4 0
Average differentness = 8 / 5.
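A minimal Python sketch of the same computation (assuming the lists hold the same distinct elements):

def avg_squared_index_difference(list1, list2):
    # index of each element in list2
    pos = {element: i for i, element in enumerate(list2)}
    return sum((pos[e] - i) ** 2 for i, e in enumerate(list1)) / len(list1)

print(avg_squared_index_difference(list('ABCDE'), list('ADCBE')))  # 1.6 (= 8 / 5)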
Just an idea, but is there any mileage in adapting a standard sort algorithm to count the number of swap operations needed to transform list1 into list2?
I think defining the compare function may be difficult, though (perhaps even just as difficult as the original problem!), and this may be inefficient.
Edit: thinking about this a bit more, the compare function would essentially be defined by the target list itself. So for example if list 2 is:
1 4 6 5 3
...then the compare function should result in 1 < 4 < 6 < 5 < 3 (and return equality where entries are equal).
Then the swap function just needs to be extended to count the swap operations; see the sketch below.
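A minimal Python sketch of the idea, assuming both lists hold the same distinct elements: rank list1's elements by their position in list2, then bubble-sort the ranks while counting the swaps:

def count_adjacent_swaps(list1, list2):
    rank = {element: i for i, element in enumerate(list2)}
    seq = [rank[element] for element in list1]
    swaps = 0
    for end in range(len(seq) - 1, 0, -1):
        for i in range(end):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                swaps += 1       # one adjacent transposition
    return swaps

print(count_adjacent_swaps([1, 2, 3, 4], [1, 3, 2, 4]))  # 1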
A bit late to the party here, but just for the record: I think Ben almost had it... if you'd looked further into correlation coefficients, I think you'd have found that Spearman's rank correlation coefficient might have been the way to go.
Interestingly, jamesh seems to have derived a similar measure, but not normalized.
See this recent SO answer.
You might consider how many changes it takes to transform one string into another (which I guess is what you were getting at when you mentioned edit distance).
See: http://en.wikipedia.org/wiki/Levenshtein_distance
Note, though, that Levenshtein distance doesn't take rotation into account. If you allow rotation as an operation, then:
1, 2, 3, 4
and
2, 3, 4, 1
Are pretty similar.
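For reference, a self-contained Levenshtein sketch in Python (the standard dynamic programme, with nothing rotation-aware):

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein('1234', '2341'))  # 2: the rotation costs one delete plus one insert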
There is a branch-and-bound algorithm that should work for any set of operators you like. It may not be real fast. The pseudocode goes something like this:
bool bounded_recursive_compare_routine(int* a, int* b, int level, int bound){
    if (level > bound) return false;
    // if at end of a and b, return true
    // apply rule 0, like no-change
    if (*a == *b){
        bounded_recursive_compare_routine(a+1, b+1, level+0, bound);
        // if it returns true, return true;
    }
    // if can apply rule 1, like rotation, to b, try that and recur
    bounded_recursive_compare_routine(a+1, b+1, level+cost_of_rotation, bound);
    // if it returns true, return true;
    ...
    return false;
}
int get_minimum_cost(int* a, int* b){
    int bound;
    for (bound=0; ; bound++){
        if (bounded_recursive_compare_routine(a, b, 0, bound)) break;
    }
    return bound;
}
int get_minimum_cost(int* a, int* b){
int bound;
for (bound=0; ; bound++){
if (bounded_recursive_compare_routine(a, b, 0, bound)) break;
}
return bound;
}
The time it takes is roughly exponential in the answer, because it is dominated by the last bound that works.
Added: This can be extended to find the nearest-matching string stored in a trie. I did that years ago in a spelling-correction algorithm.
I'm not sure exactly what formula it uses under the hood, but difflib.SequenceMatcher.ratio() does exactly this:
ratio(self) method of difflib.SequenceMatcher instance:
Return a measure of the sequences' similarity (float in [0,1]).
Code example:
from difflib import SequenceMatcher
sm = SequenceMatcher(None, '1234', '1324')
print sm.ratio()
>>> 0.75
Another approach, based on a little bit of mathematics, is to count the number of inversions needed to convert one of the arrays into the other. An inversion is the exchange of two neighbouring array elements. In Ruby it is done like this:
# extend class Array by a new method
class Array
  def dist(other)
    raise 'can calculate distance only to array with same length' if length != other.length
    # initialize count of inversions to 0
    count = 0
    # loop over all pairs of indices i, j with i < j
    length.times do |i|
      (i + 1).upto(length - 1) do |j|
        # increase count if the i-th and j-th elements have a different order
        count += 1 if (self[i] <=> self[j]) != (other[i] <=> other[j])
      end
    end
    return count
  end
end

l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
# try an example (prints 1)
puts l1.dist(l2)
The distance between two arrays of length n can be between 0 (they are the same) and n*(n-1)/2 (reversing the first array gives the second). If you prefer distances always between 0 and 1, to be able to compare distances of pairs of arrays of different length, just divide by n*(n-1)/2.
A disadvantage of this algorithm is its O(n^2) running time. It also assumes that the arrays don't have duplicate entries, but it could be adapted.
A remark about the code line "count += 1 if ...": the count is increased only if either the i-th element of the first list is smaller than its j-th element and the i-th element of the second list is bigger than its j-th element, or vice versa (the i-th element of the first list is bigger than its j-th element and the i-th element of the second list is smaller than its j-th element). In short: (l1[i] < l1[j] and l2[i] > l2[j]) or (l1[i] > l1[j] and l2[i] < l2[j])
If one has two orderings, one should look at two important rank correlation coefficients:
Spearman's rank correlation coefficient: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
This is almost the same as jamesh's answer, but scaled to the range -1 to 1.
It is defined as:
1 - (6 * sum_of_squared_distances) / (n_samples * (n_samples**2 - 1))
Kendall's tau: https://nl.wikipedia.org/wiki/Kendalls_tau
When using python one could use:
from scipy import stats
order1 = [ 1, 2, 3, 4]
order2 = [ 1, 3, 2, 4]
print stats.spearmanr(order1, order2)[0]
>> 0.8000
print stats.kendalltau(order1, order2)[0]
>> 0.6667
If anyone is using the R language, I've implemented a function that computes the Spearman rank correlation coefficient using the method described above by bubake:
get_spearman_coef <- function(objectA, objectB) {
  # getting the Spearman rho rank test
  spearman_data <- data.frame(listA = objectA, listB = objectB)
  spearman_data$rankA <- 1:nrow(spearman_data)
  rankB <- c()
  for (index_valueA in 1:nrow(spearman_data)) {
    for (index_valueB in 1:nrow(spearman_data)) {
      if (spearman_data$listA[index_valueA] == spearman_data$listB[index_valueB]) {
        rankB <- append(rankB, index_valueB)
      }
    }
  }
  spearman_data$rankB <- rankB
  spearman_data$distance <- (spearman_data$rankA - spearman_data$rankB)**2
  spearman <- 1 - ((6 * sum(spearman_data$distance)) / (nrow(spearman_data) * (nrow(spearman_data)**2 - 1)))
  print(paste("spearman's rank correlation coefficient:", spearman))
  return(spearman)
}
Results:
get_spearman_coef(c("a","b","c","d","e"), c("a","b","c","d","e"))
spearman's rank correlation coefficient: 1
get_spearman_coef(c("a","b","c","d","e"), c("b","a","d","c","e"))
spearman's rank correlation coefficient: 0.8
