Groupby reduction in OpenCL?

I want to implement a groupby reduction in OpenCL. For example, the input
a1 a2 a3 b1 b2 c3 c4
would produce
a6 b3 c7
The C pseudocode looks like this:
int data[n][2], result[n][2];
int result_count = 0;
int added = data[0][1], group = data[0][0];
for (int i = 1; i < n; i++) {
    if (group == data[i][0]) {
        added += data[i][1];
    } else {
        result[result_count][0] = group;
        result[result_count][1] = added;
        result_count++;
        group = data[i][0];
        added = data[i][1];
    }
}
result[result_count][0] = group;
result[result_count][1] = added;
result_count++;
return result, result_count;
The only standard algorithm I know which goes in this direction is parallel reduction; however, it reduces to one number and not to a buffer of added values by group. I am not sure if parallel reduction could work with a dynamic result buffer (e.g. in local memory) and still be efficient in terms of performance.

A solution by Hashing
Phase 1) A hashing scheme can be used to hash the group value to a location, then an atomic add can sum the contents of the second value.
Phase 2) A prefix sum scan algorithm pass over the hash table to compact it.
Phase 3) Optionally sort the results
A solution by Sorting
Phase 1) Sort the data on the group value
Phase 2) Use a reduction to sum each group
Phase 3) A prefix sum scan to compact the sums
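For illustration, here is a small serial sketch of the sorting approach (plain Python rather than OpenCL; all names are made up): after the sort, flag the start of each segment, prefix-sum the flags to obtain each group's output slot, and accumulate the sums. On the GPU each of these steps maps onto a standard parallel primitive (radix sort, scan, segmented reduction / stream compaction).
# Serial stand-in for the phases of the sort-based groupby reduction.
def groupby_reduce(pairs):
    # sort on the group key
    pairs = sorted(pairs, key=lambda kv: kv[0])
    # flag the first element of every segment (a parallel map on the GPU)
    flags = [1 if i == 0 or pairs[i][0] != pairs[i - 1][0] else 0
             for i in range(len(pairs))]
    # prefix-sum the flags -> output slot of each element's group
    slots, running = [], -1
    for f in flags:
        running += f
        slots.append(running)
    # segmented reduction: sum every value into its group's slot
    out = [[None, 0] for _ in range(running + 1)]
    for (key, value), slot in zip(pairs, slots):
        out[slot][0] = key
        out[slot][1] += value
    return out

print(groupby_reduce([("a", 1), ("a", 2), ("a", 3),
                      ("b", 1), ("b", 2), ("c", 3), ("c", 4)]))
# [['a', 6], ['b', 3], ['c', 7]]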

Related

Algorithm for downsampling array of intervals

I have a sorted array of N intervals of different length. I am plotting these intervals with alternating colors blue/green.
I am trying to find a method or algorithm to "downsample" the array of intervals to produce a visually similar plot, but with fewer elements.
Ideally I could write some function where I can pass the target number of output intervals as an argument. The output length only has to come close to the target.
input = [
[0, 5, "blue"],
[5, 6, "green"],
[6, 10, "blue"],
// ...etc
]
output = downsample(input, 25)
// [[0, 10, "blue"], ... ]
Below is a picture of what I am trying to accomplish. In this example the input has about 250 intervals, and the output about 25 intervals. The input length can vary a lot.
Update 1:
Below is my original post, which I initially deleted because there were issues with displaying the equations and I also wasn't very confident that it really made sense. But later I figured out that the optimisation problem I described can actually be solved efficiently with dynamic programming (DP).
So I did a sample C++ implementation. Here are some results:
Here is a live demo that you can play with in your browser (make sure your browser supports WebGL2, like Chrome or Firefox). It takes a bit to load the page.
Here is the C++ implementation: link
Update 2:
Turns out the proposed solution has the following nice property - we can easily control the importance of the two parts F1 and F2 of the cost function. Simply change the cost function to F(α)=F1 + αF2, where α >= 1.0 is a free parameter. The DP algorithm remains the same.
Here are some results for different α values using the same number of intervals N:
Live demo (WebGL2 required)
As can be seen, higher α means it is more important to cover the original input intervals even if this means covering more of the background in-between.
Original post
Even though some good algorithms have already been proposed, I would like to propose a slightly unusual approach - interpreting the task as an optimisation problem. Although I don't know how to solve the optimisation problem efficiently (or even if it can be solved in reasonable time at all), it might be useful to someone purely as a concept.
First, without loss of generality, let's declare the blue color to be background. We will be painting N green intervals on top of it (N is the number provided to the downsample() function in the OP's description). The ith interval is defined by its starting coordinate 0 <= xi < xmax and width wi >= 0 (xmax is the maximum coordinate from the input).
Let's also define the array G(x) to be the number of green cells in the interval [0, x) in the input data. This array can easily be pre-calculated. We will use it to quickly calculate the number of green cells in an arbitrary interval [x, y) - namely: G(y) - G(x).
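A small sketch of that pre-calculation in Python (names are illustrative; intervals are taken as (start, end, color) triples on an integer grid):
def build_green_prefix(intervals, xmax):
    # G[x] = number of green cells in [0, x)
    green = [0] * xmax
    for start, end, color in intervals:
        if color == "green":
            for x in range(start, end):
                green[x] = 1
    G = [0] * (xmax + 1)
    for x in range(xmax):
        G[x + 1] = G[x] + green[x]
    return G

G = build_green_prefix([(0, 5, "blue"), (5, 6, "green"), (6, 10, "blue")], 10)
print(G[6] - G[0])  # green cells in [0, 6) -> 1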
We can now introduce the first part of the cost function, F1, for our optimisation problem.
The smaller F1 is, the better our generated intervals cover the input intervals, so we will be searching for xi, wi that minimise it. Ideally we want F1=0, which would mean that the intervals do not cover any of the background (which of course is not possible because N is smaller than the number of input intervals).
However, this function is not enough to describe the problem, because obviously we can minimise it by taking empty intervals: F1(x, 0)=0. Instead, we want to cover as much as possible of the input intervals. Let's introduce the second part of the cost function, F2, which corresponds to this requirement.
The smaller F2 is, the more of the input intervals is covered. Ideally we want F2=0, which would mean that we covered all of the input rectangles. However, minimising F2 competes with minimising F1.
Finally, we can state our optimisation problem: find xi, wi that minimize F=F1 + F2
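(One explicit way to write the two terms, using the G(x) array from above - this is only an illustrative formalisation consistent with the description, not necessarily the exact original formulas, and it assumes the generated intervals do not overlap:
F1(x, w) = sum over i of [ wi - (G(xi + wi) - G(xi)) ], the amount of background covered by the generated intervals;
F2(x, w) = G(xmax) - sum over i of [ G(xi + wi) - G(xi) ], the amount of input green left uncovered.)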
How to solve this problem? I'm not sure. Maybe use some metaheuristic approach for global optimisation such as Simulated annealing or Differential evolution. These are typically easy to implement, especially for this simple cost function.
The best case would be for some kind of DP algorithm to exist that solves it efficiently, but that seems unlikely.
I would advise you to use the Haar wavelet. That is a very simple algorithm which was often used to provide the functionality of progressive loading for big images on websites.
Here you can see how it works with a 2D function. That is what you can use. Alas, the document is in Ukrainian, but the code is in C++, so it's readable :)
This document provides an example of a 3D object:
Pseudocode on how to compress with the Haar wavelet can be found in Wavelets for Computer Graphics: A Primer Part 1.
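For a rough idea of what one level of the 1D Haar transform does (this Python sketch is mine, not taken from the linked documents): each pair of samples is replaced by an average and a detail coefficient; keeping only the averages gives a half-resolution version of the data, and repeating the step gives coarser and coarser versions.
def haar_step(samples):
    # one level of the 1D Haar transform: pairwise averages + details
    averages, details = [], []
    for i in range(0, len(samples) - 1, 2):
        a, b = samples[i], samples[i + 1]
        averages.append((a + b) / 2.0)
        details.append((a - b) / 2.0)
    return averages, details

signal = [1, 1, 1, 0, 0, 1, 1, 1]
avg, det = haar_step(signal)
print(avg)  # [1.0, 0.5, 0.5, 1.0] -- coarser version of the signal
print(det)  # [0.0, 0.5, -0.5, 0.0] -- details needed to reconstruct exactly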
You could do the following:
Write out the points that divide the whole strip into intervals as the array [a[0], a[1], a[2], ..., a[n-1]]. In your example, the array would be [0, 5, 6, 10, ... ].
Calculate double-interval lengths a[2]-a[0], a[3]-a[1], a[4]-a[2], ..., a[n-1]-a[n-3] and find the least of them. Let it be a[k+2]-a[k]. If there are two or more equal lengths having the lowest value, choose one of them randomly. In your example, you should get the array [6, 5, ... ] and search for the minimum value through it.
Swap the intervals (a[k], a[k+1]) and (a[k+1], a[k+2]). Basically, you need to assign a[k+1]=a[k]+a[k+2]-a[k+1] to keep the lengths, and to remove the points a[k] and a[k+2] from the array after that because two pairs of intervals of the same color are now merged into two larger intervals. Thus, the numbers of blue and green intervals decreases by one each after this step.
If you're satisfied with the current number of intervals, end the process; otherwise go back to step 2.
Step 2 is performed in order to decrease the "color shift", because in step 3 the left interval is moved a[k+2]-a[k+1] to the right and the right interval is moved a[k+1]-a[k] to the left. The sum of these distances, a[k+2]-a[k], can be considered a measure of the change you're introducing into the whole picture.
Main advantages of this approach:
It is simple.
It doesn't give a preference to either of the two colors. You don't need to assign one of the colors to be the background and the other to be the painting color. The picture can be considered both as "green-on-blue" and "blue-on-green". This reflects a quite common use case when the two colors just describe two opposite states (like the bit 0/1, a "yes/no" answer) of some process extended in time or in space.
It always keeps the balance between colors, i.e. the sum of the intervals of each color remains the same during the reduction process. Thus the total brightness of the picture doesn't change. This is important, as the total brightness can be considered an "indicator of completeness" in some cases.
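For reference, a compact Python sketch of the procedure above (the function and variable names are mine); it keeps only the boundary points and repeats steps 2-3 until few enough points remain. Each pass removes one interval of each color and preserves the per-color totals.
def downsample_points(points, target_point_count):
    # points: sorted boundaries [a0, a1, ..., a_{n-1}] of alternating intervals
    points = list(points)
    while len(points) > target_point_count and len(points) >= 4:
        # step 2: find k minimising the double-interval length a[k+2] - a[k]
        k = min(range(len(points) - 2), key=lambda i: points[i + 2] - points[i])
        # step 3: swap the two inner intervals, then merge with the neighbours
        points[k + 1] = points[k] + points[k + 2] - points[k + 1]
        del points[k + 2]
        del points[k]
    return points

print(downsample_points([0, 5, 6, 10, 12, 20], 4))
# [0, 9, 12, 20]: colors still alternate and each color's total length is preserved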
Here's another attempt at dynamic programming that's slightly different from Georgi Gerganov's, although the idea to try and formulate a dynamic program may have been inspired by his answer. Neither the implementation nor the concept is guaranteed to be sound, but I did include a code sketch with a visual example :)
The search space in this case is not reliant on the total unit width but rather on the number of intervals. It's O(N * n^2) time and O(N * n) space, where N and n are the target and given number of (green) intervals, respectively, because we assume that any newly chosen green interval must be bounded by two given green intervals (rather than extend arbitrarily into the background).
The idea also utilises the prefix-sum technique used to calculate runs with a majority element: we add 1 when we see the target element (in this case green) and subtract 1 for others (that algorithm is also amenable to multiple elements with parallel prefix-sum tracking). I'm not sure that restricting candidate intervals to sections with a majority of the target colour is always warranted, but it may be a useful heuristic depending on the desired outcome. It's also adjustable -- we can easily adjust it to check for a different fraction than 1/2.
Where Georgi Gerganov's program seeks to minimise, this dynamic program seeks to maximise two ratios. Let h(i, k) represent the best sequence of green intervals up to the ith given interval, utilising k intervals, where each is allowed to stretch back to the left edge of some previous green interval. We speculate that
h(i, k) = max(r + C*r1 + h(i-l, k-1))
where, in the current candidate interval, r is the ratio of green to the length of the stretch, and r1 is the ratio of green to the total given green. r1 is multiplied by an adjustable constant to give more weight to the volume of green covered. l is the length of the stretch.
JavaScript code (for debugging, it includes some extra variables and log lines):
function rnd(n, d=2){
let m = Math.pow(10,d)
return Math.round(m*n) / m;
}
function f(A, N, C){
let ps = [[0,0]];
let psBG = [0];
let totalG = 0;
A.unshift([0,0]);
for (let i=1; i<A.length; i++){
let [l,r,c] = A[i];
if (c == 'g'){
totalG += r - l;
let prevI = ps[ps.length-1][1];
let d = l - A[prevI][1];
let prevS = ps[ps.length-1][0];
ps.push(
[prevS - d, i, 'l'],
[prevS - d + r - l, i, 'r']
);
psBG[i] = psBG[i-1];
} else {
psBG[i] = psBG[i-1] + r - l;
}
}
//console.log(JSON.stringify(A));
//console.log('');
//console.log(JSON.stringify(ps));
//console.log('');
//console.log(JSON.stringify(psBG));
let m = new Array(N + 1);
m[0] = new Array((ps.length >> 1) + 1);
for (let i=0; i<m[0].length; i++)
m[0][i] = [0,0];
// for each in N
for (let i=1; i<=N; i++){
m[i] = new Array((ps.length >> 1) + 1);
for (let ii=0; ii<m[0].length; ii++)
m[i][ii] = [0,0];
// for each interval
for (let j=i; j<m[0].length; j++){
m[i][j] = m[i][j-1];
for (let k=j; k>i-1; k--){
// our anchors are the right
// side of each interval, k's are the left
let jj = 2*j;
let kk = 2*k - 1;
// positive means green
// is a majority
if (ps[jj][0] - ps[kk][0] > 0){
let bg = psBG[ps[jj][1]] - psBG[ps[kk][1]];
let s = A[ps[jj][1]][1] - A[ps[kk][1]][0] - bg;
let r = s / (bg + s);
let r1 = C * s / totalG;
let candidate = r + r1 + m[i-1][j-1][0];
if (candidate > m[i][j][0]){
m[i][j] = [
candidate,
ps[kk][1] + ',' + ps[jj][1],
bg, s, r, r1,k,m[i-1][j-1][0]
];
}
}
}
}
}
/*
for (row of m)
console.log(JSON.stringify(
row.map(l => l.map(x => typeof x != 'number' ? x : rnd(x)))));
*/
let result = new Array(N);
let j = m[0].length - 1;
for (let i=N; i>0; i--){
let [_,idxs,w,x,y,z,k] = m[i][j];
let [l,r] = idxs.split(',');
result[i-1] = [A[l][0], A[r][1], 'g'];
j = k - 1;
}
return result;
}
function show(A, last){
if (last[1] != A[A.length-1])
A.push(last);
let s = '';
let j;
for (let i=A.length-1; i>=0; i--){
let [l, r, c] = A[i];
let cc = c == 'g' ? 'X' : '.';
for (let j=r-1; j>=l; j--)
s = cc + s;
if (i > 0)
for (let j=l-1; j>=A[i-1][1]; j--)
s = '.' + s
}
for (let j=A[0][0]-1; j>=0; j--)
s = '.' + s
console.log(s);
return s;
}
function g(A, N, C){
const ts = f(A, N, C);
//console.log(JSON.stringify(ts));
show(A, A[A.length-1]);
show(ts, A[A.length-1]);
}
var a = [
[0,5,'b'],
[5,9,'g'],
[9,10,'b'],
[10,15,'g'],
[15,40,'b'],
[40,41,'g'],
[41,43,'b'],
[43,44,'g'],
[44,45,'b'],
[45,46,'g'],
[46,55,'b'],
[55,65,'g'],
[65,100,'b']
];
// (input, N, C)
g(a, 2, 2);
console.log('');
g(a, 3, 2);
console.log('');
g(a, 4, 2);
console.log('');
g(a, 4, 5);
I would suggest using K-means; it is an algorithm used to group data (more detailed explanations here: https://en.wikipedia.org/wiki/K-means_clustering and here https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).
Below is a brief sketch of how the function could look; hope it is helpful.
from sklearn.cluster import KMeans
import numpy as np

def downsample(input, cluster=25):
    # you will need to group your labels in a numpy array as shown below
    # for the sake of example I will take just a random array
    X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
    # n_clusters will be the same as the desired output
    kmeans = KMeans(n_clusters=cluster, random_state=0).fit(X)
    # then you can iterate through the labels that were assigned to every entry of your input,
    # in our case the interval
    kmeans_list = [[] for _ in range(cluster)]
    for i in range(0, X.shape[0]):
        kmeans_list[kmeans.labels_[i]].append(X[i])
    # after that you will basically have a list of lists, and every inner list will contain
    # all points that correspond to a specific label
    ret = []  # return list
    for label_list in kmeans_list:
        left = 10001000  # a big enough number to exceed anything that you will get as an input
        right = -left    # same here
        for entry in label_list:
            left = min(left, entry[0])
            right = max(right, entry[1])
        ret.append([left, right])
    return ret

Fuse tuples to find equivalence classes

Suppose we have a finite domain D={d1,..,dk} containing k elements.
We consider S, a subset of D^n, i.e. a set of tuples of the form < a1,..,an >, with ai in D.
We want to represent it (compactly) using S', a subset of (2^D)^n, i.e. a set of tuples of the form < A1,..,An > with Ai being subsets of D. The implication is that for any tuple s' in S', all elements in the cross product of its Ai exist in S.
For instance, consider D={a,b,c} so k=3, n=2 and the tuples S=< a,b >+< a,c >+< b,b >+< b,c >.
We can use S'=<{a,b},{b,c}> to represent S.
This singleton solution is also minimal, S'=<{a},{b,c}>+<{b},{b,c}> is also a solution but it is larger, therefore less desirable.
Some sizes, in concrete instances, that we need to handle: k ~ 1000 elements in the domain D, n <= 10 relatively small (the main source of complexity), |S| ranging to large values > 10^6.
A naïve approach consists in first plunging S into the domain of S', (2^D)^n (each component becomes a singleton set), then using the following pairwise test: two tuples s1, s2 in S' can be fused to form a single tuple in S' iff they differ in only one component.
e.g.
< a,b >+< a,c > -> <{a},{b,c}> (differ on second component)
< b,b >+< b,c > -> <{b},{b,c}> (differ on second component)
<{a},{b,c}> + <{b},{b,c}> -> <{a,b},{b,c}> (differ on first component)
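For concreteness, a direct (and deliberately naive) Python sketch of this pairwise fusion loop could look like this (names are made up; each fusion restarts the scan, which is exactly the quadratic behaviour described below):
def naive_fuse(tuples):
    # plunge S into (2^D)^n: every component becomes a singleton set
    work = [tuple(frozenset([x]) for x in t) for t in tuples]
    changed = True
    while changed:
        changed = False
        for i in range(len(work)):
            for j in range(i + 1, len(work)):
                a, b = work[i], work[j]
                diff = [p for p in range(len(a)) if a[p] != b[p]]
                if len(diff) == 1:          # differ in exactly one component: fuse
                    p = diff[0]
                    fused = list(a)
                    fused[p] = a[p] | b[p]
                    work[i] = tuple(fused)
                    del work[j]
                    changed = True
                    break
            if changed:
                break
    return work

print(naive_fuse([("a", "b"), ("a", "c"), ("b", "b"), ("b", "c")]))
# -> one tuple ({'a','b'}, {'b','c'}), as in the example above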
Now there could be several minimal S'; we are interested in finding any one of them, and approximations of the minimisation are also fine, provided they don't give wrong results (i.e. even if S' is not as small as it could be, we get very fast results).
The naive algorithm has to deal with the fact that any newly introduced "fused" tuple could match with some other tuple, so it scales really badly on large input sets, even with n remaining low. You need |S'|^2 comparisons to ensure convergence, and any time two elements are fused, I currently retest every pair (how can I improve that?).
A lot of efficiency is iteration order dependent, so sorting the set in some way(s) could be an option, or perhaps indexing using hashes, but I'm not sure how to do it.
Imperative pseudo code would be ideal, or pointers to a reformulation of the problem to something I can run a solver on would really help.
Here's some pseudocode (C# code that I haven't tested) that demonstrates your S'=<{a},{b,c}>+<{b},{b,c}> method. The space requirements, when using an integer index for the elements, are negligible, and the overall efficiency and speed for Add'ing and Test'ing tuples should be extremely good. If you want a practical solution, then you already have one; you just have to use the correct ADTs.
ElementType[] domain = new ElementType[]; // a simple array of domain elements
FillDomain(domain); // insert all domain elements
SortArray(domain); // sort the domain elements K log K time
SortedDictionary<int, HashSet<int>> subsets; // int's are index/ref into domain
subsets = new SortedDictionary<int, HashSet<int>>();
//
void AddTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
int a = BinarySearch(domain, first); // log K time (binary search)
int b = BinarySearch(domain, second); // log K time (binary search)
if(tuples.ContainsKey(a)) { // log N time (binary search on sorted keys)
if(!tuples[a].Contains(b)) { // constant time (hash lookup)
tuples[a].Add(b); // constant time (hash add)
}
} else { // constant time (instance + hash add)
tuples[a] = new HashSet<int>();
tuples[a].Add(b);
}
}
//
bool ContainsTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
int a = BinarySearch(domain, first); // log K time (binary search)
int b = BinarySearch(domain, second); // log K time (binary search)
if(tuples.ContainsKey(a)) { // log N time (binary search on sorted keys)
if(tuples[a].Contains(b)) { // constant time (hash test)
return true;
}
}
return false;
}
The space savings from optimizing your tuple subset S' won't outweigh the slowdown of the optimization process itself. For size optimization, if you know your K will be less than 65536, you could use short integers instead of integers in the SortedDictionary and HashSet. But even 50 million integers only take up 4 bytes per 32-bit integer * 50 million ≈ 200 MB.
EDIT
Here's another approach: by encoding/mapping your tuples to strings you can take advantage of binary string comparison and the fact that UTF-16 / UTF-8 encoding is very size-efficient. Again, this still doesn't do the merging optimization you want, but speed and efficiency would be pretty good.
Here's some quick pseudo code in JavaScript.
Array.prototype.binarySearch = function(elm) {
var l = 0, h = this.length - 1, i;
while(l <= h) {
i = (l + h) >> 1;
if(this[i] < elm) l = ++i;
else if(this[i] > elm) h = --i;
else return i;
}
return -(++l);
};
// map your ordered domain elements to characters
// For example JavaScript's UTF-16 should be fine
// UTF-8 would work as well
var domain = {
"a": String.fromCharCode(1),
"b": String.fromCharCode(2),
"c": String.fromCharCode(3),
"d": String.fromCharCode(4)
}
var tupleStrings = [];
// map your tuple to the string encoding
function map(tuple) {
var str = "";
for(var i=0; i<tuple.length; i++) {
str += domain[tuple[i]];
}
return str;
}
function add(tuple) {
var str = map(tuple);
// binary search
var index = tupleStrings.binarySearch(str);
if(index < 0) index = ~index;
// insert depends on tupleString's type implementation
tupleStrings.splice(index, 0, str);
}
function contains(tuple) {
var str = map(tuple);
// binary search
return tupleStrings.binarySearch(str) >= 0;
}
add(["a","b"]);
add(["a","c"]);
add(["b","b"]);
add(["b","c"]);
add(["c","c"]);
add(["d","a"]);
alert(contains(["a","a"]));
alert(contains(["d","a"]));
alert(JSON.stringify(tupleStrings, null, "\n"));

number to unique permutation mapping of a sequence containing duplicates

I am looking for an algorithm that can map a number to a unique permutation of a sequence. I have found out about Lehmer codes and the factorial number system thanks to a similar question, Fast permutation -> number -> permutation mapping algorithms, but that question doesn't deal with the case where there are duplicate elements in the sequence.
For example, take the sequence 'AAABBC'. There are 6! = 720 ways it could be arranged, but I believe there are only 6! / (3! * 2! * 1!) = 60 unique permutations of this sequence. How can I map a number to a permutation in these cases?
Edit: changed the term 'set' to 'sequence'.
From Permutation to Number:
Let K be the number of character classes (example: AAABBC has three character classes)
Let N[K] be the number of elements in each character class (example: for AAABBC, we have N[K]=[3,2,1]), and let N = sum(N[K])
Every legal permutation of the sequence then uniquely corresponds to a path in an incomplete K-way tree.
The unique number of the permutation then corresponds to the index of the tree-node in a post-order traversal of the K-ary tree terminal nodes.
Luckily, we don't actually have to perform the tree traversal -- we just need to know how many terminal nodes in the tree are lexicographically less than our node. This is very easy to compute, as at any node in the tree, the number of terminal nodes below the current node is equal to the number of permutations using the unused elements in the sequence, which has a closed-form solution that is a simple multiplication of factorials.
So given our 6 original letters, if the first element of our permutation is a 'B', we determine that there are 5!/2!2!1! = 30 permutations that start with 'A', so our permutation number has to be at least 30. Had our first letter been a 'C', we could have calculated it as 5!/2!2!1! (not A) + 5!/3!1!1! (not B) = 30 + 20, or alternatively as
60 (total) - 5!/3!2!0! (C) = 50
Using this, we can take a permutation (e.g. 'BAABCA') and perform the following computations:
Permutation # = 5!/2!2!1! ('B') + 0 ('A') + 0 ('A') + 2!/1!1! ('B') + 1!/1! ('C') + 0 ('A')
= 30 + 2 + 1 = 33
Checking that this works: CBBAAA corresponds to
(5!/2!2!1! (not A) + 5!/3!1!1! (not B)) 'C'+ 4!/2!2!0! (not A) 'B' + 3!/2!1!0! (not A) 'B' = (30 + 20) +6 + 3 = 59
Likewise, AAABBC =
0 ('A') + 0 ('A') + 0 ('A') + 0 ('B') + 0 ('B') + 0 ('C') = 0
Sample implementation:
import math
import copy
from functools import reduce
from operator import mul

def computePermutationNumber(inPerm, inCharClasses):
    permutation = copy.copy(inPerm)
    charClasses = copy.copy(inCharClasses)
    n = len(permutation)
    permNumber = 0
    for i, x in enumerate(permutation):
        for j in range(x):
            if charClasses[j] > 0:
                charClasses[j] -= 1
                permNumber += multiFactorial(n - i - 1, charClasses)
                charClasses[j] += 1
        if charClasses[x] > 0:
            charClasses[x] -= 1
    return permNumber

def multiFactorial(n, charClasses):
    val = math.factorial(n) // reduce(mul, map(math.factorial, charClasses))
    return val
From Number to Permutation:
This process can be done in reverse, though I'm not sure how efficiently:
Given a permutation number, and the alphabet that it was generated from, recursively subtract the largest number of nodes less than or equal to the remaining permutation number.
E.g. Given a permutation number of 59, we first can subtract 30 + 20 = 50 ('C') leaving 9. Then we can subtract 'B' (6) and a second 'B'(3), re-generating our original permutation.
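A small Python sketch of that reverse (number -> permutation) process, written against the same 0-indexed ranking used above (the helper and function names are my own):
import math

def multinomial(n, counts):
    # number of distinct permutations of n symbols with the given class counts
    result = math.factorial(n)
    for c in counts:
        result //= math.factorial(c)
    return result

def number_to_permutation(num, counts, alphabet="ABC"):
    counts = list(counts)
    n = sum(counts)
    out = []
    for pos in range(n):
        for k in range(len(counts)):
            if counts[k] == 0:
                continue
            counts[k] -= 1
            below = multinomial(n - pos - 1, counts)  # permutations with symbol k here
            if num < below:
                out.append(alphabet[k])
                break
            num -= below
            counts[k] += 1
    return "".join(out)

print(number_to_permutation(59, [3, 2, 1]))  # CBBAAA
print(number_to_permutation(0, [3, 2, 1]))   # AAABBC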
Here is an algorithm in Java that enumerates the possible sequences by mapping an integer to the sequence.
public class Main {
private int[] counts = { 3, 2, 1 }; // 3 Symbols A, 2 Symbols B, 1 Symbol C
private int n = sum(counts);
public static void main(String[] args) {
new Main().enumerate();
}
private void enumerate() {
int s = size(counts);
for (int i = 0; i < s; ++i) {
String p = perm(i);
System.out.printf("%4d -> %s\n", i, p);
}
}
// calculates the total number of symbols still to be placed
private int sum(int[] counts) {
int n = 0;
for (int i = 0; i < counts.length; i++) {
n += counts[i];
}
return n;
}
// calculates the number of different sequences with the symbol configuration in counts
private int size(int[] counts) {
int res = 1;
int num = 0;
for (int pos = 0; pos < counts.length; pos++) {
for (int den = 1; den <= counts[pos]; den++) {
res *= ++num;
res /= den;
}
}
return res;
}
// maps the sequence number to a sequence
private String perm(int num) {
int[] counts = this.counts.clone();
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ++i) {
int p = 0;
for (;;) {
while (counts[p] == 0) {
p++;
}
counts[p]--;
int c = size(counts);
if (c > num) {
sb.append((char) ('A' + p));
break;
}
counts[p]++;
num -= c;
p++;
}
}
return sb.toString();
}
}
The mapping used by the algorithm is as follows. I use the example given in the question (3 x A, 2 x B, 1 x C) to illustrate it.
There are 60 (=6!/3!/2!/1!) possible sequences in total, 30 (=5!/2!/2!/1!) of them have an A at the first place, 20 (=5!/3!/1!/1!) have a B at the first place, and 10 (=5!/3!/2!/0!) have a C at the first place.
The numbers 0..29 are mapped to all sequences starting with an A, 30..49 are mapped to the sequences starting with B, and 50..59 are mapped to the sequences starting with C.
The same process is repeated for the next place in the sequence. For example, if we take the sequences starting with B, we now have to map the numbers 0 (=30-30) .. 19 (=49-30) to the sequences with configuration (3 x A, 1 x B, 1 x C).
A very simple algorithm to map a permutation consisting of n digits to a number is
number <- digit[0]*10^(n-1) + digit[1]*10^(n-2) + ... + digit[n-1]*10^0
You can find plenty of resources for algorithms to generate permutations. I guess you want to use this algorithm in bioinformatics. For example, you can use itertools.permutations from Python.
Assuming the resulting number fits inside a word (e.g. 32 or 64 bit integer) relatively easily, then much of the linked article still applies. Encoding and decoding from a variable base remains the same. What changes is how the base varies.
If you're creating a permutation of a sequence, you pick an item out of your bucket of symbols (from the original sequence) and put it at the start. Then you pick out another item from your bucket of symbols and put it on the end of that. You'll keep picking and placing symbols at the end until you've run out of symbols in your bucket.
What's significant is which item you picked out of the bucket of the remaining symbols each time. The number of remaining symbols is something you don't have to record because you can compute that as you build the permutation -- that's a result of your choices, not the choices themselves.
The strategy here is to record what you chose, and then present an array of what's left to be chosen. Then choose, record which index you chose (packing it via the variable base method), and repeat until there's nothing left to choose. (Just as above when you were building a permuted sequence.)
In the case of duplicate symbols it doesn't matter which one you picked, so you can treat them as the same symbol. The difference is that when you pick a symbol which still has a duplicate left, you didn't reduce the number of symbols in the bucket to pick from next time.
Let's adopt a notation that makes this clear:
Instead of listing duplicate symbols left in our bucket to choose from like c a b c a a we'll list them along with how many are still in the bucket: c-2 a-3 b-1.
Note that if you pick c from the list, the bucket has c-1 a-3 b-1 left in it. That means next time we pick something, we have three choices.
But on the other hand, if I picked b from the list, the bucket has c-2 a-3 left in it. That means next time we pick something, we only have two choices.
When reconstructing the permuted sequence we just maintain the bucket the same way as when we were computing the permutation number.
The implementation details aren't trivial, but they're straightforward with standard algorithms. The only thing that might trip you up is what to do when a symbol in your bucket is no longer available.
Suppose your bucket was represented by a list of pairs (like above): c-1 a-3 b-1, and you choose c. Your resulting bucket is c-0 a-3 b-1. But c-0 is no longer a choice, so your list should only have two entries, not three. You could move the entire list down by 1, resulting in a-3 b-1, but if your list is long this is expensive. A fast and easy solution: move the last element of the bucket into the removed location and decrease your bucket size: c-0 a-3 b-1 becomes b-1 a-3 <empty>, or just b-1 a-3.
Note that we can do the above because it doesn't matter what order the symbols in the bucket are listed in, as long as it's the same way when we encode or decode the number.
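To make the bucket scheme concrete, here is a small Python sketch (the function names and the exact packing order are my own choices; the variable-base encoding from the linked article is the idea being illustrated):
def encode(sequence, multiset):
    # multiset: list of (symbol, count) pairs; its order must match when decoding
    bucket = [[sym, cnt] for sym, cnt in multiset]
    digits = []                                  # (choice index, base) per step
    for sym in sequence:
        idx = next(i for i, e in enumerate(bucket) if e[0] == sym)
        digits.append((idx, len(bucket)))
        bucket[idx][1] -= 1
        if bucket[idx][1] == 0:                  # swap-remove the exhausted symbol
            bucket[idx] = bucket[-1]
            bucket.pop()
    number = 0
    for idx, base in reversed(digits):           # fold the digits into one integer
        number = number * base + idx
    return number

def decode(number, multiset):
    bucket = [[sym, cnt] for sym, cnt in multiset]
    out = []
    while bucket:
        idx = number % len(bucket)
        number //= len(bucket)
        out.append(bucket[idx][0])
        bucket[idx][1] -= 1
        if bucket[idx][1] == 0:
            bucket[idx] = bucket[-1]
            bucket.pop()
    return "".join(out)

m = [("a", 3), ("b", 2), ("c", 1)]
n = encode("baabca", m)
print(n, decode(n, m))  # round-trips back to "baabca"
This is not necessarily a dense 0..(count-1) ranking like the factorial-number-system approach in the other answers, but it round-trips and treats duplicate symbols identically.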
As I was unsure of the code in gbronner's answer (or of my understanding), I recoded it in R as follows
ritpermz = function(n, parclass){
    return(factorial(n) / prod(factorial(parclass)))}

rankum <- function(confg, parclass){
    n = length(confg)
    permdex = 1
    for (i in 1:(n-1)){
        x = confg[i]
        if (x > 1){
            for (j in 1:(x-1)){
                if (parclass[j] > 0){
                    parclass[j] = parclass[j] - 1
                    permdex = permdex + ritpermz(n-i, parclass)
                    parclass[j] = parclass[j] + 1}}}
        parclass[x] = parclass[x] - 1
    }
    return(permdex)
}
which does produce a ranking with the right range of integers

Sorted list difference

I have the following problem.
I have a set of elements that I can sort by a certain algorithm A. The sorting is good, but very expensive.
There is also an algorithm B that can approximate the result of A. It is much faster, but the ordering will not be exactly the same.
Taking the output of A as a 'gold standard', I need to get a meaningful estimate of the error resulting from the use of B on the same data.
Could anyone please suggest any resource I could look at to solve my problem?
Thanks in advance!
EDIT :
As requested : adding an example to illustrate the case :
if the data are the first 10 letters of the alphabet,
A outputs : a,b,c,d,e,f,g,h,i,j
B outputs : a,b,d,c,e,g,h,f,j,i
What are the possible measures of the resulting error that would allow me to tune the internal parameters of algorithm B to get results closer to the output of A?
Spearman's rho
I think what you want is Spearman's rank correlation coefficient. Using the index [rank] vectors for the two sortings (perfect A and approximate B), you calculate the rank correlation rho, which ranges from -1 (completely reversed) to 1 (exactly the same):
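The standard formula (for n items with no tied ranks) is:
rho = 1 - (6 * sum of d(i)^2) / (n * (n^2 - 1))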
where d(i) is the difference in ranks for each element between A and B, and n is the number of elements.
You can define your measure of error as a distance D := (1-rho)/2.
I would determine the largest correctly ordered subset.
+-------------> I
| +--------->
| |
A -> B -> D -----> E -> G -> H --|--> J
| ^ | | ^
| | | | |
+------> C ---+ +-----------> F ---+
In your example this is 7 out of 10, so the algorithm scores 0.7. The other such sets have length 6. A correct ordering scores 1.0, a reverse ordering 1/n.
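If it helps, one way to compute this score (my reading of the idea above: the largest correctly ordered subset is the longest increasing subsequence of B's output once each element is replaced by its rank in A's output):
from bisect import bisect_left

def order_score(a_output, b_output):
    # length of the longest correctly ordered subsequence of B, divided by n
    rank_in_a = {item: i for i, item in enumerate(a_output)}
    ranks = [rank_in_a[item] for item in b_output]
    tails = []                      # patience-sorting style LIS in O(n log n)
    for r in ranks:
        pos = bisect_left(tails, r)
        if pos == len(tails):
            tails.append(r)
        else:
            tails[pos] = r
    return len(tails) / len(ranks)

print(order_score(list("abcdefghij"), list("abdceghfji")))  # 0.7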
I assume that this is related to the number of inversions. x + y indicates x <= y (correct order) and x - y indicates x > y (wrong order).
A + B + D - C + E + G + H - F + J - I
We obtain almost the same result - 6 of 9 are correct, scoring 0.667. Again, a correct ordering scores 1.0 and a reverse ordering 0.0, and this might be much easier to calculate.
Are you looking for an algorithm that calculates the difference, taking the array sorted with A and the array sorted with B as inputs? Or are you looking for a generic method of determining, on average, how far off an array would be when sorted with B?
If the first, then I suggest something as simple as the distance each item is from where it should be (an average would do better than a sum, to remove the length of the array as an issue).
If the second, then I think I'd need to see more about these algorithms.
It's tough to give a good generic answer, because the right solution for you will depend on your application.
One of my favorite options is just the number of in-order element pairs, divided by the total number of pairs. This is a nice, simple, easy-to-compute metric that just tells you how many mistakes there are. But it doesn't make any attempt to quantify the magnitude of those mistakes.
double sortQuality = 1;
if (array.length > 1) {
    int inOrderPairCount = 0;
    for (int i = 1; i < array.length; i++) {
        if (array[i] >= array[i - 1]) ++inOrderPairCount;
    }
    sortQuality = (double) inOrderPairCount / (array.length - 1);
}
Calculating the RMS error may be one of the many possible methods. Here is some small Python code.
def calc_error(out_A, out_B):
    # out_A <= output of algorithm A
    # out_B <= output of algorithm B
    rms_error = 0
    for i in range(len(out_A)):
        # take the square of each difference and add it
        rms_error += (out_A[i] - out_B[i])**2
    return rms_error**0.5  # take the square root
>>> calc_error([1,2,3,4,5,6],[1,2,3,4,5,6])
0.0
>>> calc_error([1,2,3,4,5,6],[1,2,4,3,5,6]) # 4,3 swapped
1.414
>>> calc_error([1,2,3,4,5,6],[1,2,4,6,3,5]) # 3,4,5,6 randomized
3.162
NOTE:
Taking the square root is not necessary, but taking the squares is, as plain differences may sum to zero. I think the calc_error function gives an approximate indication of the number of wrongly placed pairs, but I don't have any programming tools handy, so :(.
Take a look at this question.
You could try something involving Hamming distance.
If anyone is using the R language, I've implemented a function that computes the Spearman rank correlation coefficient using the method described above by #bubake:
get_spearman_coef <- function(objectA, objectB) {
    # getting the spearman rho rank test
    spearman_data <- data.frame(listA = objectA, listB = objectB)
    spearman_data$rankA <- 1:nrow(spearman_data)
    rankB <- c()
    for (index_valueA in 1:nrow(spearman_data)) {
        for (index_valueB in 1:nrow(spearman_data)) {
            if (spearman_data$listA[index_valueA] == spearman_data$listB[index_valueB]) {
                rankB <- append(rankB, index_valueB)
            }
        }
    }
    spearman_data$rankB <- rankB
    spearman_data$distance <- (spearman_data$rankA - spearman_data$rankB)**2
    spearman <- 1 - ((6 * sum(spearman_data$distance)) / (nrow(spearman_data) * (nrow(spearman_data)**2 - 1)))
    print(paste("spearman's rank correlation coefficient"))
    return(spearman)
}
results :
get_spearman_coef(c("a","b","c","d","e"), c("a","b","c","d","e"))
spearman's rank correlation coefficient: 1
get_spearman_coef(c("a","b","c","d","e"), c("b","a","d","c","e"))
spearman's rank correlation coefficient: 0.8

Good algorithm for combining items from N lists into one with balanced distribution?

Let's say I have the three following lists
A1
A2
A3
B1
B2
C1
C2
C3
C4
C5
I'd like to combine them into a single list, with the items from each list distributed as evenly as possible, sorta like this:
C1
A1
C2
B1
C3
A2
C4
B2
A3
C5
I'm using .NET 3.5/C# but I'm looking more for how to approach it then specific code.
EDIT: I need to keep the order of elements from the original lists.
Take a copy of the list with the most members. This will be the destination list.
Then take the list with the next largest number of members.
Divide the destination list length by the smaller length to give a fractional value greater than one.
For each item in the second list, maintain a float counter. Add the value calculated in the previous step, and mathematically round it to the nearest integer (keep the original float counter intact). Insert it at this position in the destination list and increment the counter by 1 to account for it. Repeat for all list members in the second list.
Repeat steps 2-5 for all lists.
EDIT: This has the advantage of being O(n) as well, which is always nice :)
Implementation of Andrew Rollings' answer:
public List<String> equimix(List<List<String>> input) {
// sort biggest list to smallest list
Collections.sort(input, new Comparator<List<String>>() {
public int compare(List<String> a1, List<String> a2) {
return a2.size() - a1.size();
}
});
List<String> output = input.get(0);
for (int i = 1; i < input.size(); i++) {
output = equimix(output, input.get(i));
}
return output;
}
public List<String> equimix(List<String> listA, List<String> listB) {
if (listB.size() > listA.size()) {
List<String> temp;
temp = listB;
listB = listA;
listA = temp;
}
List<String> output = listA;
double shiftCoeff = (double) listA.size() / listB.size();
double floatCounter = shiftCoeff;
for (String item : listB) {
int insertionIndex = (int) Math.round(floatCounter);
output.add(insertionIndex, item);
floatCounter += (1+shiftCoeff);
}
return output;
}
First, this answer is more of a train of thought than a concrete solution.
OK, so you have a list of 3 items (A1, A2, A3), where you want A1 to be somewhere in the first 1/3 of the target list, A2 in the second 1/3 of the target list, and A3 in the third 1/3. Likewise you want B1 to be in the first 1/2, etc...
So you allocate your list of 10 as an array, then start with the list with the most items, in this case C. Calculate the spot where C1 should fall (1.5). Drop C1 in the closest spot (in this case, either 1 or 2), then calculate where C2 should fall (3.5), and continue the process until there are no more Cs.
Then go with the list with the second-to-most number of items. In this case, A. Calculate where A1 goes (1.66), so try 2 first. If you already put C1 there, try 1. Do the same for A2 (4.66) and A3 (7.66). Finally, we do list B. B1 should go at 2.5, so try 2 or 3. If both are taken, try 1 and 4 and keep moving radially out until you find an empty spot. Do the same for B2.
You'll end up with something like this if you pick the lower number:
C1 A1 C2 A2 C3 B1 C4 A3 C5 B2
or this if you pick the upper number:
A1 C1 B1 C2 A2 C3 A3 C4 B2 C5
This seems to work pretty well for your sample lists, but I don't know how well it will scale to many lists with many items. Try it and let me know how it goes.
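A rough Python sketch of that train of thought (the slot formula and the radial search for a free spot are my interpretation of the steps above):
def combine(lists):
    total = sum(len(lst) for lst in lists)
    slots = [None] * total
    # largest list first
    for lst in sorted(lists, key=len, reverse=True):
        for i, item in enumerate(lst):
            # ideal fractional position: middle of the i-th 1/len(lst) slice
            ideal = (i + 0.5) * total / len(lst)
            target = int(round(ideal))
            # search radially outwards for the nearest free slot
            for offset in range(total):
                for candidate in (target - offset, target + offset):
                    if 0 <= candidate < total and slots[candidate] is None:
                        slots[candidate] = item
                        break
                else:
                    continue
                break
    return slots

a = ["A1", "A2", "A3"]
b = ["B1", "B2"]
c = ["C1", "C2", "C3", "C4", "C5"]
print(combine([a, b, c]))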
Make a hash table of lists.
For each list, store the nth element in the list under the key (/ n (+ (length list) 1))
Optionally, shuffle the lists under each key in the hash table, or sort them in some way
Concatenate the lists in the hash by sorted key
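A quick Python sketch of this (the fractional key n/(length+1) is taken from the step above; the rest of the details are guesses):
from collections import defaultdict
from fractions import Fraction

def interleave(*lists):
    buckets = defaultdict(list)
    for lst in lists:
        for n, item in enumerate(lst, start=1):
            # key (/ n (+ (length list) 1)) from the description
            buckets[Fraction(n, len(lst) + 1)].append(item)
    result = []
    for key in sorted(buckets):
        result.extend(buckets[key])   # optionally shuffle buckets[key] here
    return result

a = ["A1", "A2", "A3"]
b = ["B1", "B2"]
c = ["C1", "C2", "C3", "C4", "C5"]
print(interleave(a, b, c))
# ['C1', 'A1', 'B1', 'C2', 'A2', 'C3', 'B2', 'C4', 'A3', 'C5']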
I'm thinking of a divide and conquer approach, in each iteration of which you split all the lists with more than one element in half and recurse. When you get to a point where all the lists except one have only one element, you can randomly combine them, pop up a level and randomly combine the lists removed from that frame where the length was one... et cetera.
Something like the following is what I'm thinking:
- filter lists into three categories
- lists of length 1
- first half of the elements of lists with > 1 elements
- second half of the elements of lists with > 1 elements
- recurse on the first and second half of the lists if they have > 1 element
- combine results of above computation in order
- randomly combine the list of singletons into returned list
You could simply combine the three lists into a single list and then UNSORT that list. An unsorted list should achieve your requirement of 'evenly-distributed' without too much effort.
Here's an implementation of unsort: http://www.vanheusden.com/unsort/.
A quick suggestion, in python-ish pseudocode:
merge = list()
lists = list(list_a, list_b, list_c)
lists.sort_by(length, descending)
while lists is not empty:
    l = lists.remove_first()
    merge.append(l.remove_first())
    if l is not empty:
        next = lists.remove_first()
        lists.append(l)
        lists.sort_by(length, descending)
        lists.prepend(next)
This should distribute elements from shorter lists more evenly than the other suggestions here.
