Sum of numbers with approximation and no repetition - algorithm

For an app I'm working on, I need to process an array of numbers and return a new array such that the sum of the elements are as close as possible to a target sum. This is similar to the coin-counting problem, with two differences:
Each element of the new array has to come from the input array (i.e. no repetition/duplication)
The algorithm should stop when it finds an array whose sum falls within X of the target number (e.g., given [10, 12, 15, 23, 26], a target of 35, and a sigma of 5, a result of [10, 12, 15] (sum 37) is OK but a result of [15, 26] (sum 41) is not.
I was considering the following algorithm (in pseudocode) but I doubt that this is the best way to do it.
function (array, goal, sigma)
var A = []
for each element E in array
if (E + (sum of rest of A) < goal +/- sigma)
A.push(E)
return A
For what it's worth, the language I'm using is Javascript. Any advice is much appreciated!

This is not intended as the best answer possible, just maybe something that will work well enough. All remarks/input is welcome.
Also, this is taking into mind the answers from the comments, that the input is length of songs (usually 100 - 600), the length of the input array is between 5 to 50 and the goal is anywhere between 100 to 7200.
The idea:
Start with finding the average value of the input, and then work out a guess on the number of input values you're going to need. Lets say that comes out x.
Order your input.
Take the first x-1 values and substitute the smallest one with the any other to get to your goal (somewhere in the range). If none exist, find a number so you're still lower than the goal.
Repeat step #3 using backtracking or something like that. Maybe limit the number of trials you're gonna spend there.
x++ and go back to step #3.

I would use some kind of divide and conquer and a recursive implementation. Here is a prototype in Smalltalk
SequenceableCollection>>subsetOfSum: s plusOrMinus: d
"check if a singleton matches"
self do: [:v | (v between: s - d and: s + d) ifTrue: [^{v}]].
"nope, engage recursion with a smaller collection"
self keysAndValuesDo: [:i :v |
| sub |
sub := (self copyWithoutIndex: i) subsetOfSum: s-v plusOrMinus: d.
sub isNil ifFalse: [^sub copyWith: v]].
"none found"
^nil
Using like this:
#(10 12 15 23 26) subsetOfSum: 62 plusOrMinus: 3.
gives:
#(23 15 12 10)

With limited input this problem is good candidate for dynamic programming with time complexity O((Sum + Sigma) * ArrayLength)
Delphi code:
function FindCombination(const A: array of Integer; Sum, Sigma: Integer): string;
var
Sums: array of Integer;
Value, idx: Integer;
begin
Result := '';
SetLength(Sums, Sum + Sigma + 1); //zero-initialized array
Sums[0] := 1; //just non-zero
for Value in A do begin
idx := Sum + Sigma;
while idx >= Value do begin
if Sums[idx - Value] <> 0 then begin //(idx-Value) sum can be formed from array]
Sums[idx] := Value; //value is included in this sum
if idx >= Sum - Sigma then begin //bingo!
while idx > 0 do begin //unwind and extract all values for this sum
Result := Result + IntToStr(Sums[idx]) + ' ';
idx := idx - Sums[idx];
end;
Exit;
end;
end;
Dec(idx); //idx--
end;
end;
end;

Here's one commented algorithm in JavaScript:
var arr = [9, 12, 20, 23, 26];
var target = 35;
var sigma = 5;
var n = arr.length;
// sort the numbers in ascending order
arr.sort(function(a,b){return a-b;});
// initialize the recursion
var stack = [[0,0,[]]];
while (stack[0] !== undefined){
var params = stack.pop();
var i = params[0]; // index
var s = params[1]; // sum so far
var r = params[2]; // accumulating list of numbers
// if the sum is within range, output sum
if (s >= target - sigma && s <= target + sigma){
console.log(r);
break;
// since the numbers are sorted, if the current
// number makes the sum too large, abandon this thread
} else if (s + arr[i] > target + sigma){
continue;
}
// there are still enough numbers left to skip this one
if (i < n - 1){
stack.push([i + 1,s,r]);
}
// there are still enough numbers left to add this one
if (i < n){
_r = r.slice();
_r.push(arr[i]);
stack.push([i + 1,s + arr[i],_r]);
}
}
/* [9,23] */

Related

Reconstructing input to encoder from output

I would like to understand how to solve the Codility ArrayRecovery challenge, but I don't even know what branch of knowledge to consult. Is it combinatorics, optimization, computer science, set theory, or something else?
Edit:
The branch of knowledge to consult is constraint programming, particularly constraint propagation. You also need some combinatorics to know that if you take k numbers at a time from the range [1..n], with the restriction that no number can be bigger than the one before it, that works out to be
(n+k-1)!/k!(n-1)! possible combinations
which is the same as the number of combinations with replacements of n things taken k at a time, which has the mathematical notation . You can read about why it works out like that here.
Peter Norvig provides an excellent example of how to solve this kind of problem with his Sudoku solver.
You can read the full description of the ArrayRecovery problem via the link above. The short story is that there is an encoder that takes a sequence of integers in the range 1 up to some given limit (say 100 for our purposes) and for each element of the input sequence outputs the most recently seen integer that is smaller than the current input, or 0 if none exists.
input 1, 2, 3, 4 => output 0, 1, 2, 3
input 2, 4, 3 => output 0, 2, 2
The full task is, given the output (and the range of allowable input), figure out how many possible inputs could have generated it. But before I even get to that calculation, I'm not confident about how to even approach formulating the equation. That is what I am asking for help with. (Of course a full solution would be welcome, too, if it is explained.)
I just look at some possible outputs and wonder. Here are some sample encoder outputs and the inputs I can come up with, with * meaning any valid input and something like > 4 meaning any valid input greater than 4. If needed, inputs are referred to as A1, A2, A3, ... (1-based indexing)
Edit #2
Part of the problem I was having with this challenge is that I did not manually generate the exactly correct sets of possible inputs for an output. I believe the set below is correct now. Look at this answer's edit history if you want to see my earlier mistakes.
output #1: 0, 0, 0, 4
possible inputs: [>= 4, A1 >= * >= 4, 4, > 4]
output #2: 0, 0, 0, 2, 3, 4 # A5 ↴ See more in discussion below
possible inputs: [>= 2, A1 >= * >=2, 2, 3, 4, > 4]
output #3: 0, 0, 0, 4, 3, 1
possible inputs: none # [4, 3, 1, 1 >= * > 4, 4, > 1] but there is no number 1 >= * > 4
The second input sequence is very tightly constrained compared to the first just by adding 2 more outputs. The third sequence is so constrained as to be impossible.
But the set of constraints on A5 in example #2 is a bit harder to articulate. Of course A5 > O5, that is the basic constraint on all the inputs. But any output > A4 and after O5 has to appear in the input after A4, so A5 has to be an element of the set of numbers that comes after A5 that is also > A4. Since there is only 1 such number (A6 == 4), A5 has to be it, but it gets more complicated if there is a longer string of numbers that follow. (Editor's note: actually it doesn't.)
As the output set gets longer, I worry these constraints just get more complicated and harder to get right. I cannot think of any data structures for efficiently representing these in a way that leads to efficiently calculating the number of possible combinations. I also don't quite see how to algorithmically add constraint sets together.
Here are the constraints I see so far for any given An
An > On
An <= min(Set of other possible numbers from O1 to n-1 > On). How to define the set of possible numbers greater than On?
Numbers greater than On that came after the most recent occurrence of On in the input
An >= max(Set of other possible numbers from O1 to n-1 < On). How to define the set of possible numbers less than On?
Actually this set is empty because On is, by definition, the largest possible number from the previous input sequence. (Which it not to say it is strictly the largest number from the previous input sequence.)
Any number smaller than On that came before the last occurrence of it in the input would be ineligible because of the "nearest" rule. No numbers smaller that On could have occurred after the most recent occurrence because of the "nearest" rule and because of the transitive property: if Ai < On and Aj < Ai then Aj < On
Then there is the set theory:
An must be an element of the set of unaccounted-for elements of the set of On+1 to Om, where m is the smallest m > n such that Om < On. Any output after such Om and larger than Om (which An is) would have to appear as or after Am.
An element is unaccounted-for if it is seen in the output but does not appear in the input in a position that is consistent with the rest of the output. Obviously I need a better definition than this in order to code and algorithm to calculate it.
It seems like perhaps some kind of set theory and/or combinatorics or maybe linear algebra would help with figuring out the number of possible sequences that would account for all of the unaccounted-for outputs and fit the other constraints. (Editor's note: actually, things never get that complicated.)
The code below passes all of Codility's tests. The OP added a main function to use it on the command line.
The constraints are not as complex as the OP thinks. In particular, there is never a situation where you need to add a restriction that an input be an element of some set of specific integers seen elsewhere in the output. Every input position has a well-defined minimum and maximum.
The only complication to that rule is that sometimes the maximum is "the value of the previous input" and that input itself has a range. But even then, all the values like that are consecutive and have the same range, so the number of possibilities can be calculated with basic combinatorics, and those inputs as a group are independent of the other inputs (which only serve to set the range), so the possibilities of that group can be combined with the possibilities of other input positions by simple multiplication.
Algorithm overview
The algorithm makes a single pass through the output array updating the possible numbers of input arrays after every span, which is what I am calling repetitions of numbers in the output. (You might say maximal subsequences of the output where every element is identical.) For example, for output 0,1,1,2 we have three spans: 0, 1,1 and 2. When a new span begins, the number of possibilities for the previous span is calculated.
This decision was based on a few observations:
For spans longer than 1 in length, the minimum value of the input
allowed in the first position is whatever the value is of the input
in the second position. Calculating the number of possibilities of a
span is straightforward combinatorics, but the standard formula
requires knowing the range of the numbers and the length of the span.
Every time the value of the
output changes (and a new span beings), that strongly constrains the value of the previous span:
When the output goes up, the only possible reason is that the previous input was the value of the new, higher output and the input corresponding to the position of the new, higher output, was even higher.
When an output goes down, new constraints are established, but those are a bit harder to articulate. The algorithm stores stairs (see below) in order to quantify the constraints imposed when the output goes down
The aim here was to confine the range of possible values for every span. Once we do that accurately, calculating the number of combinations is straightforward.
Because the encoder backtracks looking to output a number that relates to the input in 2 ways, both smaller and closer, we know we can throw out numbers that are larger and farther away. After a small number appears in the output, no larger number from before that position can have any influence on what follows.
So to confine these ranges of input when the output sequence decreased, we need to store stairs - a list of increasingly larger possible values for the position in the original array. E.g for 0,2,5,7,2,4 stairs build up like this: 0, 0,2, 0,2,5, 0,2,5,7, 0,2, 0,2,4.
Using these bounds we can tell for sure that the number in the position of the second 2 (next to last position in the example) must be in (2,5], because 5 is the next stair. If the input were greater than 5, a 5 would have been output in that space instead of a 2. Observe, that if the last number in the encoded array was not 4, but 6, we would exit early returning 0, because we know that the previous number couldn't be bigger than 5.
The complexity is O(n*lg(min(n,m))).
Functions
CombinationsWithReplacement - counts number of combinations with replacements of size k from n numbers. E.g. for (3, 2) it counts 3,3, 3,2, 3,1, 2,2, 2,1, 1,1, so returns 6 It is the same as choose(n - 1 + k, n - 1).
nextBigger - finds next bigger element in a range. E.g. for 4 in sub-array 1,2,3,4,5 it returns 5, and in sub-array 1,3 it returns its parameter Max.
countSpan (lambda) - counts how many different combinations a span we have just passed can have. Consider span 2,2 for 0,2,5,7,2,2,7.
When curr gets to the final position, curr is 7 and prev is the final 2 of the 2,2 span.
It computes maximum and minimum possible values of the prev span. At this point stairs consist of 2,5,7 then maximum possible value is 5 (nextBigger after 2 in the stair 2,5,7). A value of greater than 5 in this span would have output a 5, not a 2.
It computes a minimum value for the span (which is the minimum value for every element in the span), which is prev at this point, (remember curr at this moment equals to 7 and prev to 2). We know for sure that in place of the final 2 output, the original input has to have 7, so the minimum is 7. (This is a consequence of the "output goes up" rule. If we had 7,7,2 and curr would be 2 then the minimum for the previous span (the 7,7) would be 8 which is prev + 1.
It adjusts the number of combinations. For a span of length L with a range of n possibilities (1+max-min), there are possibilities, with k being either L or L-1 depending on what follows the span.
For a span followed by a larger number, like 2,2,7, k = L - 1 because the last position of the 2,2 span has to be 7 (the value of the first number after the span).
For a span followed by a smaller number, like 7,7,2, k = L because
the last element of 7,7 has no special constraints.
Finally, it calls CombinationsWithReplacement to find out the number of branches (or possibilities), computes new res partial results value (remainder values in the modulo arithmetic we are doing), and returns new res value and max for further handling.
solution - iterates over the given Encoder Output array. In the main loop, while in a span it counts the span length, and at span boundaries it updates res by calling countSpan and possibly updates the stairs.
If the current span consists of a bigger number than the previous one, then:
Check validity of the next number. E.g 0,2,5,2,7 is invalid input, becuase there is can't be 7 in the next-to-last position, only 3, or 4, or 5.
It updates the stairs. When we have seen only 0,2, the stairs are 0,2, but after the next 5, the stairs become 0,2,5.
If the current span consists of a smaller number then the previous one, then:
It updates stairs. When we have seen only 0,2,5, our stairs are 0,2,5, but after we have seen 0,2,5,2 the stairs become 0,2.
After the main loop it accounts for the last span by calling countSpan with -1 which triggers the "output goes down" branch of calculations.
normalizeMod, extendedEuclidInternal, extendedEuclid, invMod - these auxiliary functions help to deal with modulo arithmetic.
For stairs I use storage for the encoded array, as the number of stairs never exceeds current position.
#include <algorithm>
#include <cassert>
#include <vector>
#include <tuple>
const int Modulus = 1'000'000'007;
int CombinationsWithReplacement(int n, int k);
template <class It>
auto nextBigger(It begin, It end, int value, int Max) {
auto maxIt = std::upper_bound(begin, end, value);
auto max = Max;
if (maxIt != end) {
max = *maxIt;
}
return max;
}
auto solution(std::vector<int> &B, const int Max) {
auto res = 1;
const auto size = (int)B.size();
auto spanLength = 1;
auto prev = 0;
// Stairs is the list of numbers which could be smaller than number in the next position
const auto stairsBegin = B.begin();
// This includes first entry (zero) into stairs
// We need to include 0 because we can meet another zero later in encoded array
// and we need to be able to find in stairs
auto stairsEnd = stairsBegin + 1;
auto countSpan = [&](int curr) {
const auto max = nextBigger(stairsBegin, stairsEnd, prev, Max);
// At the moment when we switch from the current span to the next span
// prev is the number from previous span and curr from current.
// E.g. 1,1,7, when we move to the third position cur = 7 and prev = 1.
// Observe that, in this case minimum value possible in place of any of 1's can be at least 2=1+1=prev+1.
// But if we consider 7, then we have even more stringent condition for numbers in place of 1, it is 7
const auto min = std::max(prev + 1, curr);
const bool countLast = prev > curr;
const auto branchesCount = CombinationsWithReplacement(max - min + 1, spanLength - (countLast ? 0 : 1));
return std::make_pair(res * (long long)branchesCount % Modulus, max);
};
for (int i = 1; i < size; ++i) {
const auto curr = B[i];
if (curr == prev) {
++spanLength;
}
else {
int max;
std::tie(res, max) = countSpan(curr);
if (prev < curr) {
if (curr > max) {
// 0,1,5,1,7 - invalid because number in the fourth position lies in [2,5]
// and so in the fifth encoded position we can't something bigger than 5
return 0;
}
// It is time to possibly shrink stairs.
// E.g if we had stairs 0,2,4,9,17 and current value is 5,
// then we no more interested in 9 and 17, and we change stairs to 0,2,4,5.
// That's because any number bigger than 9 or 17 also bigger than 5.
const auto s = std::lower_bound(stairsBegin, stairsEnd, curr);
stairsEnd = s;
*stairsEnd++ = curr;
}
else {
assert(curr < prev);
auto it = std::lower_bound(stairsBegin, stairsEnd, curr);
if (it == stairsEnd || *it != curr) {
// 0,5,1 is invalid sequence because original sequence lloks like this 5,>5,>1
// and there is no 1 in any of the two first positions, so
// it can't appear in the third position of the encoded array
return 0;
}
}
spanLength = 1;
}
prev = curr;
}
res = countSpan(-1).first;
return res;
}
template <class T> T normalizeMod(T a, T m) {
if (a < 0) return a + m;
return a;
}
template <class T> std::pair<T, std::pair<T, T>> extendedEuclidInternal(T a, T b) {
T old_x = 1;
T old_y = 0;
T x = 0;
T y = 1;
while (true) {
T q = a / b;
T t = a - b * q;
if (t == 0) {
break;
}
a = b;
b = t;
t = x; x = old_x - x * q; old_x = t;
t = y; y = old_y - y * q; old_y = t;
}
return std::make_pair(b, std::make_pair(x, y));
}
// Returns gcd and Bezout's coefficients
template <class T> std::pair<T, std::pair<T, T>> extendedEuclid(T a, T b) {
if (a > b) {
if (b == 0) return std::make_pair(a, std::make_pair(1, 0));
return extendedEuclidInternal(a, b);
}
else {
if (a == 0) return std::make_pair(b, std::make_pair(0, 1));
auto p = extendedEuclidInternal(b, a);
std::swap(p.second.first, p.second.second);
return p;
}
}
template <class T> T invMod(T a, T m) {
auto p = extendedEuclid(a, m);
assert(p.first == 1);
return normalizeMod(p.second.first, m);
}
int CombinationsWithReplacement(int n, int k) {
int res = 1;
for (long long i = n; i < n + k; ++i) {
res = res * i % Modulus;
}
int denom = 1;
for (long long i = k; i > 0; --i) {
denom = denom * i % Modulus;
}
res = res * (long long)invMod(denom, Modulus) % Modulus;
return res;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: 0 1 2,3, 4 M
// Last arg is M, the max value for an input.
// Remaining args are B (the output of the encoder) separated by commas and/or spaces
// Parentheses and brackets are ignored, so you can use the same input form as Codility's tests: ([1,2,3], M)
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,[]()";
if (argc < 2 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
for (int i = 1; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
Max = B.back();
B.pop_back();
printf("%d\n", solution(B, Max));
return 0;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: M 0 1 2,3, 4
// first arg is M, the max value for an input.
// remaining args are B (the output of the encoder) separated by commas and/or spaces
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,";
if (argc < 3 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
Max = atoi(argv[1]);
for (int i = 2; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
printf("%d\n", solution(B, Max));
return 0;
}
Let's see an example:
Max = 5
Array is
0 1 3 0 1 1 3
1
1 2..5
1 3 4..5
1 3 4..5 1
1 3 4..5 1 2..5
1 3 4..5 1 2..5 >=..2 (sorry, for a cumbersome way of writing)
1 3 4..5 1 3..5 >=..3 4..5
Now count:
1 1 2 1 3 2 which amounts to 12 total.
Here's an idea. One known method to construct the output is to use a stack. We pop it while the element is greater or equal, then output the smaller element if it exists, then push the greater element onto the stack. Now what if we attempted to do this backwards from the output?
First we'll demonstrate the stack method using the c∅dility example.
[2, 5, 3, 7, 9, 6]
2: output 0, stack [2]
5: output 2, stack [2,5]
3: pop 5, output, 2, stack [2,3]
7: output 3, stack [2,3,7]
... etc.
Final output: [0, 2, 2, 3, 7, 3]
Now let's try reconstruction! We'll use stack both as the imaginary stack and as the reconstituted input:
(Input: [2, 5, 3, 7, 9, 6])
Output: [0, 2, 2, 3, 7, 3]
* Something >3 that reached 3 in the stack
stack = [3, 3 < *]
* Something >7 that reached 7 in the stack
but both of those would've popped before 3
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >3, 7 qualifies
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >2, 3 qualifies
stack = [2, 3, 7, 7 < x, 3 < * <= x]
* Something >2 and >=3 since 3 reached 2
stack = [2, 2 < *, 3, 7, 7 < x, 3 < * <= x]
Let's attempt your examples:
Example 1:
[0, 0, 0, 2, 3, 4]
* Something >4
stack = [4, 4 < *]
* Something >3, 4 qualifies
stack = [3, 4, 4 < *]
* Something >2, 3 qualifies
stack = [2, 3, 4, 4 < *]
* The rest is non-increasing with lowerbound 2
stack = [y >= x, x >= 2, 2, 3, 4, >4]
Example 2:
[0, 0, 0, 4]
* Something >4
stack [4, 4 < *]
* Non-increasing
stack = [z >= y, y >= 4, 4, 4 < *]
Calculating the number of combinations is achieved by multiplying together the possibilities for all the sections. A section is either a bounded single cell; or a bound, non-increasing subarray of one or more cells. To calculate the latter we use the multi-choose binomial, (n + k - 1) choose (k - 1). Consider that we can express the differences between the cells of a bound, non-increasing sequence of 3 cells as:
(ub - cell_3) + (cell_3 - cell_2) + (cell_2 - cell_1) + (cell_1 - lb) = ub - lb
Then the number of ways to distribute ub - lb into (x + 1) cells is
(n + k - 1) choose (k - 1)
or
(ub - lb + x) choose x
For example, the number of non-increasing sequences between
(3,4) in two cells is (4 - 3 + 2) choose 2 = 3: [3,3] [4,3] [4,4]
And the number of non-increasing sequences between
(3,4) in three cells is (4 - 3 + 3) choose 3 = 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]
(Explanation attributed to Brian M. Scott.)
Rough JavaScript sketch (the code is unreliable; it's only meant to illustrate the encoding. The encoder lists [lower_bound, upper_bound], or a non-increasing sequence as [non_inc, length, lower_bound, upper_bound]):
function f(A, M){
console.log(JSON.stringify(A), M);
let i = A.length - 1;
let last = A[i];
let s = [[last,last]];
if (A[i-1] == last){
let d = 1;
s.splice(1,0,['non_inc',d++,last,M]);
while (i > 0 && A[i-1] == last){
s.splice(1,0,['non_inc',d++,last,M]);
i--
}
} else {
s.push([last+1,M]);
i--;
}
if (i == 0)
s.splice(0,1);
for (; i>0; i--){
let x = A[i];
if (x < s[0][0])
s = [[x,x]].concat(s);
if (x > s[0][0]){
let [l, _l] = s[0];
let [lb, ub] = s[1];
s[0] = [x+1, M];
s[1] = [lb, x];
s = [[l,_l], [x,x]].concat(s);
}
if (x == s[0][0]){
let [l,_l] = s[0];
let [lb, ub] = s[1];
let d = 1;
s.splice(0,1);
while (i > 0 && A[i-1] == x){
s =
[['non_inc', d++, lb, M]].concat(s);
i--;
}
if (i > 0)
s = [[l,_l]].concat(s);
}
}
// dirty fix
if (s[0][0] == 0)
s.splice(0,1);
return s;
}
var a = [2, 5, 3, 7, 9, 6]
var b = [0, 2, 2, 3, 7, 3]
console.log(JSON.stringify(a));
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,2,3,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,2]
console.log(JSON.stringify(f(b,4)));
b = [0,3,5,6]
console.log(JSON.stringify(f(b,10)));
b = [0,0,3,0]
console.log(JSON.stringify(f(b,10)));

Number distribution

Problem: We have x checkboxes and we want to check y of them evenly.
Example 1: select 50 checkboxes of 100 total.
[-]
[x]
[-]
[x]
...
Example 2: select 33 checkboxes of 100 total.
[-]
[-]
[x]
[-]
[-]
[x]
...
Example 3: select 66 checkboxes of 100 total:
[-]
[x]
[x]
[-]
[x]
[x]
...
But we're having trouble to come up with a formula to check them in code, especially once you go 11/111 or something similar. Anyone has an idea?
Let's first assume y is divisible by x. Then we denote p = y/x and the solution is simple. Go through the list, every p elements, mark 1 of them.
Now, let's say r = y%x is non zero. Still p = y/x where / is integer devision. So, you need to:
In the first p-r elements, mark 1 elements
In the last r elements, mark 2 elements
Note: This depends on how you define evenly distributed. You might want to spread the r sections withx+1 elements in between p-r sections with x elements, which indeed is again the same problem and could be solved recursively.
Alright so it wasn't actually correct. I think this would do though:
Regardless of divisibility:
if y > 2*x, then mark 1 element every p = y/x elements, x times.
if y < 2*x, then mark all, and do the previous step unmarking y-x out of y checkboxes (so like in the previous case, but x is replaced by y-x)
Note: This depends on how you define evenly distributed. You might want to change between p and p+1 elements for example to distribute them better.
Here's a straightforward solution using integer arithmetic:
void check(char boxes[], int total_count, int check_count)
{
int i;
for (i = 0; i < total_count; i++)
boxes[i] = '-';
for (i = 0; i < check_count; i++)
boxes[i * total_count / check_count] = 'x';
}
total_count is the total number of boxes, and check_count is the number of boxes to check.
First, it sets every box to unchecked. Then, it checks check_count boxes, scaling the counter to the number of boxes.
Caveat: this is left-biased rather than right-biased like in your examples. That is, it prints x--x-- rather than --x--x. You can turn it around by replacing
boxes[i * total_count / check_count] = 'x';
with:
boxes[total_count - (i * total_count / check_count) - 1] = 'x';
Correctness
Assuming 0 <= check_count <= total_count, and that boxes has space for at least total_count items, we can prove that:
No check marks will overlap. i * total_count / check_count increments by at least one on every iteration, because total_count >= check_count.
This will not overflow the buffer. The subscript i * total_count / check_count
Will be >= 0. i, total_count, and check_count will all be >= 0.
Will be < total_count. When n > 0 and d > 0:
(n * d - 1) / d < n
In other words, if we take n * d / d, and nudge the numerator down, the quotient will go down, too.
Therefore, (check_count - 1) * total_count / check_count will be less than total_count, with the assumptions made above. A division by zero won't happen because if check_count is 0, the loop in question will have zero iterations.
Say number of checkboxes is C and the number of Xes is N.
You example states that having C=111 and N=11 is your most troublesome case.
Try this: divide C/N. Call it D. Have index in the array as double number I. Have another variable as counter, M.
double D = (double)C / (double)N;
double I = 0.0;
int M = N;
while (M > 0) {
if (checkboxes[Round(I)].Checked) { // if we selected it, skip to next
I += 1.0;
continue;
}
checkboxes[Round(I)].Checked = true;
M --;
I += D;
if (Round(I) >= C) { // wrap around the end
I -= C;
}
}
Please note that Round(x) should return nearest integer value for x.
This one could work for you.
I think the key is to keep count of how many boxes you expect to have per check.
Say you want 33 checks in 100 boxes. 100 / 33 = 3.030303..., so you expect to have one check every 3.030303... boxes. That means every 3.030303... boxes, you need to add a check. 66 checks in 100 boxes would mean one check every 1.51515... boxes, 11 checks in 111 boxes would mean one check every 10.090909... boxes, and so on.
double count = 0;
for (int i = 0; i < boxes; i++) {
count += 1;
if (count >= boxes/checks) {
checkboxes[i] = true;
count -= count.truncate(); // so 1.6 becomes 0.6 - resetting the count but keeping the decimal part to keep track of "partial boxes" so far
}
}
You might rather use decimal as opposed to double for count, or there's a slight chance the last box will get skipped due to rounding errors.
Bresenham-like algorithm is suitable to distribute checkboxes evenly. Output of 'x' corresponds to Y-coordinate change. It is possible to choose initial err as random value in range [0..places) to avoid biasing.
def Distribute(places, stars):
err = places // 2
res = ''
for i in range(0, places):
err = err - stars
if err < 0 :
res = res + 'x'
err = err + places
else:
res = res + '-'
print(res)
Distribute(24,17)
Distribute(24,12)
Distribute(24,5)
output:
x-xxx-xx-xx-xxx-xx-xxx-x
-x-x-x-x-x-x-x-x-x-x-x-x
--x----x----x---x----x--
Quick html/javascript solution:
<html>
<body>
<div id='container'></div>
<script>
var cbCount = 111;
var cbCheckCount = 11;
var cbRatio = cbCount / cbCheckCount;
var buildCheckCount = 0;
var c = document.getElementById('container');
for (var i=1; i <= cbCount; i++) {
// make a checkbox
var cb = document.createElement('input');
cb.type = 'checkbox';
test = i / cbRatio - buildCheckCount;
if (test >= 1) {
// check the checkbox we just made
cb.checked = 'checked';
buildCheckCount++;
}
c.appendChild(cb);
c.appendChild(document.createElement('br'));
}
</script>
</body></html>
Adapt code from one question's answer or another answer from earlier this month. Set N = x = number of checkboxes and M = y = number to be checked and apply formula (N*i+N)/M - (N*i)/M for section sizes. (Also see Joey Adams' answer.)
In python, the adapted code is:
N=100; M=33; p=0;
for i in range(M):
k = (N+N*i)/M
for j in range(p,k-1): print "-",
print "x",
p=k
which produces
- - x - - x - - x - - x - - [...] x - - x - - - x where [...] represents 25 --x repetitions.
With M=66 the code gives
x - x x - x x - x x - x x - [...] x x - x x - x - x where [...] represents mostly xx- repetitions, with one x- in the middle.
Note, in C or java: Substitute for (i=0; i<M; ++i) in place of for i in range(M):. Substitute for (j=p; j<k-1; ++j) in place of for j in range(p,k-1):.
Correctness: Note that M = x boxes get checked because print "x", is executed M times.
What about using Fisher–Yates shuffle ?
Make array, shuffle and pick first n elements. You do not need to shuffle all of them, just first n of array. Shuffling can be find in most language libraries.

How do I generate a random string of up to a certain length?

I would like to generate a random string (or a series of random strings, repetitions allowed) of length between 1 and n characters from some (finite) alphabet. Each string should be equally likely (in other words, the strings should be uniformly distributed).
The uniformity requirement means that an algorithm like this doesn't work:
alphabet = "abcdefghijklmnopqrstuvwxyz"
len = rand(1, n)
s = ""
for(i = 0; i < len; ++i)
s = s + alphabet[rand(0, 25)]
(pseudo code, rand(a, b) returns a integer between a and b, inclusively, each integer equally likely)
This algorithm generates strings with uniformly distributed lengths, but the actual distribution should be weighted toward longer strings (there are 26 times as many strings with length 2 as there are with length 1, and so on.) How can I achieve this?
What you need to do is generate your length and then your string as two distinct steps. You will need to first chose the length using a weighted approach. You can calculate the number of strings of a given length l for an alphabet of k symbols as k^l. Sum those up and then you have the total number of strings of any length, your first step is to generate a random number between 1 and that value and then bin it accordingly. Modulo off by one errors you would break at 26, 26^2, 26^3, 26^4 and so on. The logarithm based on the number of symbols would be useful for this task.
Once you have you length then you can generate the string as you have above.
Okay, there are 26 possibilities for a 1-character string, 262 for a 2-character string, and so on up to 2626 possibilities for a 26-character string.
That means there are 26 times as many possibilities for an (N)-character string than there are for an (N-1)-character string. You can use that fact to select your length:
def getlen(maxlen):
sz = maxlen
while sz != 1:
if rnd(27) != 1:
return sz
sz--;
return 1
I use 27 in the above code since the total sample space for selecting strings from "ab" is the 26 1-character possibilities and the 262 2-character possibilities. In other words, the ratio is 1:26 so 1-character has a probability of 1/27 (rather than 1/26 as I first answered).
This solution isn't perfect since you're calling rnd multiple times and it would be better to call it once with an possible range of 26N+26N-1+261 and select the length based on where your returned number falls within there but it may be difficult to find a random number generator that'll work on numbers that large (10 characters gives you a possible range of 2610+...+261 which, unless I've done the math wrong, is 146,813,779,479,510).
If you can limit the maximum size so that your rnd function will work in the range, something like this should be workable:
def getlen(chars,maxlen):
assert maxlen >= 1
range = chars
sampspace = 0
for i in 1 .. maxlen:
sampspace = sampspace + range
range = range * chars
range = range / chars
val = rnd(sampspace)
sz = maxlen
while val < sampspace - range:
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
Once you have the length, I would then use your current algorithm to choose the actual characters to populate the string.
Explaining it further:
Let's say our alphabet only consists of "ab". The possible sets up to length 3 are [ab] (2), [ab][ab] (4) and [ab][ab][ab] (8). So there is a 8/14 chance of getting a length of 3, 4/14 of length 2 and 2/14 of length 1.
The 14 is the magic figure: it's the sum of all 2n for n = 1 to the maximum length. So, testing that pseudo-code above with chars = 2 and maxlen = 3:
assert maxlen >= 1 [okay]
range = chars [2]
sampspace = 0
for i in 1 .. 3:
i = 1:
sampspace = sampspace + range [0 + 2 = 2]
range = range * chars [2 * 2 = 4]
i = 2:
sampspace = sampspace + range [2 + 4 = 6]
range = range * chars [4 * 2 = 8]
i = 3:
sampspace = sampspace + range [6 + 8 = 14]
range = range * chars [8 * 2 = 16]
range = range / chars [16 / 2 = 8]
val = rnd(sampspace) [number from 0 to 13 inclusive]
sz = maxlen [3]
while val < sampspace - range: [see below]
sampspace = sampspace - range
range = range / chars
sz = sz - 1
return sz
So, from that code, the first iteration of the final loop will exit with sz = 3 if val is greater than or equal to sampspace - range [14 - 8 = 6]. In other words, for the values 6 through 13 inclusive, 8 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [14 - 8 = 6] and range becomes range / chars [8 / 2 = 4].
Then the second iteration of the final loop will exit with sz = 2 if val is greater than or equal to sampspace - range [6 - 4 = 2]. In other words, for the values 2 through 5 inclusive, 4 of the 14 possibilities.
Otherwise, sampspace becomes sampspace - range [6 - 4 = 2] and range becomes range / chars [4 / 2 = 2].
Then the third iteration of the final loop will exit with sz = 1 if val is greater than or equal to sampspace - range [2 - 2 = 0]. In other words, for the values 0 through 1 inclusive, 2 of the 14 possibilities (this iteration will always exit since the value must be greater than or equal to zero.
In retrospect, that second solution is a bit of a nightmare. In my personal opinion, I'd go for the first solution for its simplicity and to avoid the possibility of rather large numbers.
Building on my comment posted as a reply to the OP:
I'd consider it an exercise in base
conversion. You're simply generating a
"random number" in "base 26", where
a=0 and z=25. For a random string of
length n, generate a number between 1
and 26^n. Convert from base 10 to base
26, using symbols from your chosen
alphabet.
Here's a PHP implementation. I won't guaranty that there isn't an off-by-one error or two in here, but any such error should be minor:
<?php
$n = 5;
var_dump(randstr($n));
function randstr($maxlen) {
$dict = 'abcdefghijklmnopqrstuvwxyz';
$rand = rand(0, pow(strlen($dict), $maxlen));
$str = base_convert($rand, 10, 26);
//base convert returns base 26 using 0-9 and 15 letters a-p(?)
//we must convert those to our own set of symbols
return strtr($str, '1234567890abcdefghijklmnopqrstuvwxyz', $dict);
}
Instead of picking a length with uniform distribution, weight it according to how many strings are a given length. If your alphabet is size m, there are mx strings of size x, and (1-mn+1)/(1-m) strings of length n or less. The probability of choosing a string of length x should be mx*(1-m)/(1-mn+1).
Edit:
Regarding overflow - using floating point instead of integers will expand the range, so for a 26-character alphabet and single-precision floats, direct weight calculation shouldn't overflow for n<26.
A more robust approach is to deal with it iteratively. This should also minimize the effects of underflow:
int randomLength() {
for(int i = n; i > 0; i--) {
double d = Math.random();
if(d > (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
To make this more efficient by calculating fewer random numbers, we can reuse them by splitting intervals in more than one place:
int randomLength() {
for(int i = n; i > 0; i -= 5) {
double d = Math.random();
double c = (m - 1) / (m - Math.pow(m, -i))
for(int j = 0; j < 5; j++) {
if(d > c) {
return i - j;
}
c /= m;
}
}
for(int i = n % 0; i > 0; i--) {
double d = Math.random();
if(d > (m - 1) / (m - Math.pow(m, -i))) {
return i;
}
}
return 0;
}
Edit: This answer isn't quite right. See the bottom for a disproof. I'll leave it up for now in the hope someone can come up with a variant that fixes it.
It's possible to do this without calculating the length separately - which, as others have pointed out, requires raising a number to a large power, and generally seems like a messy solution to me.
Proving that this is correct is a little tough, and I'm not sure I trust my expository powers to make it clear, but bear with me. For the purposes of the explanation, we're generating strings of length at most n from an alphabet a of |a| characters.
First, imagine you have a maximum length of n, and you've already decided you're generating a string of at least length n-1. It should be obvious that there are |a|+1 equally likely possibilities: we can generate any of the |a| characters from the alphabet, or we can choose to terminate with n-1 characters. To decide, we simply pick a random number x between 0 and |a| (inclusive); if x is |a|, we terminate at n-1 characters; otherwise, we append the xth character of a to the string. Here's a simple implementation of this procedure in Python:
def pick_character(alphabet):
x = random.randrange(len(alphabet) + 1)
if x == len(alphabet):
return ''
else:
return alphabet[x]
Now, we can apply this recursively. To generate the kth character of the string, we first attempt to generate the characters after k. If our recursive invocation returns anything, then we know the string should be at least length k, and we generate a character of our own from the alphabet and return it. If, however, the recursive invocation returns nothing, we know the string is no longer than k, and we use the above routine to select either the final character or no character. Here's an implementation of this in Python:
def uniform_random_string(alphabet, max_len):
if max_len == 1:
return pick_character(alphabet)
suffix = uniform_random_string(alphabet, max_len - 1)
if suffix:
# String contains characters after ours
return random.choice(alphabet) + suffix
else:
# String contains no characters after our own
return pick_character(alphabet)
If you doubt the uniformity of this function, you can attempt to disprove it: suggest a string for which there are two distinct ways to generate it, or none. If there are no such strings - and alas, I do not have a robust proof of this fact, though I'm fairly certain it's true - and given that the individual selections are uniform, then the result must also select any string with uniform probability.
As promised, and unlike every other solution posted thus far, no raising of numbers to large powers is required; no arbitrary length integers or floating point numbers are needed to store the result, and the validity, at least to my eyes, is fairly easy to demonstrate. It's also shorter than any fully-specified solution thus far. ;)
If anyone wants to chip in with a robust proof of the function's uniformity, I'd be extremely grateful.
Edit: Disproof, provided by a friend:
dato: so imagine alphabet = 'abc' and n = 2
dato: you have 9 strings of length 2, 3 of length 1, 1 of length 0
dato: that's 13 in total
dato: so probability of getting a length 2 string should be 9/13
dato: and probability of getting a length 1 or a length 0 should be 4/13
dato: now if you call uniform_random_string('abc', 2)
dato: that transforms itself into a call to uniform_random_string('abc', 1)
dato: which is an uniform distribution over ['a', 'b', 'c', '']
dato: the first three of those yield all the 2 length strings
dato: and the latter produce all the 1 length strings and the empty strings
dato: but 0.75 > 9/13
dato: and 0.25 < 4/13
// Note space as an available char
alphabet = "abcdefghijklmnopqrstuvwxyz "
result_string = ""
for( ;; )
{
s = ""
for( i = 0; i < n; i++ )
s += alphabet[rand(0, 26)]
first_space = n;
for( i = 0; i < n; i++ )
if( s[ i ] == ' ' )
{
first_space = i;
break;
}
ok = true;
// Reject "duplicate" shorter strings
for( i = first_space + 1; i < n; i++ )
if( s[ i ] != ' ' )
{
ok = false;
break;
}
if( !ok )
continue;
// Extract the short version of the string
for( i = 0; i < first_space; i++ )
result_string += s[ i ];
break;
}
Edit: I forgot to disallow 0-length strings, that will take a bit more code which I don't have time to add now.
Edit: After considering how my answer doesn't scale to large n (takes too long to get lucky and find an accepted string), I like paxdiablo's answer much better. Less code too.
Personally I'd do it like this:
Let's say your alphabet has Z characters. Then the number of possible strings for each length L is:
L | Z
--------------------------
1 | 26
2 | 676 (= 26 * 26)
3 | 17576 (= 26 * 26 * 26)
...and so on.
Now let's say your maximum desired length is N. Then the total number of possible strings from length 1 to N that your function could generate would be the sum of a geometric sequence:
(1 - (Z ^ (N + 1))) / (1 - Z)
Let's call this value S. Then the probability of generating a string of any length L should be:
(Z ^ L) / S
OK, fine. This is all well and good; but how do we generate a random number given a non-uniform probability distribution?
The short answer is: you don't. Get a library to do that for you. I develop mainly in .NET, so one I might turn to would be Math.NET.
That said, it's really not so hard to come up with a rudimentary approach to doing this on your own.
Here's one way: take a generator that gives you a random value within a known uniform distribution, and assign ranges within that distribution of sizes dependent on your desired distribution. Then interpret the random value provided by the generator by determining which range it falls into.
Here's an example in C# of one way you could implement this idea (scroll to the bottom for example output):
RandomStringGenerator class
public class RandomStringGenerator
{
private readonly Random _random;
private readonly char[] _alphabet;
public RandomStringGenerator(string alphabet)
{
if (string.IsNullOrEmpty(alphabet))
throw new ArgumentException("alphabet");
_random = new Random();
_alphabet = alphabet.Distinct().ToArray();
}
public string NextString(int maxLength)
{
// Get a value randomly distributed between 0.0 and 1.0 --
// this is approximately what the System.Random class provides.
double value = _random.NextDouble();
// This is where the magic happens: we "translate" the above number
// to a length based on our computed probability distribution for the given
// alphabet and the desired maximum string length.
int length = GetLengthFromRandomValue(value, _alphabet.Length, maxLength);
// The rest is easy: allocate a char array of the length determined above...
char[] chars = new char[length];
// ...populate it with a bunch of random values from the alphabet...
for (int i = 0; i < length; ++i)
{
chars[i] = _alphabet[_random.Next(0, _alphabet.Length)];
}
// ...and return a newly constructed string.
return new string(chars);
}
static int GetLengthFromRandomValue(double value, int alphabetSize, int maxLength)
{
// Looping really might not be the smartest way to do this,
// but it's the most obvious way that immediately springs to my mind.
for (int length = 1; length <= maxLength; ++length)
{
Range r = GetRangeForLength(length, alphabetSize, maxLength);
if (r.Contains(value))
return length;
}
return maxLength;
}
static Range GetRangeForLength(int length, int alphabetSize, int maxLength)
{
int L = length;
int Z = alphabetSize;
int N = maxLength;
double possibleStrings = (1 - (Math.Pow(Z, N + 1)) / (1 - Z));
double stringsOfGivenLength = Math.Pow(Z, L);
double possibleSmallerStrings = (1 - Math.Pow(Z, L)) / (1 - Z);
double probabilityOfGivenLength = ((double)stringsOfGivenLength / possibleStrings);
double probabilityOfShorterLength = ((double)possibleSmallerStrings / possibleStrings);
double startPoint = probabilityOfShorterLength;
double endPoint = probabilityOfShorterLength + probabilityOfGivenLength;
return new Range(startPoint, endPoint);
}
}
Range struct
public struct Range
{
public readonly double StartPoint;
public readonly double EndPoint;
public Range(double startPoint, double endPoint)
: this()
{
this.StartPoint = startPoint;
this.EndPoint = endPoint;
}
public bool Contains(double value)
{
return this.StartPoint <= value && value <= this.EndPoint;
}
}
Test
static void Main(string[] args)
{
const int N = 5;
const string alphabet = "acegikmoqstvwy";
int Z = alphabet.Length;
var rand = new RandomStringGenerator(alphabet);
var strings = new List<string>();
for (int i = 0; i < 100000; ++i)
{
strings.Add(rand.NextString(N));
}
Console.WriteLine("First 10 results:");
for (int i = 0; i < 10; ++i)
{
Console.WriteLine(strings[i]);
}
// sanity check
double sumOfProbabilities = 0.0;
for (int i = 1; i <= N; ++i)
{
double probability = Math.Pow(Z, i) / ((1 - (Math.Pow(Z, N + 1))) / (1 - Z));
int numStrings = strings.Count(str => str.Length == i);
Console.WriteLine("# strings of length {0}: {1} (probability = {2:0.00%})", i, numStrings, probability);
sumOfProbabilities += probability;
}
Console.WriteLine("Probabilities sum to {0:0.00%}.", sumOfProbabilities);
Console.ReadLine();
}
Output:
First 10 results:
wmkyw
qqowc
ackai
tokmo
eeiyw
cakgg
vceec
qwqyq
aiomt
qkyav
# strings of length 1: 1 (probability = 0.00%)
# strings of length 2: 38 (probability = 0.03%)
# strings of length 3: 475 (probability = 0.47%)
# strings of length 4: 6633 (probability = 6.63%)
# strings of length 5: 92853 (probability = 92.86%)
Probabilities sum to 100.00%.
My idea regarding this is like:
you have 1-n length string.there 26 possible 1 length string,26*26 2 length string and so on.
you can find out the percentage of each length string of the total possible strings.for example percentage of single length string is like
((26/(TOTAL_POSSIBLE_STRINGS_OF_ALL_LENGTH))*100).
similarly you can find out the percentage of other length strings.
Mark them on a number line between 1 to 100.ie suppose percentage of single length string is 3 and double length string is 6 then number line single length string lies between 0-3 while double length string lies between 3-9 and so on.
Now take a random number between 1 to 100.find out the range in which this number lies.I mean suppose for examplethe number you have randomly chosen is 2.Now this number lies between 0-3 so go 1 length string or if the random number chosen is 7 then go for double length string.
In this fashion you can see that length of each string choosen will be proportional to the percentage of the total number of that length string contribute to the all possible strings.
Hope I am clear.
Disclaimer: I have not gone through above solution except one or two.So if it matches with some one solution it will be purely a chance.
Also,I will welcome all the advice and positive criticism and correct me if I am wrong.
Thanks and regard
Mawia
Matthieu: Your idea doesn't work because strings with blanks are still more likely to be generated. In your case, with n=4, you could have the string 'ab' generated as 'a' + 'b' + '' + '' or '' + 'a' + 'b' + '', or other combinations. Thus not all the strings have the same chance of appearing.

What is the fastest possible way to sort an array of 7 integers?

This is a part of a program that analyzes the odds of poker, specifically Texas Hold'em. I have a program I'm happy with, but it needs some small optimizations to be perfect.
I use this type (among others, of course):
type
T7Cards = array[0..6] of integer;
There are two things about this array that may be important when deciding how to sort it:
Every item is a value from 0 to 51. No other values are possible.
There are no duplicates. Never.
With this information, what is the absolutely fastest way to sort this array? I use Delphi, so pascal code would be the best, but I can read C and pseudo, albeit a bit more slowly :-)
At the moment I use quicksort, but the funny thing is that this is almost no faster than bubblesort! Possible because of the small number of items. The sorting counts for almost 50% of the total running time of the method.
EDIT:
Mason Wheeler asked why it's necessary to optimize. One reason is that the method will be called 2118760 times.
Basic poker information: All players are dealt two cards (the pocket) and then five cards are dealt to the table (the 3 first are called the flop, the next is the turn and the last is the river. Each player picks the five best cards to make up their hand)
If I have two cards in the pocket, P1 and P2, I will use the following loops to generate all possible combinations:
for C1 := 0 to 51-4 do
if (C1<>P1) and (C1<>P2) then
for C2 := C1+1 to 51-3 do
if (C2<>P1) and (C2<>P2) then
for C3 := C2+1 to 51-2 do
if (C3<>P1) and (C3<>P2) then
for C4 := C3+1 to 51-1 do
if (C4<>P1) and (C4<>P2) then
for C5 := C4+1 to 51 do
if (C5<>P1) and (C5<>P2) then
begin
//This code will be executed 2 118 760 times
inc(ComboCounter[GetComboFromCards([P1,P2,C1,C2,C3,C4,C5])]);
end;
As I write this I notice one thing more: The last five elements of the array will always be sorted, so it's just a question of putting the first two elements in the right position in the array. That should simplify matters a bit.
So, the new question is: What is the fastest possible way to sort an array of 7 integers when the last 5 elements are already sorted. I believe this could be solved with a couple (?) of if's and swaps :-)
For a very small set, insertion sort can usually beat quicksort because it has very low overhead.
WRT your edit, if you're already mostly in sort order (last 5 elements are already sorted), insertion sort is definitely the way to go. In an almost-sorted set of data, it'll beat quicksort every time, even for large sets. (Especially for large sets! This is insertion sort's best-case scenario and quicksort's worst case.)
Don't know how you are implementing this, but what you could do is have an array of 52 instead of 7, and just insert the card in its slot directly when you get it since there can never be duplicates, that way you never have to sort the array. This might be faster depending on how its used.
I don't know that much about Texas Hold'em: Does it matter what suit P1 and P2 are, or does it only matter if they are of the same suit or not? If only suit(P1)==suit(P2) matters, then you could separate the two cases, you have only 13x12/2 different possibilities for P1/P2, and you can easily precalculate a table for the two cases.
Otherwise, I would suggest something like this:
(* C1 < C2 < P1 *)
for C1:=0 to P1-2 do
for C2:=C1+1 to P1-1 do
Cards[0] = C1;
Cards[1] = C2;
Cards[2] = P1;
(* generate C3...C7 *)
(* C1 < P1 < C2 *)
for C1:=0 to P1-1 do
for C2:=P1+1 to 51 do
Cards[0] = C1;
Cards[1] = P1;
Cards[2] = C2;
(* generate C3...C7 *)
(* P1 < C1 < C2 *)
for C1:=P1+1 to 51 do
for C2:=C1+1 to 51 do
Cards[0] = P1;
Cards[1] = C1;
Cards[2] = C2;
(* generate C3...C7 *)
(this is just a demonstration for one card P1, you would have to expand that for P2, but I think that's straightforward. Although it'll be a lot of typing...)
That way, the sorting doesn't take any time at all. The generated permutations are already ordered.
There are only 5040 permutations of 7 elements. You can programmaticaly generate a program that finds the one represented by your input in a minimal number of comparisons. It will be a big tree of if-then-else instructions, each comparing a fixed pair of nodes, for example if (a[3]<=a[6]).
The tricky part is deciding which 2 elements to compare in a particular internal node. For this, you have to take into account the results of comparisons in the ancestor nodes from root to the particular node (for example a[0]<=a[1], not a[2]<=a[7], a[2]<=a[5]) and the set of possible permutations that satisfy the comparisons. Compare the pair of elements that splits the set into as equal parts as possible (minimize the size of the larger part).
Once you have the permutation, it is trivial to sort it in a minimal set of swaps.
Since the last 5 items are already sorted, the code can be written just to reposition the first 2 items. Since you're using Pascal, I've written and tested a sorting algorithm that can execute 2,118,760 times in about 62 milliseconds.
procedure SortT7Cards(var Cards: T7Cards);
const
CardsLength = Length(Cards);
var
I, J, V: Integer;
V1, V2: Integer;
begin
// Last 5 items will always be sorted, so we want to place the first two into
// the right location.
V1 := Cards[0];
V2 := Cards[1];
if V2 < V1 then
begin
I := V1;
V1 := V2;
V2 := I;
end;
J := 0;
I := 2;
while I < CardsLength do
begin
V := Cards[I];
if V1 < V then
begin
Cards[J] := V1;
Inc(J);
Break;
end;
Cards[J] := V;
Inc(J);
Inc(I);
end;
while I < CardsLength do
begin
V := Cards[I];
if V2 < V then
begin
Cards[J] := V2;
Break;
end;
Cards[J] := V;
Inc(J);
Inc(I);
end;
if J = (CardsLength - 2) then
begin
Cards[J] := V1;
Cards[J + 1] := V2;
end
else if J = (CardsLength - 1) then
begin
Cards[J] := V2;
end;
end;
Use min-sort. Search for minimal and maximal element at once and place them into resultant array. Repeat three times. (EDIT: No, I won't try to measure the speed theoretically :_))
var
cards,result: array[0..6] of integer;
i,min,max: integer;
begin
n=0;
while (n<3) do begin
min:=-1;
max:=52;
for i from 0 to 6 do begin
if cards[i]<min then min:=cards[i]
else if cards[i]>max then max:=cards[i]
end
result[n]:=min;
result[6-n]:=max;
inc(n);
end
for i from 0 to 6 do
if (cards[i]<52) and (cards[i]>=0) then begin
result[3] := cards[i];
break;
end
{ Result is sorted here! }
end
This is the fastest method: since the 5-card list is already sorted, sort the two-card list (a compare & swap), and then merge the two lists, which is O(k * (5+2). In this case (k) will normally be 5: the loop test(1), the compare(2), the copy(3), the input-list increment(4) and the output list increment(5). That's 35 + 2.5. Throw in loop initialization and you get 41.5 statements, total.
You could also unroll the loops which would save you maybe 8 statements or execution, but make the whole routine about 4-5 times longer which may mess with your instruction cache hit ratio.
Given P(0 to 2), C(0 to 5) and copying to H(0 to 6)
with C() already sorted (ascending):
If P(0) > P(1) Then
// Swap:
T = P(0)
P(0) = P(1)
P(1) = T
// 1stmt + (3stmt * 50%) = 2.5stmt
End
P(2), C(5) = 53 \\ Note these are end-of-list flags
k = 0 \\ P() index
J = 0 \\ H() index
i = 0 \\ C() index
// 4 stmt
Do While (j) < 7
If P(k) < C(I) then
H(j) = P(k)
k = k+1
Else
H(j) = C(i)
j = j+1
End if
j = j+1
// 5stmt * 7loops = 35stmt
Loop
And note that this is faster than the other algorithm that would be "fastest" if you had to truly sort all 7 cards: use a bit-mask(52) to map & bit-set all 7 cards into that range of all possible 52 cards (the bit-mask), and then scan the bit-mask in order looking for the 7 bits that are set. That takes 60-120 statements at best (but is still faster than any other sorting approach).
For seven numbers, the most efficient algorithm that exists with regards to the number of comparisons is Ford-Johnson's. In fact, wikipedia references a paper, easily found on google, that claims Ford-Johnson's the best for up to 47 numbers. Unfortunately, references to Ford-Johnson's aren't all that easy to found, and the algorithm uses some complex data structures.
It appears on The Art Of Computer Programming, Volume 3, by Donald Knuth, if you have access to that book.
There's a paper which describes FJ and a more memory efficient version here.
At any rate, because of the memory overhead of that algorithm, I doubt it would be worth your while for integers, as the cost of comparing two integers is rather cheap compared to the cost of allocating memory and manipulating pointers.
Now, you mentioned that 5 cards are already sorted, and you just need to insert two. You can do this with insertion sort most efficiently like this:
Order the two cards so that P1 > P2
Insert P1 going from the high end to the low end
(list) Insert P2 going from after P1 to the low end
(array) Insert P2 going from the low end to the high end
How you do that will depend on the data structure. With an array you'll be swapping each element, so place P1 at 1st, P2 and 7th (ordered high to low), and then swap P1 up, and then P2 down. With a list, you just need to fix the pointers as appropriate.
However once more, because of the particularity of your code, it really is best if you follow nikie suggestion and just generate the for loops apropriately for every variation in which P1 and P2 can appear in the list.
For example, sort P1 and P2 so that P1 < P2. Let's make Po1 and Po2 the position from 0 to 6, of P1 and P2 on the list. Then do this:
Loop Po1 from 0 to 5
Loop Po2 from Po1 + 1 to 6
If (Po2 == 1) C1start := P2 + 1; C1end := 51 - 4
If (Po1 == 0 && Po2 == 2) C1start := P1+1; C1end := P2 - 1
If (Po1 == 0 && Po2 > 2) C1start := P1+1; C1end := 51 - 5
If (Po1 > 0) C1start := 0; C1end := 51 - 6
for C1 := C1start to C1end
// Repeat logic to compute C2start and C2end
// C2 can begin at C1+1, P1+1 or P2+1
// C2 can finish at P1-1, P2-1, 51 - 3, 51 - 4 or 51 -5
etc
You then call a function passing Po1, Po2, P1, P2, C1, C2, C3, C4, C5, and have this function return all possible permutations based on Po1 and Po2 (that's 36 combinations).
Personally, I think that's the fastest you can get. You completely avoid having to order anything, because the data will be pre-ordered. You incur in some comparisons anyway to compute the starts and ends, but their cost is minimized as most of them will be on the outermost loops, so they won't be repeated much. And they can even be more optimized at the cost of more code duplication.
For 7 elements, there are only few options. You can easily write a generator that produces method to sort all possible combinations of 7 elements. Something like this method for 3 elements:
if a[1] < a[2] {
if a[2] < a[3] {
// nothing to do, a[1] < a[2] < a[3]
} else {
if a[1] < a[3] {
// correct order should be a[1], a[3], a[2]
swap a[2], a[3]
} else {
// correct order should be a[3], a[1], a[2]
swap a[2], a[3]
swap a[1], a[3]
}
}
} else {
// here we know that a[1] >= a[2]
...
}
Of course method for 7 elements will be bigger, but it's not that hard to generate.
The code below is close to optimal. It could be made better by composing a list to be traversed while making the tree, but I'm out of time right now. Cheers!
object Sort7 {
def left(i: Int) = i * 4
def right(i: Int) = i * 4 + 1
def up(i: Int) = i * 4 + 2
def value(i: Int) = i * 4 + 3
val a = new Array[Int](7 * 4)
def reset = {
0 until 7 foreach {
i => {
a(left(i)) = -1
a(right(i)) = -1
a(up(i)) = -1
a(value(i)) = scala.util.Random.nextInt(52)
}
}
}
def sortN(i : Int) {
var index = 0
def getNext = if (a(value(i)) < a(value(index))) left(index) else right(index)
var next = getNext
while(a(next) != -1) {
index = a(next)
next = getNext
}
a(next) = i
a(up(i)) = index
}
def sort = 1 until 7 foreach (sortN(_))
def print {
traverse(0)
def traverse(i: Int): Unit = {
if (i != -1) {
traverse(a(left(i)))
println(a(value(i)))
traverse(a(right(i)))
}
}
}
}
In pseudo code:
int64 temp = 0;
int index, bit_position;
for index := 0 to 6 do
temp |= 1 << cards[index];
for index := 0 to 6 do
begin
bit_position = find_first_set(temp);
temp &= ~(1 << bit_position);
cards[index] = bit_position;
end;
It's an application of bucket sort, which should generally be faster than any of the comparison sorts that were suggested.
Note: The second part could also be implemented by iterating over bits in linear time, but in practice it may not be faster:
index = 0;
for bit_position := 0 to 51 do
begin
if (temp & (1 << bit_position)) > 0 then
begin
cards[index] = bit_position;
index++;
end;
end;
Assuming that you need an array of cards at the end of it.
Map the original cards to bits in a 64 bit integer ( or any integer with >= 52 bits ).
If during the initial mapping the array is sorted, don't change it.
Partition the integer into nibbles - each will correspond to values 0x0 to 0xf.
Use the nibbles as indices to corresponding sorted sub-arrays. You'll need 13 sets of 16 sub-arrays ( or just 16 sub-arrays and use a second indirection, or do the bit ops rather than looking the answer up; what is faster will vary by platform ).
Concatenate the non-empty sub-arrays into the final array.
You could use larger than nibbles if you want; bytes would give 7 sets of 256 arrays and make it more likely that the non-empty arrays require concatenating.
This assumes that branches are expensive and cached array accesses cheap.
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
// for general case of 7 from 52, rather than assuming last 5 sorted
uint32_t card_masks[16][5] = {
{ 0, 0, 0, 0, 0 },
{ 1, 0, 0, 0, 0 },
{ 2, 0, 0, 0, 0 },
{ 1, 2, 0, 0, 0 },
{ 3, 0, 0, 0, 0 },
{ 1, 3, 0, 0, 0 },
{ 2, 3, 0, 0, 0 },
{ 1, 2, 3, 0, 0 },
{ 4, 0, 0, 0, 0 },
{ 1, 4, 0, 0, 0 },
{ 2, 4, 0, 0, 0 },
{ 1, 2, 4, 0, 0 },
{ 3, 4, 0, 0, 0 },
{ 1, 3, 4, 0, 0 },
{ 2, 3, 4, 0, 0 },
{ 1, 2, 3, 4, 0 },
};
void sort7 ( uint32_t* cards) {
uint64_t bitset = ( ( 1LL << cards[ 0 ] ) | ( 1LL << cards[ 1LL ] ) | ( 1LL << cards[ 2 ] ) | ( 1LL << cards[ 3 ] ) | ( 1LL << cards[ 4 ] ) | ( 1LL << cards[ 5 ] ) | ( 1LL << cards[ 6 ] ) ) >> 1;
uint32_t* p = cards;
uint32_t base = 0;
do {
uint32_t* card_mask = card_masks[ bitset & 0xf ];
// you might remove this test somehow, as well as unrolling the outer loop
// having separate arrays for each nibble would save 7 additions and the increment of base
while ( *card_mask )
*(p++) = base + *(card_mask++);
bitset >>= 4;
base += 4;
} while ( bitset );
}
void print_cards ( uint32_t* cards ) {
printf ( "[ %d %d %d %d %d %d %d ]\n", cards[0], cards[1], cards[2], cards[3], cards[4], cards[5], cards[6] );
}
int main ( void ) {
uint32_t cards[7] = { 3, 9, 23, 17, 2, 42, 52 };
print_cards ( cards );
sort7 ( cards );
print_cards ( cards );
return 0;
}
Use a sorting network, like in this C++ code:
template<class T>
inline void sort7(T data) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
//DD = Define Data, create a local copy of the data to aid the optimizer.
#define DD1(a) register auto data##a=*(data+a);
#define DD2(a,b) register auto data##a=*(data+a);register auto data##b=*(data+b);
//CB = Copy Back
#define CB1(a) *(data+a)=data##a;
#define CB2(a,b) *(data+a)=data##a;*(data+b)=data##b;
DD2(1,2) SORT2(1,2)
DD2(3,4) SORT2(3,4)
DD2(5,6) SORT2(5,6)
DD1(0) SORT2(0,2)
SORT2(3,5)
SORT2(4,6)
SORT2(0,1)
SORT2(4,5)
SORT2(2,6) CB1(6)
SORT2(0,4)
SORT2(1,5)
SORT2(0,3) CB1(0)
SORT2(2,5) CB1(5)
SORT2(1,3) CB1(1)
SORT2(2,4) CB1(4)
SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}
Use the function above if you want to pass it an iterator or a pointer and use the function below if you want to pass it the seven arguments one by one. BTW, using templates allows compilers to generate really optimized code so don't get ride of the template<> unless you want C code (or some other language's code).
template<class T>
inline void sort7(T& e0, T& e1, T& e2, T& e3, T& e4, T& e5, T& e6) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
#define DD1(a) register auto data##a=e##a;
#define DD2(a,b) register auto data##a=e##a;register auto data##b=e##b;
#define CB1(a) e##a=data##a;
#define CB2(a,b) e##a=data##a;e##b=data##b;
DD2(1,2) SORT2(1,2)
DD2(3,4) SORT2(3,4)
DD2(5,6) SORT2(5,6)
DD1(0) SORT2(0,2)
SORT2(3,5)
SORT2(4,6)
SORT2(0,1)
SORT2(4,5)
SORT2(2,6) CB1(6)
SORT2(0,4)
SORT2(1,5)
SORT2(0,3) CB1(0)
SORT2(2,5) CB1(5)
SORT2(1,3) CB1(1)
SORT2(2,4) CB1(4)
SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}
Take a look at this:
http://en.wikipedia.org/wiki/Sorting_algorithm
You would need to pick one that will have a stable worst case cost...
Another option could be to keep the array sorted the whole time, so an addition of a card would keep the array sorted automatically, that way you could skip to sorting...
What JRL is referring to is a bucket sort. Since you have a finite discrete set of possible values, you can declare 52 buckets and just drop each element in a bucket in O(1) time. Hence bucket sort is O(n). Without the guarantee of a finite number of different elements, the fastest theoretical sort is O(n log n) which things like merge sort an quick sort are. It's just a balance of best and worst case scenarios then.
But long answer short, use bucket sort.
If you like the above mentioned suggestion to keep a 52 element array which always keeps your array sorted, then may be you could keep another list of 7 elements which would reference the 7 valid elements in the 52 element array. This way we can even avoid parsing the 52 element array.
I guess for this to be really efficient, we would need to have a linked list type of structure which be supports operations: InsertAtPosition() and DeleteAtPosition() and be efficient at that.
There are a lot of loops in the answers. Given his speed requirement and the tiny size of the data set I would not do ANY loops.
I have not tried it but I suspect the best answer is a fully unrolled bubble sort. It would also probably gain a fair amount of advantage from being done in assembly.
I wonder if this is the right approach, though. How are you going to analyze a 7 card hand?? I think you're going to end up converting it to some other representation for analysis anyway. Would not a 4x13 array be a more useful representation? (And it would render the sorting issue moot, anyway.)
Considering that last 5 elements are always sorted:
for i := 0 to 1 do begin
j := i;
x := array[j];
while (j+1 <= 6) and (array[j+1] < x) do begin
array[j] := array[j+1];
inc(j);
end;
array[j] := X;
end;
bubble sort is your friend. Other sorts have too many overhead codes and not suitable for small number of elements
Cheers
Here is your basic O(n) sort. I'm not sure how it compares to the others. It uses unrolled loops.
char card[7]; // the original table of 7 numbers in range 0..51
char table[52]; // workspace
// clear the workspace
memset(table, 0, sizeof(table));
// set the 7 bits corresponding to the 7 cards
table[card[0]] = 1;
table[card[1]] = 1;
...
table[card[6]] = 1;
// read the cards back out
int j = 0;
if (table[0]) card[j++] = 0;
if (table[1]) card[j++] = 1;
...
if (table[51]) card[j++] = 51;
If you are looking for a very low overhead, optimal sort, you should create a sorting network. You can generate the code for a 7 integer network using the Bose-Nelson algorithm.
This would guarentee a fixed number of compares and an equal number of swaps in the worst case.
The generated code is ugly, but it is optimal.
Your data is in a sorted array and I'll assume you swap the new two if needed so also sorted, so
a. if you want to keep it in place then use a form of insertion sort;
b. if you want to have it the result in another array do a merging by copying.
With the small numbers, binary chop is overkill, and ternary chop is appropriate anyway:
One new card will mostly like split into two and three, viz. 2+3 or 3+2,
two cards into singles and pairs, e.g. 2+1+2.
So the most time-space efficient approach to placing the smaller new card is to compare with a[1] (viz. skip a[0]) and then search left or right to find the card it should displace, then swap and move right (shifting rather than bubbling), comparing with the larger new card till you find where it goes. After this you'll be shifting forward by twos (two cards have been inserted).
The variables holding the new cards (and swaps) should be registers.
The look up approach would be faster but use more memory.

Generating permutations lazily

I'm looking for an algorithm to generate permutations of a set in such a way that I could make a lazy list of them in Clojure. i.e. I'd like to iterate over a list of permutations where each permutation is not calculated until I request it, and all of the permutations don't have to be stored in memory at once.
Alternatively I'm looking for an algorithm where given a certain set, it will return the "next" permutation of that set, in such a way that repeatedly calling the function on its own output will cycle through all permutations of the original set, in some order (what the order is doesn't matter).
Is there such an algorithm? Most of the permutation-generating algorithms I've seen tend to generate them all at once (usually recursively), which doesn't scale to very large sets. An implementation in Clojure (or another functional language) would be helpful but I can figure it out from pseudocode.
Yes, there is a "next permutation" algorithm, and it's quite simple too. The C++ standard template library (STL) even has a function called next_permutation.
The algorithm actually finds the next permutation -- the lexicographically next one. The idea is this: suppose you are given a sequence, say "32541". What is the next permutation?
If you think about it, you'll see that it is "34125". And your thoughts were probably something this: In "32541",
there is no way to keep the "32" fixed and find a later permutation in the "541" part, because that permutation is already the last one for 5,4, and 1 -- it is sorted in decreasing order.
So you'll have to change the "2" to something bigger -- in fact, to the smallest number bigger than it in the "541" part, namely 4.
Now, once you've decided that the permutation will start as "34", the rest of the numbers should be in increasing order, so the answer is "34125".
The algorithm is to implement precisely that line of reasoning:
Find the longest "tail" that is ordered in decreasing order. (The "541" part.)
Change the number just before the tail (the "2") to the smallest number bigger than it in the tail (the 4).
Sort the tail in increasing order.
You can do (1.) efficiently by starting at the end and going backwards as long as the previous element is not smaller than the current element. You can do (2.) by just swapping the "4" with the '2", so you'll have "34521". Once you do this, you can avoid using a sorting algorithm for (3.), because the tail was, and is still (think about this), sorted in decreasing order, so it only needs to be reversed.
The C++ code does precisely this (look at the source in /usr/include/c++/4.0.0/bits/stl_algo.h on your system, or see this article); it should be simple to translate it to your language: [Read "BidirectionalIterator" as "pointer", if you're unfamiliar with C++ iterators. The code returns false if there is no next permutation, i.e. we are already in decreasing order.]
template <class BidirectionalIterator>
bool next_permutation(BidirectionalIterator first,
BidirectionalIterator last) {
if (first == last) return false;
BidirectionalIterator i = first;
++i;
if (i == last) return false;
i = last;
--i;
for(;;) {
BidirectionalIterator ii = i--;
if (*i <*ii) {
BidirectionalIterator j = last;
while (!(*i <*--j));
iter_swap(i, j);
reverse(ii, last);
return true;
}
if (i == first) {
reverse(first, last);
return false;
}
}
}
It might seem that it can take O(n) time per permutation, but if you think about it more carefully, you can prove that it takes O(n!) time for all permutations in total, so only O(1) -- constant time -- per permutation.
The good thing is that the algorithm works even when you have a sequence with repeated elements: with, say, "232254421", it would find the tail as "54421", swap the "2" and "4" (so "232454221"), reverse the rest, giving "232412245", which is the next permutation.
Assuming that we're talking about lexicographic order over the values being permuted, there are two general approaches that you can use:
transform one permutation of the elements to the next permutation (as ShreevatsaR posted), or
directly compute the nth permutation, while counting n from 0 upward.
For those (like me ;-) who don't speak c++ as natives, approach 1 can be implemented from the following pseudo-code, assuming zero-based indexing of an array with index zero on the "left" (substituting some other structure, such as a list, is "left as an exercise" ;-):
1. scan the array from right-to-left (indices descending from N-1 to 0)
1.1. if the current element is less than its right-hand neighbor,
call the current element the pivot,
and stop scanning
1.2. if the left end is reached without finding a pivot,
reverse the array and return
(the permutation was the lexicographically last, so its time to start over)
2. scan the array from right-to-left again,
to find the rightmost element larger than the pivot
(call that one the successor)
3. swap the pivot and the successor
4. reverse the portion of the array to the right of where the pivot was found
5. return
Here's an example starting with a current permutation of CADB:
1. scanning from the right finds A as the pivot in position 1
2. scanning again finds B as the successor in position 3
3. swapping pivot and successor gives CBDA
4. reversing everything following position 1 (i.e. positions 2..3) gives CBAD
5. CBAD is the next permutation after CADB
For the second approach (direct computation of the nth permutation), remember that there are N! permutations of N elements. Therefore, if you are permuting N elements, the first (N-1)! permutations must begin with the smallest element, the next (N-1)! permutations must begin with the second smallest, and so on. This leads to the following recursive approach (again in pseudo-code, numbering the permutations and positions from 0):
To find permutation x of array A, where A has N elements:
0. if A has one element, return it
1. set p to ( x / (N-1)! ) mod N
2. the desired permutation will be A[p] followed by
permutation ( x mod (N-1)! )
of the elements remaining in A after position p is removed
So, for example, the 13th permutation of ABCD is found as follows:
perm 13 of ABCD: {p = (13 / 3!) mod 4 = (13 / 6) mod 4 = 2; ABCD[2] = C}
C followed by perm 1 of ABD {because 13 mod 3! = 13 mod 6 = 1}
perm 1 of ABD: {p = (1 / 2!) mod 3 = (1 / 2) mod 2 = 0; ABD[0] = A}
A followed by perm 1 of BD {because 1 mod 2! = 1 mod 2 = 1}
perm 1 of BD: {p = (1 / 1!) mod 2 = (1 / 1) mod 2 = 1; BD[1] = D}
D followed by perm 0 of B {because 1 mod 1! = 1 mod 1 = 0}
B (because there's only one element)
DB
ADB
CADB
Incidentally, the "removal" of elements can be represented by a parallel array of booleans which indicates which elements are still available, so it is not necessary to create a new array on each recursive call.
So, to iterate across the permutations of ABCD, just count from 0 to 23 (4!-1) and directly compute the corresponding permutation.
You should check the Permutations article on wikipeda. Also, there is the concept of Factoradic numbers.
Anyway, the mathematical problem is quite hard.
In C# you can use an iterator, and stop the permutation algorithm using yield. The problem with this is that you cannot go back and forth, or use an index.
More examples of permutation algorithms to generate them.
Source: http://www.ddj.com/architect/201200326
Uses the Fike's Algorithm, that is the one of fastest known.
Uses the Algo to the Lexographic order.
Uses the nonlexographic, but runs faster than item 2.
1.
PROGRAM TestFikePerm;
CONST marksize = 5;
VAR
marks : ARRAY [1..marksize] OF INTEGER;
ii : INTEGER;
permcount : INTEGER;
PROCEDURE WriteArray;
VAR i : INTEGER;
BEGIN
FOR i := 1 TO marksize
DO Write ;
WriteLn;
permcount := permcount + 1;
END;
PROCEDURE FikePerm ;
{Outputs permutations in nonlexicographic order. This is Fike.s algorithm}
{ with tuning by J.S. Rohl. The array marks[1..marksizn] is global. The }
{ procedure WriteArray is global and displays the results. This must be}
{ evoked with FikePerm(2) in the calling procedure.}
VAR
dn, dk, temp : INTEGER;
BEGIN
IF
THEN BEGIN { swap the pair }
WriteArray;
temp :=marks[marksize];
FOR dn := DOWNTO 1
DO BEGIN
marks[marksize] := marks[dn];
marks [dn] := temp;
WriteArray;
marks[dn] := marks[marksize]
END;
marks[marksize] := temp;
END {of bottom level sequence }
ELSE BEGIN
FikePerm;
temp := marks[k];
FOR dk := DOWNTO 1
DO BEGIN
marks[k] := marks[dk];
marks[dk][ := temp;
FikePerm;
marks[dk] := marks[k];
END; { of loop on dk }
marks[k] := temp;l
END { of sequence for other levels }
END; { of FikePerm procedure }
BEGIN { Main }
FOR ii := 1 TO marksize
DO marks[ii] := ii;
permcount := 0;
WriteLn ;
WrieLn;
FikePerm ; { It always starts with 2 }
WriteLn ;
ReadLn;
END.
2.
PROGRAM TestLexPerms;
CONST marksize = 5;
VAR
marks : ARRAY [1..marksize] OF INTEGER;
ii : INTEGER;
permcount : INTEGER;
PROCEDURE WriteArray;
VAR i : INTEGER;
BEGIN
FOR i := 1 TO marksize
DO Write ;
permcount := permcount + 1;
WriteLn;
END;
PROCEDURE LexPerm ;
{ Outputs permutations in lexicographic order. The array marks is global }
{ and has n or fewer marks. The procedure WriteArray () is global and }
{ displays the results. }
VAR
work : INTEGER:
mp, hlen, i : INTEGER;
BEGIN
IF
THEN BEGIN { Swap the pair }
work := marks[1];
marks[1] := marks[2];
marks[2] := work;
WriteArray ;
END
ELSE BEGIN
FOR mp := DOWNTO 1
DO BEGIN
LexPerm<>;
hlen := DIV 2;
FOR i := 1 TO hlen
DO BEGIN { Another swap }
work := marks[i];
marks[i] := marks[n - i];
marks[n - i] := work
END;
work := marks[n]; { More swapping }
marks[n[ := marks[mp];
marks[mp] := work;
WriteArray;
END;
LexPerm<>
END;
END;
BEGIN { Main }
FOR ii := 1 TO marksize
DO marks[ii] := ii;
permcount := 1; { The starting position is permutation }
WriteLn < Starting position: >;
WriteLn
LexPerm ;
WriteLn < PermCount is , permcount>;
ReadLn;
END.
3.
PROGRAM TestAllPerms;
CONST marksize = 5;
VAR
marks : ARRAY [1..marksize] of INTEGER;
ii : INTEGER;
permcount : INTEGER;
PROCEDURE WriteArray;
VAR i : INTEGER;
BEGIN
FOR i := 1 TO marksize
DO Write ;
WriteLn;
permcount := permcount + 1;
END;
PROCEDURE AllPerm (n : INTEGER);
{ Outputs permutations in nonlexicographic order. The array marks is }
{ global and has n or few marks. The procedure WriteArray is global and }
{ displays the results. }
VAR
work : INTEGER;
mp, swaptemp : INTEGER;
BEGIN
IF
THEN BEGIN { Swap the pair }
work := marks[1];
marks[1] := marks[2];
marks[2] := work;
WriteArray;
END
ELSE BEGIN
FOR mp := DOWNTO 1
DO BEGIN
ALLPerm<< n - 1>>;
IF >
THEN swaptemp := 1
ELSE swaptemp := mp;
work := marks[n];
marks[n] := marks[swaptemp};
marks[swaptemp} := work;
WriteArray;
AllPerm< n-1 >;
END;
END;
BEGIN { Main }
FOR ii := 1 TO marksize
DO marks[ii] := ii
permcount :=1;
WriteLn < Starting position; >;
WriteLn;
Allperm < marksize>;
WriteLn < Perm count is , permcount>;
ReadLn;
END.
the permutations function in clojure.contrib.lazy_seqs already claims to do just this.
It looks necromantic in 2022 but I'm sharing it anyway
Here an implementation of C++ next_permutation in Java can be found. The idea of using it in Clojure might be something like
(println (lazy-seq (iterator-seq (NextPermutationIterator. (list 'a 'b 'c)))))
disclaimer: I'm the author and maintainer of the project

Resources