Stable merging two arrays to maximize product of adjacent elements - algorithm

The following is an interview question that I am unable to solve in less than exponential time. Though it seems to be a DP problem, I am unable to form the base cases and analyze it properly. Any help is appreciated.
You are given 2 arrays of size 'n' each. You need to stable-merge
these arrays such that in the new array sum of product of consecutive
elements is maximized.
For example
A= { 2, 1, 5}
B= { 3, 7, 9}
Stable merging A = {a1, a2, a3} and B = {b1, b2, b3} will create an array C with 2*n elements. For example, say C = { b1, a1, a2, a3, b2, b3 } by merging (stable) A and B. Then the sum = b1*a1 + a2*a3 + b2*b3 should be a maximum.

Let's define c[i,j] as the solution to the same problem, but with the left array starting at index i and the right array starting at index j.
So c[0,0] gives the solution to the original problem.
c[i,j] consists of:
MaxValue = the maximum value.
NeedsPairing = true or false, depending on whether the leftmost element is unpaired.
Child = [p,q] or NULL = the key of the child that yields the optimal sum up to this level.
Now, defining the optimal substructure for this DP:
c[i,j] = if(NeedsPairing) { left[i]*right[j] } + Max { c[i+1, j], c[i, j+1] }
It's captured in more detail in this code.
if (lstart == lend)
{
if (rstart == rend)
{
nodeResult = new NodeData() { Max = 0, Child = null, NeedsPairing = false };
}
else
{
nodeResult = new NodeData()
{
Max = ComputeMax(right, rstart),
NeedsPairing = (rend - rstart) % 2 != 0,
Child = null
};
}
}
else
{
if (rstart == rend)
{
nodeResult = new NodeData()
{
Max = ComputeMax(left, lstart),
NeedsPairing = (lend - lstart) % 2 != 0,
Child = null
};
}
else
{
var downLef = Solve(left, lstart + 1, right, rstart);
var lefResNode = new NodeData()
{
Child = Tuple.Create(lstart + 1, rstart),
};
if (downLef.NeedsPairing)
{
lefResNode.Max = downLef.Max + left[lstart] * right[rstart];
lefResNode.NeedsPairing = false;
}
else
{
lefResNode.Max = downLef.Max;
lefResNode.NeedsPairing = true;
}
var downRt = Solve(left, lstart, right, rstart + 1);
var rtResNode = new NodeData()
{
Child = Tuple.Create(lstart, rstart + 1),
};
if (downRt.NeedsPairing)
{
rtResNode.Max = downRt.Max + right[rstart] * left[lstart];
rtResNode.NeedsPairing = false;
}
else
{
rtResNode.Max = downRt.Max;
rtResNode.NeedsPairing = true;
}
if (lefResNode.Max > rtResNode.Max)
{
nodeResult = lefResNode;
}
else
{
nodeResult = rtResNode;
}
}
}
And we use memoization to avoid solving the same subproblem again.
Dictionary<Tuple<int, int>, NodeData> memoization = new Dictionary<Tuple<int, int>, NodeData>();
And in the end we use NodeData.Child to trace back the path.

For A = {a1,a2,...,an}, B = {b1,b2,...,bn},
Define DP[i,j] as the maximum stable-merging sum between {ai,...,an} and {bj,...,bn}.
(1 <= i <= n+1, 1 <= j <= n+1)
DP[n+1,n+1] = 0, DP[n+1,k] = bk*bk+1 +...+ bn-1*bn, DP[k,n+1] = ak*ak+1 +...+ an-1*an.
DP[n,k] = max{an*bk + bk+1*bk+2 +..+ bn-1*bn, DP[n,k+2] + bk*bk+1}
DP[k,n] = max{ak*bn + ak+1*ak+2 +..+ an-1*an, DP[k+2,n] + ak*ak+1}
DP[i,j] = max{DP[i+2,j] + ai*ai+1, DP[i,j+2] + bj*bj+1, DP[i+1,j+1] + ai*bj}.
And you return DP[1,1].
Explanation:
In each step you have to consider 3 options: take the first 2 elements from the remaining A, take the first 2 elements from the remaining B, or take one from each of A and B (since you can't change the order of A and B, you will have to take the first from A and the first from B).
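For what it's worth, the recurrence above can be sketched bottom-up in Python. This is only a minimal sketch, not part of the original answer: the name max_pair_sum is made up for illustration, indices are 0-based rather than the 1-based ones used above, and it assumes the two arrays together hold an even number of elements.

```python
def max_pair_sum(a, b):
    """Maximum sum of products of consecutive pairs over all stable merges."""
    n, m = len(a), len(b)
    if (n + m) % 2:
        raise ValueError("total number of elements must be even")
    neg_inf = float("-inf")
    # dp[i][j] = best sum using a[i:] and b[j:]; -inf marks unreachable states
    dp = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    dp[n][m] = 0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            best = neg_inf
            if i + 2 <= n:            # take the next two elements of a
                best = max(best, a[i] * a[i + 1] + dp[i + 2][j])
            if i < n and j < m:       # take one element from each array
                best = max(best, a[i] * b[j] + dp[i + 1][j + 1])
            if j + 2 <= m:            # take the next two elements of b
                best = max(best, b[j] * b[j + 1] + dp[i][j + 2])
            dp[i][j] = best
    return dp[0][0]


print(max_pair_sum([2, 1, 5], [3, 7, 9]))
```

The table has (n+1)*(m+1) entries and each one is filled in constant time, so the sketch runs in O(n*m) time.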

My solution is rather simple. I just explore all the possible stable merges. The following is a working C++ program:
#include<iostream>
using namespace std;
void find_max_sum(int *arr1, int len1, int *arr2, int len2, int sum, int& max_sum){
if(len1 >= 2)
find_max_sum(arr1+2, len1-2, arr2, len2, sum+(arr1[0]*arr1[1]), max_sum);
if(len1 >= 1 && len2 >= 1)
find_max_sum(arr1+1, len1-1, arr2+1, len2-1, sum+(arr1[0]*arr2[0]), max_sum);
if(len2 >= 2)
find_max_sum(arr1, len1, arr2+2, len2-2, sum+(arr2[0]*arr2[1]), max_sum);
if(len1 == 0 && len2 == 0 && sum > max_sum)
max_sum = sum;
}
int main(){
int arr1[3] = {2,1,3};
int arr2[3] = {3,7,9};
int max_sum=0;
find_max_sum(arr1, 3, arr2, 3, 0, max_sum);
cout<<max_sum<<endl;
return 0;
}

Define F(i, j) as the maximal pairwise sum that can be achieved by stable merging Ai...An and Bj...Bn.
At each step in the merge, we can choose one of three options:
Take the first two remaining elements of A.
Take the first remaining element of A and the first remaining element of B.
Take the first two remaining elements of B.
Thus, F(i, j) can be defined recursively as:
F(n, n) = 0
F(i, j) = max
(
AiAi+1 + F(i+2, j), //Option 1
AiBj + F(i+1, j+1), //Option 2
BjBj+1 + F(i, j+2) //Option 3
)
To find the optimal merging of the two lists, we need to find F(0, 0). Naively, this would involve computing intermediate values many times, but by caching each F(i, j) as it is found, the complexity is reduced to O(n^2).
Here is some quick and dirty c++ that does this:
#include <iostream>
#define INVALID -1
int max(int p, int q, int r)
{
return p >= q && p >= r ? p : q >= r ? q : r;
}
int F(int i, int j, int * a, int * b, int len, int * cache)
{
if (cache[i * (len + 1) + j] != INVALID)
return cache[i * (len + 1) + j];
int p = 0, q = 0, r = 0;
if (i < len && j < len)
p = a[i] * b[j] + F(i + 1, j + 1, a, b, len, cache);
if (i + 1 < len)
q = a[i] * a[i + 1] + F(i + 2, j, a, b, len, cache);
if (j + 1 < len)
r = b[j] * b[j + 1] + F(i, j + 2, a, b, len, cache);
return cache[i * (len + 1) + j] = max(p, q, r);
}
int main(int argc, char ** argv)
{
int a[] = {2, 1, 3};
int b[] = {3, 7, 9};
int len = 3;
int cache[(len + 1) * (len + 1)];
for (int i = 0; i < (len + 1) * (len + 1); i++)
cache[i] = INVALID;
cache[(len + 1) * (len + 1) - 1] = 0;
std::cout << F(0, 0, a, b, len, cache) << std::endl;
}
If you need the actual merged sequence rather than just the sum, you will also have to cache which of p, q, r was selected and backtrack.
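A rough Python sketch of that backtracking idea follows. It is not the code from this answer: the function name best_stable_merge and the "AA"/"AB"/"BB" labels are invented for illustration, and it assumes the total number of elements is even. It stores the chosen option next to each cached value and then walks the choices from (0, 0).

```python
def best_stable_merge(a, b):
    """Return (best sum, one merged sequence that achieves it)."""
    n, m = len(a), len(b)
    if (n + m) % 2:
        raise ValueError("total number of elements must be even")
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    choice = [[None] * (m + 1) for _ in range(n + 1)]
    best[n][m] = 0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            options = []
            if i + 2 <= n:
                options.append((a[i] * a[i + 1] + best[i + 2][j], "AA"))
            if i < n and j < m:
                options.append((a[i] * b[j] + best[i + 1][j + 1], "AB"))
            if j + 2 <= m:
                options.append((b[j] * b[j + 1] + best[i][j + 2], "BB"))
            if options:
                # remember which option produced the cached value
                best[i][j], choice[i][j] = max(options)
    # backtrack from the start, replaying the recorded choices
    merged, i, j = [], 0, 0
    while choice[i][j] is not None:
        step = choice[i][j]
        if step == "AA":
            merged += [a[i], a[i + 1]]
            i += 2
        elif step == "AB":
            merged += [a[i], b[j]]
            i, j = i + 1, j + 1
        else:
            merged += [b[j], b[j + 1]]
            j += 2
    return best[0][0], merged


print(best_stable_merge([2, 1, 5], [3, 7, 9]))
```

When several merges tie for the best sum, this sketch returns just one of them.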

One way to solve it by dynamic programming is to always store:
S[ i ][ j ][ l ] = "Best way to merge A[1,...,i] and B[1,...,j] such that, if l == 0, the last element is A[i], and if l == 1, the last element is B[j]".
Then, the DP would be (pseudo-code, insert any number at A[0] and B[0], and let the actual input be in A[1]...A[n], B[1]...B[n]):
S[0][0][0] = S[0][0][1] = S[1][0][0] = S[0][1][1] = 0; // If there is only 0 or 1 element in the merged vector, the answer is 0
S[1][0][1] = S[0][1][0] = -infinity; // These two cases are impossible
for i = 1...n:
    for j = 1...n:
        // Note that the cases involving A[0] or B[0] are correctly handled by "-infinity"
        // First consider the case when the last element is A[i]
        S[i][j][0] = max(S[i-1][j][0] + A[i-1]*A[i], // The second to last is A[i-1].
                         S[i-1][j][1] + B[j]*A[i]); // The second to last is B[j]
        // Similarly consider when the last element is B[j]
        S[i][j][1] = max(S[i][j-1][0] + A[i]*B[j], // The second to last is A[i]
                         S[i][j-1][1] + B[j-1]*B[j]); // The second to last is B[j-1]
// The answer is the best way to merge all elements of A and B, leaving either A[n] or B[n] at the end.
return max(S[n][n][0], S[n][n][1]);

Merge it and sort it, maybe with merge sort. The sorted array gives the max value. (Merge here just means appending the arrays.) The complexity is O(n log n).

Here's a solution in Clojure, if you're interested in something a little more off the beaten path. It's exponential, as it just generates all C(2n, n) stable merges and spends O(n) time summing the products of each. There's a lot less messing with offsets and arithmetic than the array-based imperative solutions I've seen, which hopefully makes the algorithm stand out more. And it's pretty flexible, too: if you want to, for example, include c2*c3 as well as c1*c2 and c3*c4, you can simply replace (partition 2 coll) with (partition 2 1 coll).
;; return a list of all possible ways to stably merge the two input collections
(defn stable-merges [xs ys]
(lazy-seq
(cond (empty? xs) [ys]
(empty? ys) [xs]
:else (concat (let [[x & xs] xs]
(for [merge (stable-merges xs ys)]
(cons x merge)))
(let [[y & ys] ys]
(for [merge (stable-merges xs ys)]
(cons y merge)))))))
;; split up into chunks of two, multiply, and add the results
(defn sum-of-products [coll]
(apply + (for [[a b] (partition 2 coll)]
(* a b))))
;; try all the merges, find the one with the biggest sum
(defn best-merge [xs ys]
(apply max-key sum-of-products (stable-merges xs ys)))
user> (best-merge [2 1 5] [3 7 9])
(2 1 3 5 7 9)

I think it would be better if you provided a few more test cases. But I think the normal merging of two arrays, similar to the merging done in merge sort, will solve the problem.
The pseudocode for merging arrays is given on Wiki.
Basically, it is the normal merging algorithm used in merge sort. In merge sort the arrays are sorted, but here we are applying the same merging algorithm to unsorted arrays.
Step 0: Let i be the index for first array(A) and j be the index for second array(B). i=0 , j=0
Step 1: Compare A[i]=2 & B[j]=3. Since 2<3 it will be the first element of the new merged array(C). i=1, j=0 (Add that number to the new array which is lesser)
Step 2: Again Compare A[i]=1 and B[j]=3. 1<3 therefore insert 1 in C. i++, j=0;
Step 3: Again Compare A[i]=3 and B[j]=3. Any number can go in C(both are same). i++, j=0; (Basically we are increasing the index of that array from which number is inserted)
Step 4: Since the array A is complete just directly insert the elements of Array B in C. Otherwise repeat previous steps.
Array C = { 2, 1, 3, 3, 7,9}
I haven't done much research on it. So if there is any test case which could fail, please provide one.

Counting inversions in a segment with updates

I'm trying to solve a problem which goes like this:
Problem
Given an array of integers "arr" of size "n", process two types of queries. There are "q" queries you need to answer.
Query type 1
input: l r
result: output number of inversions in [l, r]
Query type 2
input: x y
result: update the value at arr [x] to y
Inversion
For every index j < i, if arr [j] > arr [i], the pair (j, i) is one inversion.
Input
n = 5
q = 3
arr = {1, 4, 3, 5, 2}
queries:
type = 1, l = 1, r = 5
type = 2, x = 1, y = 4
type = 1, l = 1, r = 5
Output
4
6
Constraints
Time: 4 secs
1 <= n, q <= 100000
1 <= arr [i] <= 40
1 <= l, r, x <= n
1 <= y <= 40
I know how to solve a simpler version of this problem without updates, i.e. to simply count the number of inversions for each position using a segment tree or fenwick tree in O(N*log(N)). The only solution I have to this problem is O(q*N*log(N)) (I think) with segment tree other than the O(q*N^2) trivial algorithm. This however does not fit within the time constraints of the problem. I would like to have hints towards a better algorithm to solve the problem in O(N*log(N)) (if it's possible) or maybe O(N*log^2(N)).
I first came across this problem two days ago and have been spending a few hours here and there to try and solve it. However, I'm finding it non-trivial to do so and would like to have some help/hints regarding the same. Thanks for your time and patience.
Updates
Solution
With the suggestion, answer and help by Tanvir Wahid, I've implemented the source code for the problem in C++ and would like to share it here for anyone who might stumble across this problem and not have an intuitive idea on how to solve it. Thank you!
Let's build a segment tree with each node containing information about how many inversions exist and the frequency count of elements present in its segment of authority.
node {
integer inversion_count : 0
array [40] frequency : {0...0}
}
Building the segment tree and handling updates
For each leaf node, initialise the inversion count to 0 and set the frequency of the represented element from the input array to 1. The frequencies of a parent node can be calculated by summing up the frequencies of its left and right children. The inversion count of a parent node is the sum of the inversion counts of its left and right children plus the new inversions created by merging their two segments, which can be calculated using the frequencies of elements in each child. This calculation multiplies the frequency of each bigger element in the left child by the frequencies of the smaller elements in the right child.
parent.inversion_count = left.inversion_count + right.inversion_count
for i in [39, 0]
for j in [0, i)
parent.inversion_count += left.frequency [i] * right.frequency [j]
Updates are handled similarly.
Answering range queries on inversion counts
To answer the query for the number of inversions in the range [l, r], we calculate the inversions using the source code attached below.
Time Complexity: O(q*log(n)), treating the 40*40 work done by each merge as a constant factor
Note
The source code attached does break some good programming habits. The sole purpose of the code is to "solve" the given problem and not to accomplish anything else.
Source Code
/**
Lost Arrow (Aryan V S)
Saturday 2020-10-10
**/
#include "bits/stdc++.h"
using namespace std;
struct node {
int64_t inv = 0;
vector <int> freq = vector <int> (40, 0);
void combine (const node& l, const node& r) {
inv = l.inv + r.inv;
for (int i = 39; i >= 0; --i) {
for (int j = 0; j < i; ++j) {
// frequency of bigger numbers in the left * frequency of smaller numbers on the right
inv += 1LL * l.freq [i] * r.freq [j];
}
freq [i] = l.freq [i] + r.freq [i];
}
}
};
void build (vector <node>& tree, vector <int>& a, int v, int tl, int tr) {
if (tl == tr) {
tree [v].inv = 0;
tree [v].freq [a [tl]] = 1;
}
else {
int tm = (tl + tr) / 2;
build(tree, a, 2 * v + 1, tl, tm);
build(tree, a, 2 * v + 2, tm + 1, tr);
tree [v].combine(tree [2 * v + 1], tree [2 * v + 2]);
}
}
void update (vector <node>& tree, int v, int tl, int tr, int pos, int val) {
if (tl == tr) {
tree [v].inv = 0;
tree [v].freq = vector <int> (40, 0);
tree [v].freq [val] = 1;
}
else {
int tm = (tl + tr) / 2;
if (pos <= tm)
update(tree, 2 * v + 1, tl, tm, pos, val);
else
update(tree, 2 * v + 2, tm + 1, tr, pos, val);
tree [v].combine(tree [2 * v + 1], tree [2 * v + 2]);
}
}
node inv_cnt (vector <node>& tree, int v, int tl, int tr, int l, int r) {
if (l > r)
return node();
if (tl == l && tr == r)
return tree [v];
int tm = (tl + tr) / 2;
node result;
result.combine(inv_cnt(tree, 2 * v + 1, tl, tm, l, min(r, tm)), inv_cnt(tree, 2 * v + 2, tm + 1, tr, max(l, tm + 1), r));
return result;
}
void solve () {
int n, q;
cin >> n >> q;
vector <int> a (n);
for (int i = 0; i < n; ++i) {
cin >> a [i];
--a [i];
}
vector <node> tree (4 * n);
build(tree, a, 0, 0, n - 1);
while (q--) {
int type, x, y;
cin >> type >> x >> y;
--x; --y;
if (type == 1) {
node result = inv_cnt(tree, 0, 0, n - 1, x, y);
cout << result.inv << '\n';
}
else if (type == 2) {
update(tree, 0, 0, n - 1, x, y);
}
else
assert(false);
}
}
int main () {
std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);
std::cout.precision(10);
std::cout << std::fixed << std::boolalpha;
int t = 1;
// std::cin >> t;
while (t--)
solve();
return 0;
}
arr[i] can be at most 40, and we can use this to our advantage. What we need is a segment tree. Each node will hold 41 values: a long long int that represents the inversions for its range, and an array of size 40 for the count of each number (a struct will do). How do we merge the two children of a node? We know the inversions for the left child and the right child, and we also know the frequency of each number in both of them. The inversion count of the parent node is the sum of the inversions of both children plus the number of inversions between the left and right child. We can easily find the inversions between the two children from the frequencies of the numbers. Queries can be answered in a similar way. Complexity: O(40*q*log(n)).
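The merge step described above could look roughly like this in Python. It is a sketch under my own assumptions rather than the posted solution: nodes are plain (inversions, frequency list) tuples, and the names VALUES, leaf, combine and count_inversions are made up. A running count of smaller values on the right keeps each merge at about 40 steps.

```python
from functools import reduce

VALUES = 40  # values are assumed to lie in 1..40 and are stored 0-based


def leaf(value):
    """Node for a single array element: no inversions, one occurrence."""
    freq = [0] * VALUES
    freq[value - 1] = 1
    return (0, freq)


def combine(left, right):
    """Merge two adjacent segments given as (inversions, freq) tuples."""
    l_inv, l_freq = left
    r_inv, r_freq = right
    cross = 0
    smaller_right = 0  # number of right-side elements with value < v
    for v in range(VALUES):
        cross += l_freq[v] * smaller_right
        smaller_right += r_freq[v]
    freq = [x + y for x, y in zip(l_freq, r_freq)]
    return (l_inv + r_inv + cross, freq)


def count_inversions(arr):
    """Fold the merge over a non-empty array, as a segment tree would."""
    return reduce(combine, (leaf(v) for v in arr))[0]


print(count_inversions([1, 4, 3, 5, 2]))
```

In a real segment tree the same combine would be applied at every internal node during build, update and query; the left-to-right fold here is only meant to exercise the merge.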

How can I find permutations of size N with non repeated objects from a list with repeated objects? [duplicate]

I want to write a function that takes an array of letters as an argument and a number of those letters to select.
Say you provide an array of 8 letters and want to select 3 letters from that. Then you should get:
8! / ((8 - 3)! * 3!) = 56
Arrays (or words) in return consisting of 3 letters each.
Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.
Gray Codes
An issue that you will come across is of course memory and pretty quickly, you'll have problems by 20 elements in your set -- 20C3 = 1140. And if you want to iterate over the set it's best to use a modified gray code algorithm so you aren't holding all of them in memory. These generate the next combination from the previous and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? minimize? et cetera.
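As a small illustration of iterating without holding everything in memory, here is a Python sketch that uses the standard library generator itertools.combinations. Note that it yields combinations in lexicographic order of positions, not in a Gray code order, so it only demonstrates the lazy iteration part. The helper name first_combinations is made up.

```python
from itertools import combinations, islice


def first_combinations(items, k, limit):
    """Materialise only the first `limit` k-combinations of items."""
    return list(islice(combinations(items, k), limit))


# Count all 3-combinations of 20 items one at a time, without storing them.
total = sum(1 for _ in combinations(range(20), 3))
print(total)
print(first_combinations("ABCDEFGH", 3, 2))
```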
Some of the original papers describing gray codes:
Some Hamilton Paths and a Minimal Change Algorithm
Adjacent Interchange Combination Generation Algorithm
Here are some other papers covering the topic:
An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
Combination Generators
Survey of Combinatorial Gray Codes (PostScript)
An Algorithm for Gray Codes
Chase's Twiddle (algorithm)
Phillip J Chase, `Algorithm 382: Combinations of M out of N Objects' (1970)
The algorithm in C...
Index of Combinations in Lexicographical Order (Buckles Algorithm 515)
You can also reference a combination by its index (in lexicographical order). Realizing that the index should be some amount of change from right to left based on the index we can construct something that should recover a combination.
So, we have a set {1,2,3,4,5,6}... and we want three elements. Let's say {1,2,3} we can say that the difference between the elements is one and in order and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of 'changes' in the last place accounts for one change in the lexicographical ordering. The second place, with one change {1,3,4} has one change but accounts for more change since it's in the second place (proportional to the number of elements in the original set).
The method I've described is a deconstruction, as it seems, from set to the index, we need to do the reverse – which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes – I used the index of the sets rather than a number range to represent the set, so we are always working from 0...n.
Note:
Since combinations are unordered, {1,3,2} = {1,2,3} --we order them to be lexicographical.
This method has an implicit 0 to start the set for the first difference.
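To make the set-to-index direction concrete, here is a short Python sketch. It is not Buckles' algorithm, just a direct count of the combinations that come earlier in lexicographic order; the name lex_rank is made up, and it assumes the combination is sorted and drawn from {1, ..., n}.

```python
from math import comb


def lex_rank(combo, n):
    """0-based lexicographic rank of a sorted combination of {1, ..., n}."""
    k = len(combo)
    rank = 0
    prev = 0  # the implicit 0 that starts the set
    for pos, c in enumerate(combo):
        # every value skipped at this position accounts for all
        # completions of the remaining k - pos - 1 positions
        for skipped in range(prev + 1, c):
            rank += comb(n - skipped, k - pos - 1)
        prev = c
    return rank


print(lex_rank((1, 2, 4), 6))
```

With ranks counted from 0, {1,2,3} gets 0 and {1,2,4} gets 1, which corresponds to "number 2" when counting from 1 as in the text above.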
Index of Combinations in Lexicographical Order (McCaffrey)
There is another way: its concept is easier to grasp and program, but it's without the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:
The set x_k...x_1 in N that maximizes i = C(x_1, k) + C(x_2, k-1) + ... + C(x_k, 1), where C(n, r) = n! / (r! * (n - r)!).
For an example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1). So, the 27th lexicographical combination of four things is: {1,2,5,6}, those are the indexes of whatever set you want to look at. Example below (OCaml), requires choose function, left to reader:
(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
(* maximize function -- maximize a that is aCb *)
(* return largest c where c < i and choose(c,i) <= z *)
let rec maximize a b x =
if (choose a b ) <= x then a else maximize (a-1) b x
in
let rec iterate n x i = match i with
| 0 -> []
| i ->
let max = maximize n i x in
max :: iterate n (x - (choose max i)) (i-1)
in
if x < 0 then failwith "errors" else
let idxs = iterate (List.length set) x k in
List.map (List.nth set) (List.sort (-) idxs)
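For comparison, the same greedy decomposition can be sketched in Python with math.comb standing in for the choose function. The name combination_indices is made up, and the sketch assumes 0 <= x < C(n, k), returning the indices from largest to smallest.

```python
from math import comb


def combination_indices(x, k, n):
    """Greedily write x as C(c_k, k) + C(c_(k-1), k-1) + ... + C(c_1, 1)."""
    if not 0 <= x < comb(n, k):
        raise ValueError("x must satisfy 0 <= x < C(n, k)")
    indices = []
    for i in range(k, 0, -1):
        c = n - 1
        # largest c with C(c, i) <= x; comb returns 0 when c < i
        while comb(c, i) > x:
            c -= 1
        indices.append(c)
        x -= comb(c, i)
    return indices


print(combination_indices(27, 4, 7))
```

For x = 27 and k = 4 this reproduces the decomposition from the example above, giving the indices 6, 5, 2, 1.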
A small and simple combinations iterator
The following two algorithms are provided for didactic purposes. They implement an iterator and (a more general) folder over all combinations.
They are as fast as possible, having the complexity O(nCk). The memory consumption is bound by k.
We will start with the iterator, which will call a user provided function for each combination
let iter_combs n k f =
let rec iter v s j =
if j = k then f v
else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
iter [] 0 0
A more general version will call the user-provided function along with the state variable, starting from the initial state. Since we need to pass the state from one step to the next, we won't use the for-loop but will use recursion instead:
let fold_combs n k f x =
  let rec loop i s c x =
    if i < n then
      loop (i+1) s c @@
      let c = i::c and s = s + 1 and i = i + 1 in
      if s < k then loop i s c x else f c x
    else x in
  loop 0 0 [] x
In C#:
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
return k == 0 ? new[] { new T[0] } :
elements.SelectMany((e, i) =>
elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] {e}).Concat(c)));
}
Usage:
var result = Combinations(new[] { 1, 2, 3, 4, 5 }, 3);
Result:
123
124
125
134
135
145
234
235
245
345
Short java solution:
import java.util.Arrays;
public class Combination {
public static void main(String[] args){
String[] arr = {"A","B","C","D","E","F"};
combinations2(arr, 3, 0, new String[3]);
}
static void combinations2(String[] arr, int len, int startPosition, String[] result){
if (len == 0){
System.out.println(Arrays.toString(result));
return;
}
for (int i = startPosition; i <= arr.length-len; i++){
result[result.length - len] = arr[i];
combinations2(arr, len-1, i+1, result);
}
}
}
Result will be
[A, B, C]
[A, B, D]
[A, B, E]
[A, B, F]
[A, C, D]
[A, C, E]
[A, C, F]
[A, D, E]
[A, D, F]
[A, E, F]
[B, C, D]
[B, C, E]
[B, C, F]
[B, D, E]
[B, D, F]
[B, E, F]
[C, D, E]
[C, D, F]
[C, E, F]
[D, E, F]
May I present my recursive Python solution to this problem?
def choose_iter(elements, length):
    for i in range(len(elements)):
        if length == 1:
            yield (elements[i],)
        else:
            # combine elements[i] with every shorter combination of the tail
            for rest in choose_iter(elements[i+1:], length-1):
                yield (elements[i],) + rest
def choose(l, k):
    return list(choose_iter(l, k))
Example usage:
>>> len(list(choose_iter("abcdefgh",3)))
56
I like it for its simplicity.
Let's say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word. You start with:
A B C D E F G H
^ ^ ^
i j k
First you vary k, so the next step looks like this:
A B C D E F G H
^ ^   ^
i j   k
If you have reached the end, you go on to vary j and then k again.
A B C D E F G H
^   ^ ^
i   j k
A B C D E F G H
^   ^   ^
i   j   k
Once j has reached G, you also start to vary i.
A B C D E F G H
  ^ ^ ^
  i j k
A B C D E F G H
  ^ ^   ^
  i j   k
...
Written in code, this looks something like this:
#include <stdio.h>
#include <string.h>
void print_combinations(const char *string)
{
    int i, j, k;
    int len = strlen(string);
    for (i = 0; i < len - 2; i++)
    {
        for (j = i + 1; j < len - 1; j++)
        {
            for (k = j + 1; k < len; k++)
                printf("%c%c%c\n", string[i], string[j], string[k]);
        }
    }
}
The following recursive algorithm picks all of the k-element combinations from an ordered set:
choose the first element i of your combination
combine i with each of the combinations of k-1 elements chosen recursively from the set of elements larger than i.
Iterate the above for each i in the set.
It is essential that you pick the rest of the elements as larger than i, to avoid repetition. This way [3,5] will be picked only once, as [3] combined with [5], instead of twice (the condition eliminates [5] + [3]). Without this condition you get variations instead of combinations.
Short example in Python:
def comb(sofar, rest, n):
    if n == 0:
        print(sofar)
    else:
        for i in range(len(rest)):
            comb(sofar + rest[i], rest[i+1:], n-1)
>>> comb("", "abcde", 3)
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
For explanation, the recursive method is described with the following example:
Example: A B C D E
All combinations of 3 would be:
A with all combinations of 2 from the rest (B C D E)
B with all combinations of 2 from the rest (C D E)
C with all combinations of 2 from the rest (D E)
I found this thread useful and thought I would add a Javascript solution that you can pop into Firebug. Depending on your JS engine, it could take a little time if the starting string is large.
function string_recurse(active, rest) {
if (rest.length == 0) {
console.log(active);
} else {
string_recurse(active + rest.charAt(0), rest.substring(1, rest.length));
string_recurse(active, rest.substring(1, rest.length));
}
}
string_recurse("", "abc");
The output should be as follows:
abc
ab
ac
a
bc
b
c
In C++ the following routine will produce all combinations of length distance(first,k) between the range [first,last):
#include <algorithm>
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
/* Credits: Mark Nelson http://marknelson.us */
if ((first == last) || (first == k) || (last == k))
return false;
Iterator i1 = first;
Iterator i2 = last;
++i1;
if (last == i1)
return false;
i1 = last;
--i1;
i1 = k;
--i2;
while (first != i1)
{
if (*--i1 < *i2)
{
Iterator j = k;
while (!(*i1 < *j)) ++j;
std::iter_swap(i1,j);
++i1;
++j;
i2 = k;
std::rotate(i1,j,last);
while (last != j)
{
++j;
++i2;
}
std::rotate(k,i2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
It can be used like this:
#include <string>
#include <iostream>
int main()
{
std::string s = "12345";
std::size_t comb_size = 3;
do
{
std::cout << std::string(s.begin(), s.begin() + comb_size) << std::endl;
} while (next_combination(s.begin(), s.begin() + comb_size, s.end()));
return 0;
}
This will print the following:
123
124
125
134
135
145
234
235
245
345
static IEnumerable<string> Combinations(List<string> characters, int length)
{
for (int i = 0; i < characters.Count; i++)
{
// only want 1 character, just return this one
if (length == 1)
yield return characters[i];
// want more than one character, return this one plus all combinations one shorter
// only use characters after the current one for the rest of the combinations
else
foreach (string next in Combinations(characters.GetRange(i + 1, characters.Count - (i + 1)), length - 1))
yield return characters[i] + next;
}
}
Simple recursive algorithm in Haskell
import Data.List
combinations 0 lst = [[]]
combinations n lst = do
    (x:xs) <- tails lst
    rest <- combinations (n-1) xs
    return $ x : rest
We first define the special case, i.e. selecting zero elements. It produces a single result, which is an empty list (i.e. a list that contains an empty list).
For n > 0, x goes through every element of the list and xs is every element after x.
rest picks n - 1 elements from xs using a recursive call to combinations. The final result of the function is a list where each element is x : rest (i.e. a list which has x as head and rest as tail) for every different value of x and rest.
> combinations 3 "abcde"
["abc","abd","abe","acd","ace","ade","bcd","bce","bde","cde"]
And of course, since Haskell is lazy, the list is gradually generated as needed, so you can partially evaluate exponentially large combinations.
> let c = combinations 8 "abcdefghijklmnopqrstuvwxyz"
> take 10 c
["abcdefgh","abcdefgi","abcdefgj","abcdefgk","abcdefgl","abcdefgm","abcdefgn",
"abcdefgo","abcdefgp","abcdefgq"]
And here comes granddaddy COBOL, the much maligned language.
Let's assume an array of 34 elements of 8 bytes each (purely arbitrary selection.) The idea is to enumerate all possible 4-element combinations and load them into an array.
We use 4 indices, one each for each position in the group of 4
The array is processed like this:
idx1 = 1
idx2 = 2
idx3 = 3
idx4 = 4
We vary idx4 from 4 to the end. For each idx4 we get a unique combination
of groups of four. When idx4 comes to the end of the array, we increment idx3 by 1 and set idx4 to idx3+1. Then we run idx4 to the end again. We proceed in this manner, augmenting idx3,idx2, and idx1 respectively until the position of idx1 is less than 4 from the end of the array. That finishes the algorithm.
1 --- pos.1
2 --- pos 2
3 --- pos 3
4 --- pos 4
5
6
7
etc.
First iterations:
1234
1235
1236
1237
1245
1246
1247
1256
1257
1267
etc.
A COBOL example:
01 DATA_ARAY.
05 FILLER PIC X(8) VALUE "VALUE_01".
05 FILLER PIC X(8) VALUE "VALUE_02".
etc.
01 ARAY_DATA OCCURS 34.
05 ARAY_ITEM PIC X(8).
01 OUTPUT_ARAY OCCURS 50000 PIC X(32).
01 MAX_NUM PIC 99 COMP VALUE 34.
01 INDEXXES COMP.
05 IDX1 PIC 99.
05 IDX2 PIC 99.
05 IDX3 PIC 99.
05 IDX4 PIC 99.
05 OUT_IDX PIC 9(9).
01 WHERE_TO_STOP_SEARCH PIC 99 COMP.
* Stop the search when IDX1 is on the third last array element:
COMPUTE WHERE_TO_STOP_SEARCH = MAX_NUM - 3
MOVE 1 TO IDX1
PERFORM UNTIL IDX1 > WHERE_TO_STOP_SEARCH
COMPUTE IDX2 = IDX1 + 1
PERFORM UNTIL IDX2 > MAX_NUM
COMPUTE IDX3 = IDX2 + 1
PERFORM UNTIL IDX3 > MAX_NUM
COMPUTE IDX4 = IDX3 + 1
PERFORM UNTIL IDX4 > MAX_NUM
ADD 1 TO OUT_IDX
STRING ARAY_ITEM(IDX1)
ARAY_ITEM(IDX2)
ARAY_ITEM(IDX3)
ARAY_ITEM(IDX4)
INTO OUTPUT_ARAY(OUT_IDX)
ADD 1 TO IDX4
END-PERFORM
ADD 1 TO IDX3
END-PERFORM
ADD 1 TO IDX2
END-PERFORM
ADD 1 TO IDX1
END-PERFORM.
Another C# version with lazy generation of the combination indices. This version maintains a single array of indices to define a mapping between the list of all values and the values for the current combination, i.e. constantly uses O(k) additional space during the entire runtime. The code generates individual combinations, including the first one, in O(k) time.
public static IEnumerable<T[]> Combinations<T>(this T[] values, int k)
{
if (k < 0 || values.Length < k)
yield break; // invalid parameters, no combinations possible
// generate the initial combination indices
var combIndices = new int[k];
for (var i = 0; i < k; i++)
{
combIndices[i] = i;
}
while (true)
{
// return next combination
var combination = new T[k];
for (var i = 0; i < k; i++)
{
combination[i] = values[combIndices[i]];
}
yield return combination;
// find first index to update
var indexToUpdate = k - 1;
while (indexToUpdate >= 0 && combIndices[indexToUpdate] >= values.Length - k + indexToUpdate)
{
indexToUpdate--;
}
if (indexToUpdate < 0)
yield break; // done
// update combination indices
for (var combIndex = combIndices[indexToUpdate] + 1; indexToUpdate < k; indexToUpdate++, combIndex++)
{
combIndices[indexToUpdate] = combIndex;
}
}
}
Test code:
foreach (var combination in new[] {'a', 'b', 'c', 'd', 'e'}.Combinations(3))
{
System.Console.WriteLine(String.Join(" ", combination));
}
Output:
a b c
a b d
a b e
a c d
a c e
a d e
b c d
b c e
b d e
c d e
Here is an elegant, generic implementation in Scala, as described on 99 Scala Problems.
object P26 {
  def flatMapSublists[A,B](ls: List[A])(f: (List[A]) => List[B]): List[B] =
    ls match {
      case Nil => Nil
      case sublist@(_ :: tail) => f(sublist) ::: flatMapSublists(tail)(f)
    }
  def combinations[A](n: Int, ls: List[A]): List[List[A]] =
    if (n == 0) List(Nil)
    else flatMapSublists(ls) { sl =>
      combinations(n - 1, sl.tail) map {sl.head :: _}
    }
}
If you can use SQL syntax - say, if you're using LINQ to access fields of a structure or array, or directly accessing a database that has a table called "Alphabet" with just one char field "Letter", you can adapt the following code:
SELECT A.Letter, B.Letter, C.Letter
FROM Alphabet AS A, Alphabet AS B, Alphabet AS C
WHERE A.Letter<>B.Letter AND A.Letter<>C.Letter AND B.Letter<>C.Letter
AND A.Letter<B.Letter AND B.Letter<C.Letter
This will return all combinations of 3 letters, notwithstanding how many letters you have in table "Alphabet" (it can be 3, 8, 10, 27, etc.).
If what you want is all permutations, rather than combinations (i.e. you want "ACB" and "ABC" to count as different, rather than appear just once) just delete the last line (the AND one) and it's done.
Post-Edit: After re-reading the question, I realise what's needed is the general algorithm, not just a specific one for the case of selecting 3 items. Adam Hughes' answer is the complete one, unfortunately I cannot vote it up (yet). This answer's simple but works only for when you want exactly 3 items.
I had a permutation algorithm I used for project euler, in python:
def missing(miss,src):
    "Returns the list of items in src not present in miss"
    return [i for i in src if i not in miss]
def permutation_gen(n,l):
    "Generates all the permutations of n items of the l list"
    for i in l:
        if n<=1:
            yield [i]
        else:
            # only recurse while more items are still needed
            for j in permutation_gen(n-1,missing([i],l)):
                yield [i]+j
If n<len(l), this gives you every selection of n items without repetition (in every order); is that what you need?
It is a generator, so you use it in something like this:
for comb in permutation_gen(3,list("ABCDEFGH")):
    print(comb)
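For comparison, Python's standard library already covers both cases: itertools.permutations yields ordered selections and itertools.combinations yields unordered ones. A minimal sketch (the variable names here are only for illustration):

```python
from itertools import combinations, permutations

letters = list("ABCD")

# Ordered selections of 2 letters: "AB" and "BA" are both produced.
perms = ["".join(p) for p in permutations(letters, 2)]

# Unordered selections of 2 letters: each pair appears once, in input order.
combs = ["".join(c) for c in combinations(letters, 2)]

print(len(perms), perms[:3])  # 12 ['AB', 'AC', 'AD']
print(len(combs), combs)      # 6 ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
```

Both functions return lazy iterators, so they can be consumed in a for loop just like the generator above.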
https://gist.github.com/3118596
There is an implementation for JavaScript. It has functions to get k-combinations and all combinations of an array of any objects. Examples:
k_combinations([1,2,3], 2)
-> [[1,2], [1,3], [2,3]]
combinations([1,2,3])
-> [[1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]]
Let's say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word. You start with:
A B C D E F G H
^ ^ ^
i j k
First you vary k, so the next step looks like this:
A B C D E F G H
^ ^   ^
i j   k
Once you have reached the end, you go on and vary j, and then k again.
A B C D E F G H
^   ^ ^
i   j k
A B C D E F G H
^   ^   ^
i   j   k
Once j has reached G, you also start to vary i.
A B C D E F G H
  ^ ^ ^
  i j k
A B C D E F G H
  ^ ^   ^
  i j   k
...
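The index walk described above can be sketched in Python with three nested loops. This is only an illustration for exactly three indices; the function name three_letter_words is made up for this example.

```python
def three_letter_words(letters):
    """Yield every 3-letter selection with indices i < j < k, varying k fastest."""
    n = len(letters)
    for i in range(n - 2):
        for j in range(i + 1, n - 1):
            for k in range(j + 1, n):
                yield letters[i] + letters[j] + letters[k]

words = list(three_letter_words("ABCDEFGH"))
print(words[:7])   # ['ABC', 'ABD', 'ABE', 'ABF', 'ABG', 'ABH', 'ACD']
print(len(words))  # 56
```

The first seven words show k running to the end before j moves, which is the order drawn in the diagrams.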
function initializePointers($cnt) {
$pointers = [];
for($i=0; $i<$cnt; $i++) {
$pointers[] = $i;
}
return $pointers;
}
function incrementPointers(&$pointers, $arrLength) {
for($i=0; $i<count($pointers); $i++) {
$currentPointerIndex = count($pointers) - $i - 1;
$currentPointer = $pointers[$currentPointerIndex];
if($currentPointer < $arrLength - $i - 1) {
++$pointers[$currentPointerIndex];
for($j=1; ($currentPointerIndex+$j)<count($pointers); $j++) {
$pointers[$currentPointerIndex+$j] = $pointers[$currentPointerIndex]+$j;
}
return true;
}
}
return false;
}
function getDataByPointers(&$arr, &$pointers) {
$data = [];
for($i=0; $i<count($pointers); $i++) {
$data[] = $arr[$pointers[$i]];
}
return $data;
}
function getCombinations($arr, $cnt)
{
$len = count($arr);
$result = [];
$pointers = initializePointers($cnt);
do {
$result[] = getDataByPointers($arr, $pointers);
} while(incrementPointers($pointers, $len));
return $result;
}
$result = getCombinations([0, 1, 2, 3, 4, 5], 3);
print_r($result);
Based on https://stackoverflow.com/a/127898/2628125, but more abstract, for any size of pointers.
Here you have a lazily evaluated version of that algorithm coded in C#:
static bool nextCombination(int[] num, int n, int k)
{
bool finished, changed;
changed = finished = false;
if (k > 0)
{
for (int i = k - 1; !finished && !changed; i--)
{
if (num[i] < (n - 1) - (k - 1) + i)
{
num[i]++;
if (i < k - 1)
{
for (int j = i + 1; j < k; j++)
{
num[j] = num[j - 1] + 1;
}
}
changed = true;
}
finished = (i == 0);
}
}
return changed;
}
static IEnumerable<IEnumerable<T>> Combinations<T>(IEnumerable<T> elements, int k)
{
T[] elem = elements.ToArray();
int size = elem.Length;
if (k <= size)
{
int[] numbers = new int[k];
for (int i = 0; i < k; i++)
{
numbers[i] = i;
}
do
{
yield return numbers.Select(n => elem[n]);
}
while (nextCombination(numbers, size, k));
}
}
And test part:
static void Main(string[] args)
{
int k = 3;
var t = new[] { "dog", "cat", "mouse", "zebra"};
foreach (IEnumerable<string> i in Combinations(t, k))
{
Console.WriteLine(string.Join(",", i));
}
}
Hope this helps!
Another version, which emits all combinations drawn from the first k elements first, then the new combinations drawn from the first k+1 elements, then from the first k+2, and so on. This means that if your array is sorted with the most important items at the top, it uses those first and expands to the later ones only when it has to.
private static bool NextCombinationFirstsAlwaysFirst(int[] num, int n, int k)
{
if (k > 1 && NextCombinationFirstsAlwaysFirst(num, num[k - 1], k - 1))
return true;
if (num[k - 1] + 1 == n)
return false;
++num[k - 1];
for (int i = 0; i < k - 1; ++i)
num[i] = i;
return true;
}
For instance, if you run the first method ("nextCombination") on k=3, n=5 you'll get:
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
But if you run
int[] nums = new int[k];
for (int i = 0; i < k; ++i)
nums[i] = i;
do
{
Console.WriteLine(string.Join(" ", nums));
}
while (NextCombinationFirstsAlwaysFirst(nums, n, k));
You'll get this (I added empty lines for clarity):
0 1 2

0 1 3
0 2 3
1 2 3

0 1 4
0 2 4
1 2 4
0 3 4
1 3 4
2 3 4
It adds "4" only when it has to, and after "4" has been added it adds "3" again only when it has to (after doing 01, 02, 12).
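The ordering shown above is what is usually called colexicographic order: combinations are compared by their largest element first. As a rough cross-check, assuming it is acceptable to build the whole list in memory instead of stepping lazily, the same sequence can be produced in Python by sorting on the reversed tuple. The function name firsts_always_first is hypothetical.

```python
from itertools import combinations

def firsts_always_first(n, k):
    """Return the k-subsets of range(n), compared by largest element first."""
    return sorted(combinations(range(n), k), key=lambda c: c[::-1])

for combo in firsts_always_first(5, 3):
    print(*combo)
```

For n=5, k=3 this prints the same ten rows, in the same order, as the second listing above.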
Array.prototype.combs = function(num) {
var str = this,
length = str.length,
of = Math.pow(2, length) - 1,
out, combinations = [];
while(of) {
out = [];
for(var i = 0, y; i < length; i++) {
y = (1 << i);
if(y & of && (y !== of))
out.push(str[i]);
}
if (out.length >= num) {
combinations.push(out);
}
of--;
}
return combinations;
}
Clojure version:
(defn comb [k l]
(if (= 1 k) (map vector l)
(apply concat
(map-indexed
#(map (fn [x] (conj x %2))
(comb (dec k) (drop (inc %1) l)))
l))))
Algorithm:
Count from 1 to 2^n - 1.
Convert each number to its binary representation.
Translate each 'on' bit to an element of your set, based on position.
In C#:
void Main()
{
var set = new [] {"A", "B", "C", "D" }; //, "E", "F", "G", "H", "I", "J" };
var kElement = 2;
for(var i = 1; i < Math.Pow(2, set.Length); i++) {
var result = Convert.ToString(i, 2).PadLeft(set.Length, '0');
var cnt = Regex.Matches(Regex.Escape(result), "1").Count;
if (cnt == kElement) {
for(int j = 0; j < set.Length; j++)
if ( Char.GetNumericValue(result[j]) == 1)
Console.Write(set[j]);
Console.WriteLine();
}
}
}
Why does it work?
There is a bijection between the subsets of an n-element set and n-bit sequences.
That means we can figure out how many subsets there are by counting sequences.
e.g., the four-element set above can be represented by {0,1} X {0, 1} X {0, 1} X {0, 1} (or 2^4) different sequences.
So all we have to do is count from 1 to 2^n - 1 to find all the combinations. (We ignore the empty set.) Next, translate each number to its binary representation. Then substitute elements of your set for 'on' bits.
If you want only k element results, only print when k bits are 'on'.
(If you want all subsets instead of k length subsets, remove the cnt/kElement part.)
(For proof, see MIT free courseware Mathematics for Computer Science, Lehman et al, section 11.2.2. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/readings/ )
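The same counting idea can be sketched in Python using integer bit operations instead of string conversion. This is an illustration rather than a port of the C# code: here bit j of the counter selects items[j], which is an arbitrary convention, so the subsets come out in a different order.

```python
def k_subsets_by_bitmask(items, k):
    """Yield each k-element subset of items by scanning all non-empty bitmasks."""
    n = len(items)
    for mask in range(1, 1 << n):          # 1 .. 2^n - 1, skipping the empty set
        if bin(mask).count("1") == k:      # keep masks with exactly k bits on
            yield [items[j] for j in range(n) if mask & (1 << j)]

subsets = list(k_subsets_by_bitmask(["A", "B", "C", "D"], 2))
print(subsets)
# [['A', 'B'], ['A', 'C'], ['B', 'C'], ['A', 'D'], ['B', 'D'], ['C', 'D']]
```

Note that this visits all 2^n - 1 masks even when only a few have k bits set, so it is only practical for small sets.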
short python code, yielding index positions
def yield_combos(n,k):
    # n is set size, k is combo size
    i = 0
    a = [0]*k
    while i > -1:
        for j in range(i+1, k):
            a[j] = a[j-1]+1
        i=j
        yield a
        while a[i] == i + n - k:
            i -= 1
        a[i] += 1
All said and done, here comes the OCaml code for that.
The algorithm is evident from the code.
let combi n lst =
  let rec comb l c =
    if (List.length c = n) then [c] else
    match l with
      [] -> []
    | (h::t) -> (comb t (h::c)) @ (comb t c)
  in
  comb lst []
;;
Here is a method which gives you all combinations of a specified size from a string of arbitrary length. Similar to quinmars' solution, but works for varied input and k.
The code can be changed to wrap around, i.e. 'dab' from input 'abcd' with k=3.
public void run(String data, int howMany){
choose(data, howMany, new StringBuffer(), 0);
}
//n choose k
private void choose(String data, int k, StringBuffer result, int startIndex){
if (result.length()==k){
System.out.println(result.toString());
return;
}
for (int i=startIndex; i<data.length(); i++){
result.append(data.charAt(i));
choose(data,k,result, i+1);
result.setLength(result.length()-1);
}
}
Output for "abcde":
abc abd abe acd ace ade bcd bce bde cde
Short JavaScript version (ES2019, since it relies on Array.prototype.flatMap)
let combine = (list, n) =>
n == 0 ?
[[]] :
list.flatMap((e, i) =>
combine(
list.slice(i + 1),
n - 1
).map(c => [e].concat(c))
);
let res = combine([1,2,3,4], 3);
res.forEach(e => console.log(e.join()));
Another recursive Python solution.
def combination_indicies(n, k, j = 0, stack = []):
    if len(stack) == k:
        yield list(stack)
        return
    for i in range(j, n):
        stack.append(i)
        for x in combination_indicies(n, k, i + 1, stack):
            yield x
        stack.pop()
list(combination_indicies(5, 3))
Output:
[[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
I created a solution in SQL Server 2005 for this, and posted it on my website: http://www.jessemclain.com/downloads/code/sql/fn_GetMChooseNCombos.sql.htm
Here is an example to show usage:
SELECT * FROM dbo.fn_GetMChooseNCombos('ABCD', 2, '')
results:
Word
----
AB
AC
AD
BC
BD
CD
(6 row(s) affected)
Here is my proposition in C++
I tried to impose as few restrictions on the iterator type as I could, so this solution assumes just a forward iterator, and it can be a const_iterator. This should work with any standard container. In cases where the arguments don't make sense it throws std::invalid_argument.
#include <vector>
#include <stdexcept>
template <typename Fci> // Fci - forward const iterator
std::vector<std::vector<Fci> >
enumerate_combinations(Fci begin, Fci end, unsigned int combination_size)
{
if(begin == end && combination_size > 0u)
throw std::invalid_argument("empty set and positive combination size!");
std::vector<std::vector<Fci> > result; // empty set of combinations
if(combination_size == 0u) return result; // there is exactly one combination of
// size 0 - the empty set
std::vector<Fci> current_combination;
current_combination.reserve(combination_size + 1u); // I reserve one additional slot
// in my vector to store
// the end sentinel there.
// The code is cleaner thanks to that
for(unsigned int i = 0u; i < combination_size && begin != end; ++i, ++begin)
{
current_combination.push_back(begin); // Construction of the first combination
}
// Since I assume the iterators support only incrementing, I have to iterate over
// the set to get its size, which is expensive. Here I had to iterate anyway to
// produce the first combination, so I use the loop to also check the size.
if(current_combination.size() < combination_size)
throw std::invalid_argument("combination size > set size!");
result.push_back(current_combination); // Store the first combination in the results set
current_combination.push_back(end); // Here I add the sentinel mentioned earlier to
// simplify the rest of the code. If I did it
// earlier, the previous statement would get ugly.
while(true)
{
unsigned int i = combination_size;
Fci tmp; // Thanks to the sentinel I can find the first
do // iterator to change, simply by scanning
{ // from right to left and looking for the
tmp = current_combination[--i]; // first "bubble". The fact that it's
++tmp; // a forward iterator makes it ugly but I
} // can't help it.
while(i > 0u && tmp == current_combination[i + 1u]);
// Here is probably my most obfuscated expression.
// The loop above looks for a "bubble". If there is no "bubble", that means that
// current_combination is the last combination, the expression in the if statement
// below evaluates to true and the function exits, returning result.
// If the "bubble" is found, however, the statement below has the side effect of
// incrementing the first iterator to the left of the "bubble".
if(++current_combination[i] == current_combination[i + 1u])
return result;
// The rest of the code sets the positions of the remaining iterators
// (if there are any) that are to the right of the incremented one,
// to form the next combination
while(++i < combination_size)
{
current_combination[i] = current_combination[i - 1u];
++current_combination[i];
}
// Below is the ugly side of using the sentinel. Well, it had to have some
// disadvantage. Try without it.
result.push_back(std::vector<Fci>(current_combination.begin(),
current_combination.end() - 1));
}
}
Here is a code I recently wrote in Java, which calculates and returns all the combination of "num" elements from "outOf" elements.
// author: Sourabh Bhat (heySourabh@gmail.com)
public class Testing
{
public static void main(String[] args)
{
// Test case num = 5, outOf = 8.
int num = 5;
int outOf = 8;
int[][] combinations = getCombinations(num, outOf);
for (int i = 0; i < combinations.length; i++)
{
for (int j = 0; j < combinations[i].length; j++)
{
System.out.print(combinations[i][j] + " ");
}
System.out.println();
}
}
private static int[][] getCombinations(int num, int outOf)
{
int possibilities = get_nCr(outOf, num);
int[][] combinations = new int[possibilities][num];
int arrayPointer = 0;
int[] counter = new int[num];
for (int i = 0; i < num; i++)
{
counter[i] = i;
}
breakLoop: while (true)
{
// Initializing part
for (int i = 1; i < num; i++)
{
if (counter[i] >= outOf - (num - 1 - i))
counter[i] = counter[i - 1] + 1;
}
// Testing part
for (int i = 0; i < num; i++)
{
if (counter[i] < outOf)
{
continue;
} else
{
break breakLoop;
}
}
// Innermost part
combinations[arrayPointer] = counter.clone();
arrayPointer++;
// Incrementing part
counter[num - 1]++;
for (int i = num - 1; i >= 1; i--)
{
if (counter[i] >= outOf - (num - 1 - i))
counter[i - 1]++;
}
}
return combinations;
}
private static int get_nCr(int n, int r)
{
if(r > n)
{
throw new ArithmeticException("r is greater than n");
}
long numerator = 1;
long denominator = 1;
for (int i = n; i >= r + 1; i--)
{
numerator *= i;
}
for (int i = 2; i <= n - r; i++)
{
denominator *= i;
}
return (int) (numerator / denominator);
}
}

How many numbers with length N with K digits D consecutively

Given positive numbers N, K, D (1<= N <= 10^5, 1<=K<=N, 1<=D<=9). How many numbers with N digits are there, that have K consecutive digits D? Write the answer mod (10^9 + 7).
For example: N = 4, K = 3, D = 6, there are 18 numbers:
1666, 2666, 3666, 4666, 5666, 6660,
6661, 6662, 6663, 6664, 6665, 6666, 6667, 6668, 6669, 7666, 8666 and 9666.
Can we calculate the answer in O(N*K) (maybe dynamic programming)?
I've tried using combination.
If
N = 4, K = 3, D = 6. The number I have to find is abcd.
+) if (a = b = c = D), I choose digit for d. There are 10 ways (6660, 6661, 6662, 6663, 6664, 6665, 6666, 6667, 6668, 6669)
+) if (b = c = d = D), I choose digit for a (a > 0). There are 9 ways (1666, 2666, 3666, 4666, 5666, 6666, 7666, 8666, 9666)
But across these two cases, the number 6666 is counted twice. N and K can be very large, so how can I count all of them?
If one is looking for a mathematical solution (vs. necessarily an algorithmic one), it's good to look at it in terms of base cases and some formulas. They might turn out to be something you can refactor into a tidy formula. So just for the heck of it, here's a take on it that doesn't deal with the special treatment of zeros, because that throws some wrenches in.
Let's look at a couple of base cases, and call our answer F(N,K) (D doesn't affect the count, but the code takes it as a parameter anyway):
when N = 0
You'll never find any length sequences of digits when there's no digit.
F(0, K) = 0 for any K.
when N = 1
Fairly obvious. If you're looking for K sequential digits in a single digit, the options are limited. Looking for more than one? No dice.
F(1, K) = 0 for any K > 1
Looking for exactly one? Okay, there's one.
F(1, 1) = 1
Sequences of zero sequential digits allowed? Then all ten digits are fine.
F(1, 0) = 10
for N > 1
when K = 0
Basically, all N-digit numbers will qualify. So the number of possibilities meeting the bar is 10^N. (e.g. when N is 3 then 000, 001, 002, ... 999 for any D)
F(N, 0) = 10^N for any N > 1
when K = 1
Possibilities meeting the condition is any number with at least one D in it. How many N-digit numbers are there which contain at least one digit D? Well, it's going to be 10^N minus all the numbers that have no instances of the digit D. 10^N - 9^N
F(N, 1) = 10^N - 9^N for any N > 1
when N < K
No way to get K sequential digits if N is less than K
F(N, K) = 0 when N < K
when N = K
Only one possible way to get K sequential digits in N digits.
F(N, K) = 1 when N = K
when N > K
Okay, we already know that N > 1 and K > 1. So this is going to be the workhorse where we hope to use subexpressions for things we've already solved.
Let's start by considering popping off the digit at the head, leaving N-1 digits on the tail. If that N-1 series could achieve a series of K digits all by itself, then adding another digit will not change anything about that. That gives us a term 10 * F(N-1, K)
But if our head digit is a D, that is special. Our cases will be:
It might be the missing key for a series that started with K-1 instances of D, creating a full range of K.
It might complete a range of K-1 instances of D, but on a case that already had a K series of adjacent D (that we thus accounted for in the above term)
It might not help at all.
So let's consider two separate categories of tail series: those that start with K-1 instances of D and those that do not. Let's say we have N=7 shown as D:DDDXYZ and with K=4. We subtract one from N and from K to get 6 and 3, and if we subtract them we get how many trailing any-digits (XYZ here) are allowed to vary. Our term for the union of (1) and (2) to add in is 10^((N-1)-(K-1)).
Now it's time for some subtraction for our double-counts. We haven't double counted any cases that didn't start with K-1 instances of D, so we keep our attention on that (DDDXYZ). If the value in the X slot is a D then it was definitely double counted. We can subtract out the term for that as 10^(((N - 1) - 1) - (K - 1)); in this case giving us all the pairings of YZ digits you can get with X as D. (100).
The last thing to get rid of is the cases where X is not a D, but whatever is left over after the X position still contains a K-length series of D. Again we reuse our function, and subtract a term 9 * F(N - K - 1, K, D).
Paste it all together and simplify a couple of terms you get:
F(N, K) = 10 * F(N-1,K,D) + 10^(N-K) - 10^(N-K-1) - 9 * F(N-K-1,K,D)
Now we have a nice functional definition suitable for Haskelling or whatever. But I'm still awkward with that, and it's easy enough to test in C++. So here it is (assuming availability of a long integer power function):
long F(int N, int K, int D) {
if (N == 0) return 0;
if (K > N) return 0;
if (K == N) return 1;
if (N == 1) {
if (K == 0) return 10;
if (K == 1) return 1;
return 0;
}
if (K == 0)
return power(10, N);
if (K == 1)
return power(10, N) - power(9, N);
return (
10 * F(N - 1, K, D)
+ power(10, N - K)
- power(10, N - K - 1)
- 9 * F(N - K - 1, K, D)
);
}
To double-check this against an exhaustive generator, here's a little C++ test program that builds the list of vectors that it scans using std::search_n. It checks the slow way against the fast way for N and K. I ran it from 0 to 9 for each:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
// http://stackoverflow.com/a/1505791/211160
long power(int x, int p) {
if (p == 0) return 1;
if (p == 1) return x;
long tmp = power(x, p/2);
if (p%2 == 0) return tmp * tmp;
else return x * tmp * tmp;
}
long F(int N, int K, int D) {
if (N == 0) return 0;
if (K > N) return 0;
if (K == N) return 1;
if (N == 1) {
if (K == 0) return 10;
if (K == 1) return 1;
return 0;
}
if (K == 0)
return power(10, N);
if (K == 1)
return power(10, N) - power(9, N);
return (
10 * F(N - 1, K, D)
+ power(10, N - K)
- power(10, N - K - 1)
- 9 * F(N - K - 1, K, D)
);
}
long FSlowCore(int N, int K, int D, vector<int> & digits) {
if (N == 0) {
if (search_n(digits.begin(), digits.end(), K, D) != end(digits)) {
return 1;
} else
return 0;
}
long total = 0;
digits.push_back(0);
for (int curDigit = 0; curDigit <= 9; curDigit++) {
total += FSlowCore(N - 1, K, D, digits);
digits.back()++;
}
digits.pop_back();
return total;
}
long FSlow(int N, int K, int D) {
vector<int> digits;
return FSlowCore(N, K, D, digits);
}
bool TestF(int N, int K, int D) {
long slow = FSlow(N, K, D);
long fast = F(N, K, D);
cout << "when N = " << N
<< " and K = " << K
<< " and D = " << D << ":\n";
cout << "Fast way gives " << fast << "\n";
cout << "Slow way gives " << slow << "\n";
cout << "\n";
return slow == fast;
}
int main() {
for (int N = 0; N < 10; N++) {
for (int K = 0; K < 10; K++) {
if (!TestF(N, K, 6)) {
exit(1);
}
}
}
}
Of course, since it counts leading zeros it will be different from the answers you got. See the test output here in this gist.
Modifying to account for the special-case zero handling is left as an exercise for the reader (as is modular arithmetic). Eliminating the zeros make it messier. Either way, this may be an avenue of attack for reducing the number of math operations even further with some transformations...perhaps.
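As a rough sketch of that exercise, the recurrence above can be evaluated bottom-up with modular arithmetic in O(N) time. The Python below is an illustration, not the author's code. The names run_counts and count_numbers are made up, and the leading-zero adjustment F(N) - F(N-1) is an added assumption that only holds when D is not 0 (a string that starts with 0 must then contain its run entirely within the last N-1 digits). It also assumes 1 <= K <= N.

```python
MOD = 10**9 + 7

def run_counts(n, k, mod=MOD):
    """f[m] = number of length-m digit strings (leading zeros allowed) that
    contain k consecutive copies of one fixed digit, reduced modulo mod."""
    pow10 = [1] * (n + 1)
    for m in range(1, n + 1):
        pow10[m] = pow10[m - 1] * 10 % mod
    f = [0] * (n + 1)              # f[m] = 0 whenever m < k
    if k <= n:
        f[k] = 1 % mod
    for m in range(k + 1, n + 1):
        # 10*F(m-1) + 10^(m-k) - 10^(m-k-1) - 9*F(m-k-1), regrouped
        f[m] = (10 * f[m - 1] + 9 * (pow10[m - k - 1] - f[m - k - 1])) % mod
    return f

def count_numbers(n, k, mod=MOD):
    """N-digit numbers (no leading zero) with k consecutive digits D, for D != 0."""
    f = run_counts(n, k, mod)
    return (f[n] - f[n - 1]) % mod

print(count_numbers(4, 3))  # 18
print(count_numbers(8, 5))  # 3420
```

The table form avoids the repeated recursive calls of F, and Python's % always returns a non-negative result, so the subtraction needs no extra correction.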
Miquel is almost correct, but he missed a lot of cases. So, with N = 8, K = 5, and D = 6, we need to look for numbers of the form:
66666xxx
y66666xx
xy66666x
xxy66666
with additional condition that y cannot be D.
So we can have our formula for this example:
66666xxx = 10^3
y66666xx = 8*10^2 // As 0 also cannot be the first digit
xy66666x = 9*9*10
xxy66666 = 9*10*9
So, the result is 3420.
For case N = 4, K = 3 and D = 6, we have
666x = 10
y666 = 8 // Again, 0 is not counted!
So, we have 18 cases!
Note: We need to be careful that the first digit cannot be 0! And we need to handle the case where D is zero too!
Update: working Java code, time complexity O(N-K)
static long cal(int n, int k, int d) {
long Mod = 1000000007;
long result = 0;
for (int i = 0; i <= n - k; i++) {//For all starting positions
if (i != 0 || d != 0) {
int left = n - k;
int upper_half = i;//Number of digits preceding DDD
int lower_half = n - k - i;//Number of digits following DDD
long tmp = 1;
if (upper_half == 1) {
if (d == 0) {
tmp = 9;
} else {
tmp = 8;
}
}else if(upper_half >= 2){
//The pattern is x..yDDD...
tmp = (long) (9 * 9 * Math.pow(10, upper_half - 2));
}
tmp *= Math.pow(10, lower_half);
//System.out.println(tmp + " " + upper_half + " " + lower_half);
result += tmp;
result %= Mod;
}
}
return result;
}
Sample Tests:
N = 8, K = 5, D = 6
Output
3420
N = 4, K = 3, D = 6
Output
18
N = 4, K = 3, D = 0
Output
9
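For small N, the sample outputs above can be cross-checked by plain enumeration. The sketch below is only a checking aid (the name count_by_enumeration is made up); it tests every N-digit number, so it takes on the order of 10^N steps and cannot replace a formula for large N.

```python
def count_by_enumeration(n, k, d):
    """Count n-digit numbers (no leading zero) whose decimal form contains
    k consecutive digits d, by testing every candidate."""
    run = str(d) * k
    return sum(1 for x in range(10 ** (n - 1), 10 ** n) if run in str(x))

print(count_by_enumeration(4, 3, 6))  # 18
print(count_by_enumeration(4, 3, 0))  # 9
```

It may be worth running a check like this on cases with N >= 2K + 1 as well, where a single number can contain two separate runs and counting by starting position needs extra care.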

Discover long patterns

Given a sorted list of numbers, I would like to find the longest subsequence where the differences between successive elements are geometrically increasing. So if the list is
1, 2, 3, 4, 7, 15, 27, 30, 31, 81
then the subsequence is 1, 3, 7, 15, 31. Alternatively consider 1, 2, 5, 6, 11, 15, 23, 41, 47 which has subsequence 5, 11, 23, 47 with a = 3 and k = 2.
Can this be solved in O(n^2) time, where n is the length of the list?
I am interested both in the general case where the progression of differences is ak, ak^2, ak^3, etc., where both a and k are integers, and in the special case where a = 1, so the progression of differences is k, k^2, k^3, etc.
Update
I have improved the algorithm so that it takes O(M + N^2) time on average and needs O(M+N) memory. It is mainly the same as the procedure described below, but to calculate the possible factors A,K for each difference D, I preload a table. This table takes less than a second to construct for M=10^7.
I have made a C implementation that takes less than 10 minutes to solve N=10^5 different random integer elements.
Here is the source code in C. To compile it, run: gcc -O3 -o findgeo findgeo.c -lm
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <time.h>
struct Factor {
int a;
int k;
struct Factor *next;
};
struct Factor *factors = 0;
int factorsL=0;
void ConstructFactors(int R) {
int a,k,C;
int R2;
struct Factor *f;
float seconds;
clock_t end;
clock_t start = clock();
if (factors) free(factors);
factors = malloc (sizeof(struct Factor) *((R>>1) + 1));
R2 = R>>1 ;
for (a=0;a<=R2;a++) {
factors[a].a= a;
factors[a].k=1;
factors[a].next=NULL;
}
factorsL=R2+1;
R2 = floor(sqrt(R));
for (k=2; k<=R2; k++) {
a=1;
C=a*k*(k+1);
while (C<R) {
C >>= 1;
f=malloc(sizeof(struct Factor));
*f=factors[C];
factors[C].a=a;
factors[C].k=k;
factors[C].next=f;
a++;
C=a*k*(k+1);
}
}
end = clock();
seconds = (float)(end - start) / CLOCKS_PER_SEC;
printf("Construct Table: %f\n",seconds);
}
void DestructFactors() {
int i;
struct Factor *f;
for (i=0;i<factorsL;i++) {
while (factors[i].next) {
f=factors[i].next->next;
free(factors[i].next);
factors[i].next=f;
}
}
free(factors);
factors=NULL;
factorsL=0;
}
int ipow(int base, int exp)
{
int result = 1;
while (exp)
{
if (exp & 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
void findGeo(int **bestSolution, int *bestSolutionL,int *Arr, int L) {
int i,j,D;
int mustExistToBeBetter;
int R=Arr[L-1]-Arr[0];
int *possibleSolution;
int possibleSolutionL=0;
int exp;
int NextVal;
int idx;
int kMax,aMax;
float seconds;
clock_t end;
clock_t start = clock();
kMax = floor(sqrt(R));
aMax = floor(R/2);
ConstructFactors(R);
*bestSolutionL=2;
*bestSolution=malloc(0);
possibleSolution = malloc(sizeof(int)*(R+1));
struct Factor *f;
int *H=malloc(sizeof(int)*(R+1));
memset(H,0, sizeof(int)*(R+1));
for (i=0;i<L;i++) {
H[ Arr[i]-Arr[0] ]=1;
}
for (i=0; i<L-2;i++) {
for (j=i+2; j<L; j++) {
D=Arr[j]-Arr[i];
if (D & 1) continue;
f = factors + (D >>1);
while (f) {
idx=Arr[i] + f->a * f->k - Arr[0];
if ((f->k <= kMax)&& (f->a<aMax)&&(idx<=R)&&H[idx]) {
if (f->k ==1) {
mustExistToBeBetter = Arr[i] + f->a * (*bestSolutionL);
} else {
mustExistToBeBetter = Arr[i] + f->a * f->k * (ipow(f->k,*bestSolutionL) - 1)/(f->k-1);
}
if (mustExistToBeBetter< Arr[L-1]+1) {
idx= floor(mustExistToBeBetter - Arr[0]);
} else {
idx = R+1;
}
if ((idx<=R)&&H[idx]) {
possibleSolution[0]=Arr[i];
possibleSolution[1]=Arr[i] + f->a*f->k;
possibleSolution[2]=Arr[j];
possibleSolutionL=3;
exp = f->k * f->k * f->k;
NextVal = Arr[j] + f->a * exp;
idx=NextVal - Arr[0];
while ( (idx<=R) && H[idx]) {
possibleSolution[possibleSolutionL]=NextVal;
possibleSolutionL++;
exp = exp * f->k;
NextVal = NextVal + f->a * exp;
idx=NextVal - Arr[0];
}
if (possibleSolutionL > *bestSolutionL) {
free(*bestSolution);
*bestSolution = possibleSolution;
possibleSolution = malloc(sizeof(int)*(R+1));
*bestSolutionL=possibleSolutionL;
kMax= floor( pow (R, 1.0/ (*bestSolutionL) ));
aMax= floor(R / (*bestSolutionL));
}
}
}
f=f->next;
}
}
}
if (*bestSolutionL == 2) {
free(*bestSolution);
possibleSolutionL=0;
for (i=0; (i<2)&&(i<L); i++ ) {
possibleSolution[possibleSolutionL]=Arr[i];
possibleSolutionL++;
}
*bestSolution = possibleSolution;
*bestSolutionL=possibleSolutionL;
} else {
free(possibleSolution);
}
DestructFactors();
free(H);
end = clock();
seconds = (float)(end - start) / CLOCKS_PER_SEC;
printf("findGeo: %f\n",seconds);
}
int compareInt (const void * a, const void * b)
{
return *(int *)a - *(int *)b;
}
int main(void) {
int N=100000;
int R=10000000;
int *A = malloc(sizeof(int)*N);
int *Sol;
int SolL;
int i;
int *S=malloc(sizeof(int)*R);
for (i=0;i<R;i++) S[i]=i+1;
for (i=0;i<N;i++) {
int r = rand() % (R-i);
A[i]=S[r];
S[r]=S[R-i-1];
}
free(S);
qsort(A,N,sizeof(int),compareInt);
/*
int step = floor(R/N);
A[0]=1;
for (i=1;i<N;i++) {
A[i]=A[i-1]+step;
}
*/
findGeo(&Sol,&SolL,A,N);
printf("[");
for (i=0;i<SolL;i++) {
if (i>0) printf(",");
printf("%d",Sol[i]);
}
printf("]\n");
printf("Size: %d\n",SolL);
free(Sol);
free(A);
return EXIT_SUCCESS;
}
Demonstration
I will try to demonstrate that the algorithm I proposed is O(M + N^2) on average for a uniformly distributed random sequence. I'm not a mathematician and I am not used to doing this kind of demonstration, so please feel free to correct any error you see.
There are 4 nested loops; the first two give the N^2 factor. (The M is for the calculation of the table of possible factors.)
The third loop is executed only once on average for each pair. You can see this by checking the size of the pre-calculated factors table: its size is M when N->inf, so the average number of steps for each pair is M/M=1.
So the proof comes down to checking that the fourth loop (the one that traverses the well-formed sequences) is executed O(N^2) times or fewer over all the pairs.
To demonstrate that, I will consider two cases: one where M>>N and another where M ~= N, where M is the maximum difference in the initial array: M = S(n)-S(1).
For the first case (M>>N), the probability of finding a coincidence is p=N/M. To start a sequence, both the second element and the (b+1)-th element must coincide, where b is the length of the best sequence so far. So the loop will be entered [formula missing] times, and the average length of this series (supposing an infinite series) is [formula missing]. So the total number of times that the loop will be executed is [formula missing], and this is close to 0 when M>>N. The problem here is when M~=N.
Now let's consider the case where M~=N. Let b be the best sequence length so far. For the case A=k=1, the sequence must start before N-b, so the number of sequences will be N-b, and the loop will run at most (N-b)*b times.
For A>1 and k=1 we can extrapolate to [formula missing], where d is M/N (the average distance between numbers). If we add up over all A's from 1 to dN/b, we get an upper limit of: [formula missing]
For the cases where k>=2, the sequence must start before [formula missing], so the loop will be entered an average of [formula missing] times, and adding up over all A's from 1 to dN/k^b gives a limit of [formula missing]
Here, the worst case is when b is minimal. Because we are considering minimal series, let's take the very worst case of b=2, so the number of passes through the fourth loop for a given k will be less than [formula missing].
And summing over all k's from 2 to infinity gives: [formula missing]
So adding all the passes for k=1 and k>=2, we have a maximum of: [formula missing]
Note that d=M/N=1/p.
So we have two limits: one that goes to infinity when d=1/p=M/N goes to 1, and another that goes to infinity when d goes to infinity. Our limit is the minimum of the two, and the worst case is where the two equations cross. So if we solve the equation: [formula missing]
we see that the maximum is at d=1.353.
So it is demonstrated that the fourth loop will be processed fewer than 1.55N^2 times in total.
Of course, this is for the average case. For the worst case, I have not been able to find a way to generate series whose fourth loop takes more than O(N^2), and I strongly believe that none exist, but I am not enough of a mathematician to prove it.
Old Answer
Here is a solution that takes O((n^2)*cube_root(M)) on average, where M is the difference between the first and last elements of the array, with memory requirements of O(M+N).
1.- Construct an array H of length M so that H[i - S[0]]=true if i exists in the initial array and false if it does not.
2.- For each pair S[j], S[i] in the array do:
2.1 Check whether they can be the first and third elements of a possible solution. To do so, calculate all possible A,K pairs that satisfy the equation S(i) = S(j) + AK + AK^2. Check this SO question to see how to solve this problem. Then check that the second element exists: S[j] + A*K
2.2 Also check that the element one position beyond the best solution found so far exists. For example, if the best solution so far is 4 elements long, then check that the element S[j] + AK + AK^2 + AK^3 + AK^4 exists
2.3 If 2.1 and 2.2 are true, then iterate to find how long this series is, and set it as the bestSolution if it is longer than the previous best.
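To have something to test faster implementations against, a much simpler brute-force reference can be sketched in Python. It is not the procedure from the steps above: it keeps the membership lookup from step 1 (as a set rather than an array H), but it tries every (first, second) pair with every ratio k that divides the first difference, with no pruning, so it is far slower than O(n^2). The function name is made up, and the input is assumed to be sorted, distinct integers.

```python
def longest_geometric_difference_run(values):
    """Brute-force search over (first, second, k); values must be sorted, distinct integers."""
    present = set(values)          # plays the role of the lookup array H
    best = list(values[:1])
    for i, first in enumerate(values):
        for second in values[i + 1:]:
            d = second - first     # first difference, d = a * k
            for k in range(1, d + 1):
                if d % k:
                    continue       # a = d // k must be an integer
                seq, diff = [first, second], d
                while True:
                    diff *= k      # next difference is k times the previous one
                    nxt = seq[-1] + diff
                    if nxt not in present:
                        break
                    seq.append(nxt)
                if len(seq) > len(best):
                    best = seq
    return best

print(longest_geometric_difference_run([1, 2, 3, 4, 7, 15, 27, 30, 31, 81]))
# [1, 3, 7, 15, 31]
print(longest_geometric_difference_run([1, 2, 5, 6, 11, 15, 23, 41, 47]))
# [5, 11, 23, 47]
```

Like the K==1 branch in the code in this answer, this sketch also accepts k = 1 (a plain arithmetic progression); skip that value if only strictly growing differences should count.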
Here is the code in javascript:
function getAKs(A) {
if (A / 2 != Math.floor(A / 2)) return [];
var solution = [];
var i;
var SR3 = Math.pow(A, 1 / 3);
for (i = 1; i <= SR3; i++) {
var B, C;
C = i;
B = A / (C * (C + 1));
if (B == Math.floor(B)) {
solution.push([B, C]);
}
B = i;
C = (-1 + Math.sqrt(1 + 4 * A / B)) / 2;
if (C == Math.floor(C)) {
solution.push([B, C]);
}
}
return solution;
}
function getBestGeometricSequence(S) {
var i, j, k;
var bestSolution = [];
var H = Array(S[S.length-1]-S[0]);
for (i = 0; i < S.length; i++) H[S[i] - S[0]] = true;
for (i = 0; i < S.length; i++) {
for (j = 0; j < i; j++) {
var PossibleAKs = getAKs(S[i] - S[j]);
for (k = 0; k < PossibleAKs.length; k++) {
var A = PossibleAKs[k][0];
var K = PossibleAKs[k][1];
var mustExistToBeBetter;
if (K==1) {
mustExistToBeBetter = S[j] + A * bestSolution.length;
} else {
mustExistToBeBetter = S[j] + A * K * (Math.pow(K,bestSolution.length) - 1)/(K-1);
}
if ((H[S[j] + A * K - S[0]]) && (H[mustExistToBeBetter - S[0]])) {
var possibleSolution=[S[j],S[j] + A * K,S[i]];
exp = K * K * K;
var NextVal = S[i] + A * exp;
while (H[NextVal - S[0]] === true) {
possibleSolution.push(NextVal);
exp = exp * K;
NextVal = NextVal + A * exp;
}
if (possibleSolution.length > bestSolution.length) {
bestSolution = possibleSolution;
}
}
}
}
}
return bestSolution;
}
//var A= [ 1, 2, 3,5,7, 15, 27, 30,31, 81];
var A=[];
for (i=1;i<=3000;i++) {
A.push(i);
}
var sol=getBestGeometricSequence(A);
$("#result").html(JSON.stringify(sol));
You can check the code here: http://jsfiddle.net/6yHyR/1/
I maintain the other solution because I believe that it is still better when M is very big compared to N.
Just to start with something, here is a simple solution in JavaScript:
var input = [0.7, 1, 2, 3, 4, 7, 15, 27, 30, 31, 81],
output = [], indexes, values, i, index, value, i_max_length,
i1, i2, i3, j1, j2, j3, difference12a, difference23a, difference12b, difference23b,
scale_factor, common_ratio_1a, common_ratio_2a, common_ratio_1b, common_ratio_2b,
error, EPSILON = 1e-9, common_ratio_is_integer,
resultDiv = $("#result");
for (i1 = 0; i1 < input.length - 2; ++i1) {
for (i2 = i1 + 1; i2 < input.length - 1; ++i2) {
scale_factor = difference12a = input[i2] - input[i1];
for (i3 = i2 + 1; i3 < input.length; ++i3) {
difference23a = input[i3] - input[i2];
common_ratio_1a = difference23a / difference12a;
common_ratio_2a = Math.round(common_ratio_1a);
error = Math.abs((common_ratio_2a - common_ratio_1a) / common_ratio_1a);
common_ratio_is_integer = error < EPSILON;
if (common_ratio_2a > 1 && common_ratio_is_integer) {
indexes = [i1, i2, i3];
j1 = i2;
j2 = i3
difference12b = difference23a;
for (j3 = j2 + 1; j3 < input.length; ++j3) {
difference23b = input[j3] - input[j2];
common_ratio_1b = difference23b / difference12b;
common_ratio_2b = Math.round(common_ratio_1b);
error = Math.abs((common_ratio_2b - common_ratio_1b) / common_ratio_1b);
common_ratio_is_integer = error < EPSILON;
if (common_ratio_is_integer && common_ratio_2a === common_ratio_2b) {
indexes.push(j3);
j1 = j2;
j2 = j3
difference12b = difference23b;
}
}
values = [];
for (i = 0; i < indexes.length; ++i) {
index = indexes[i];
value = input[index];
values.push(value);
}
output.push(values);
}
}
}
}
if (output.length > 0) {
i_max_length = 0;
for (i = 1; i < output.length; ++i) {
if (output[i_max_length].length < output[i].length)
i_max_length = i;
}
for (i = 0; i < output.length; ++i) {
if (output[i_max_length].length == output[i].length)
resultDiv.append("<p>[" + output[i] + "]</p>");
}
}
Output:
[1, 3, 7, 15, 31]
I find the first three items of every subsequence candidate, calculate the scale factor and the common ratio from them, and if the common ratio is an integer, I iterate over the remaining elements after the third one and add those that fit into the geometric progression defined by the first three items. As a last step, I select the subsequence(s) with the largest length.
In fact it is exactly the same question as Longest equally-spaced subsequence; you just have to consider the logarithm of your data. If the sequence is a, ak, ak^2, ak^3, the logarithmic values are ln(a), ln(a) + ln(k), ln(a)+2ln(k), ln(a)+3ln(k), so they are equally spaced. The converse is of course true. There is a lot of different code in the question above.
I don't think the special case a=1 can be resolved more efficiently than an adaptation from an algorithm above.
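As a rough illustration of the logarithm argument, here is a small Python 3 sketch. The helper name is_equally_spaced_in_log is made up for this example; it assumes strictly positive values and compares gaps with a floating-point tolerance.

```python
import math

def is_equally_spaced_in_log(seq, rel_tol=1e-9):
    """Hypothetical helper: True if the logarithms of seq are equally spaced."""
    logs = [math.log(v) for v in seq]               # requires every v > 0
    gaps = [b - a for a, b in zip(logs, logs[1:])]  # consecutive log differences
    return all(math.isclose(g, gaps[0], rel_tol=rel_tol) for g in gaps)

print(is_equally_spaced_in_log([3, 6, 12, 24]))  # a = 3, k = 2
print(is_equally_spaced_in_log([1, 3, 7, 15]))   # here only the differences are geometric
```

Note that the transform applies to sequences of the form a, ak, ak^2, ...; a sequence whose differences form the geometric progression does not pass this check directly.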
Here is my solution in Javascript. It should be close to O(n^2), except maybe in some pathological cases.
function bsearch(Arr,Val, left,right) {
if (left == right) return left;
var m=Math.floor((left + right) /2);
if (Val <= Arr[m]) {
return bsearch(Arr,Val,left,m);
} else {
return bsearch(Arr,Val,m+1,right);
}
}
function findLongestGeometricSequence(S) {
var bestSolution=[];
var i,j,k;
var H={};
for (i=0;i<S.length;i++) H[S[i]]=true;
for (i=0;i<S.length;i++) {
for (j=0;j<i;j++) {
for (k=j+1;k<i;) {
var possibleSolution=[S[j],S[k],S[i]];
var K = (S[i] - S[k]) / (S[k] - S[j]);
var A = (S[k] - S[j]) * (S[k] - S[j]) / (S[i] - S[k]);
if ((Math.floor(K) == K) && (Math.floor(A)==A)) {
var exp = K*K*K;
var NextVal= S[i] + A * exp;
while (H[NextVal] === true) {
possibleSolution.push(NextVal);
exp = exp * K;
NextVal= NextVal + A * exp;
}
if (possibleSolution.length > bestSolution.length)
bestSolution=possibleSolution;
K--;
} else {
K=Math.floor(K);
}
if (K>0) {
var NextPossibleMidValue= (S[i] + K*S[j]) / (K +1);
k++;
if (S[k]<NextPossibleMidValue) {
k=bsearch(S,NextPossibleMidValue, k+1, i);
}
} else {
k=i;
}
}
}
}
return bestSolution;
}
function Run() {
var MyS= [0.7, 1, 2, 3, 4, 5,6,7, 15, 27, 30,31, 81];
var sol = findLongestGeometricSequence(MyS);
alert(JSON.stringify(sol));
}
Small Explanation
If we take 3 numbers of the array S(j) < S(k) < S(i), then you can calculate a and k so that S(k) = S(j) + a*k and S(i) = S(k) + a*k^2 (2 equations and 2 unknowns). With that in mind, you can check whether there is a number in the array that is S(next) = S(i) + a*k^3. If that is the case, then continue checking for S(next2) = S(next) + a*k^4 and so on.
This would be an O(n^3) solution, but you can take advantage of the fact that k must be an integer in order to limit the S(k) points selected.
In case a is known, you can calculate a(k) and you need to check only one number in the third loop, so this case is clearly O(n^2).
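To make the two equations concrete, here is a minimal Python 3 sketch. The function names solve_a_k and extend are invented for this example, and it assumes sj < sk < si. It solves for a and k exactly with fractions and then keeps extending the series while the next value is present.

```python
from fractions import Fraction

def solve_a_k(sj, sk, si):
    """Solve sk = sj + a*k and si = sk + a*k**2 for (a, k), exactly."""
    d1 = Fraction(sk - sj)   # equals a*k
    d2 = Fraction(si - sk)   # equals a*k**2
    k = d2 / d1
    a = d1 / k
    return a, k

def extend(values, sj, sk, si):
    """Extend the series sj, sk, si while sj + a*k + a*k**2 + ... stays in values."""
    a, k = solve_a_k(sj, sk, si)
    present = set(values)
    seq = [sj, sk, si]
    step = a * k ** 3
    while seq[-1] + step in present:
        seq.append(int(seq[-1] + step))  # safe: the value equals an int in the set
        step *= k
    return seq

print(solve_a_k(1, 3, 7))
print(extend([1, 2, 3, 4, 7, 15, 27, 30, 31, 81], 1, 3, 7))
```

The real solution above additionally rejects non-integer a or k before extending; this sketch leaves that check out to keep the algebra visible.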
I think this task is related with not so long ago posted Longest equally-spaced subsequence. I've just modified my algorithm in Python a little bit:
from math import sqrt
def add_precalc(precalc, end, (a, k), count, res, N):
if end + a * k ** res[1]["count"] > N: return
x = end + a * k ** count
if x > N or x < 0: return
if precalc[x] is None: return
if (a, k) not in precalc[x]:
precalc[x][(a, k)] = count
return
def factors(n):
res = []
for x in range(1, int(sqrt(n)) + 1):
if n % x == 0:
y = n / x
res.append((x, y))
res.append((y, x))
return res
def work(input):
precalc = [None] * (max(input) + 1)
for x in input: precalc[x] = {}
N = max(input)
res = ((0, 0), {"end":0, "count":0})
for i, x in enumerate(input):
for y in input[i::-1]:
for a, k in factors(x - y):
if (a, k) in precalc[x]: continue
add_precalc(precalc, x, (a, k), 2, res, N)
for step, count in precalc[x].iteritems():
count += 1
if count > res[1]["count"]: res = (step, {"end":x, "count":count})
add_precalc(precalc, x, step, count, res, N)
precalc[x] = None
d = [res[1]["end"]]
for x in range(res[1]["count"] - 1, 0, -1):
d.append(d[-1] - res[0][0] * res[0][1] ** x)
d.reverse()
return d
Explanation
Traverse the array.
For each previous element of the array, calculate the factors of the difference between the current element and that previous element, then precalculate the next possible element of the sequence and save it to the precalc array.
So when arriving at element i, all possible sequences containing element i are already in the precalc array, so we only have to calculate the next possible element and save it to precalc.
Currently there's one place in the algorithm that could be slow: the factorization of each previous number. I think it could be made faster with two optimizations:
a more efficient factorization algorithm
finding a way to avoid looking at every element of the array, using the fact that the array is sorted and that there are already precalculated sequences
Python:
def subseq(a):
    seq = []
    aset = set(a)
    for i, x in enumerate(a):
        # elements after x
        for j, x2 in enumerate(a[i+1:]):
            j += i + 1 # enumerate starts j at 0, we want a[j] = x2
            bk = x2 - x # b*k (assuming k and k's exponent start at 1)
            # given b*k, bruteforce values of k
            for k in range(1, bk + 1):
                items = [x, x2] # our subsequence so far
                nextdist = bk * k # what x3 - x2 should look like
                while items[-1] + nextdist in aset:
                    items.append(items[-1] + nextdist)
                    nextdist *= k
                if len(items) > len(seq):
                    seq = items
    return seq
Running time is O(dn^3), where d is the (average?) distance between two elements,
and n is of course len(a).

Algorithm to return all combinations of k elements from n

I want to write a function that takes an array of letters as an argument and a number of those letters to select.
Say you provide an array of 8 letters and want to select 3 letters from that. Then you should get:
8! / ((8 - 3)! * 3!) = 56
Arrays (or words) in return consisting of 3 letters each.
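For reference, the expected count can be checked with a few lines of Python 3 (3.8 or later for math.comb). This only uses the standard library and is a sanity check, not an answer to how to write the algorithm yourself.

```python
from itertools import combinations
from math import comb, factorial

letters = "abcdefgh"
words = ["".join(c) for c in combinations(letters, 3)]

print(len(words))   # number of 3-letter selections from 8 letters
print(words[:3])    # first few words in lexicographic order
```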
Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.
Gray Codes
An issue that you will come across is of course memory and pretty quickly, you'll have problems by 20 elements in your set -- 20C3 = 1140. And if you want to iterate over the set it's best to use a modified gray code algorithm so you aren't holding all of them in memory. These generate the next combination from the previous and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? minimize? et cetera.
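To give a feel for what a minimal-change ordering looks like, here is a short Python 3 sketch of the classic "revolving door" recursion, in which successive combinations differ by swapping exactly one element. The function name is my own, it assumes 0 <= k <= n, and for simplicity it builds whole lists, so it does not have the memory behaviour of the iterator-style algorithms in the papers below.

```python
def revolving_door(n, k):
    """List all k-subsets of range(n) so that neighbours differ by one swapped element."""
    if k == 0:
        return [[]]
    if k == n:
        return [list(range(n))]
    # subsets without n-1, followed by subsets with n-1 in reversed order
    head = revolving_door(n - 1, k)
    tail = [c + [n - 1] for c in reversed(revolving_door(n - 1, k - 1))]
    return head + tail

order = revolving_door(5, 3)
for combo in order:
    print(combo)
```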
Some of the original papers describing gray codes:
Some Hamilton Paths and a Minimal Change Algorithm
Adjacent Interchange Combination Generation Algorithm
Here are some other papers covering the topic:
An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
Combination Generators
Survey of Combinatorial Gray Codes (PostScript)
An Algorithm for Gray Codes
Chase's Twiddle (algorithm)
Phillip J Chase, `Algorithm 382: Combinations of M out of N Objects' (1970)
The algorithm in C...
Index of Combinations in Lexicographical Order (Buckles Algorithm 515)
You can also reference a combination by its index (in lexicographical order). Realizing that the index should be some amount of change from right to left based on the index we can construct something that should recover a combination.
So, we have a set {1,2,3,4,5,6}... and we want three elements. Let's say {1,2,3} we can say that the difference between the elements is one and in order and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of 'changes' in the last place accounts for one change in the lexicographical ordering. The second place, with one change {1,3,4} has one change but accounts for more change since it's in the second place (proportional to the number of elements in the original set).
The method I've described is a deconstruction, as it seems, from set to the index, we need to do the reverse – which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes – I used the index of the sets rather than a number range to represent the set, so we are always working from 0...n.
Note:
Since combinations are unordered, {1,3,2} = {1,2,3} --we order them to be lexicographical.
This method has an implicit 0 to start the set for the first difference.
Index of Combinations in Lexicographical Order (McCaffrey)
There is another way: its concept is easier to grasp and program, but it's without the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:
The set x_k, ..., x_1 of natural numbers that maximizes i = C(x_1, k) + C(x_2, k-1) + ... + C(x_k, 1), where C(n, r) = n! / (r! * (n - r)!).
For example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1). So, the 27th lexicographical combination of four things is {1,2,5,6}; those are the indexes of whatever set you want to look at. The example below (OCaml) requires a choose function, left to the reader:
(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
(* maximize function -- maximize a that is aCb *)
(* return largest c where c < i and choose(c,i) <= z *)
let rec maximize a b x =
if (choose a b ) <= x then a else maximize (a-1) b x
in
let rec iterate n x i = match i with
| 0 -> []
| i ->
let max = maximize n i x in
max :: iterate n (x - (choose max i)) (i-1)
in
if x < 0 then failwith "errors" else
let idxs = iterate (List.length set) x k in
List.map (List.nth set) (List.sort (-) idxs)
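The same greedy decomposition can be sketched in Python 3 (3.8 or later for math.comb). The name combination_at and the zero-based index convention are my own assumptions for this example, not part of the OCaml above.

```python
from math import comb

def combination_at(n, k, x):
    """Greedy decomposition x = C(c_k, k) + ... + C(c_1, 1); returns [c_k, ..., c_1]."""
    if not 0 <= x < comb(n, k):
        raise ValueError("index out of range")
    picked = []
    upper = n - 1                  # largest zero-based index still available
    for i in range(k, 0, -1):
        c = upper
        while comb(c, i) > x:      # find the largest c with C(c, i) <= x
            c -= 1
        picked.append(c)
        x -= comb(c, i)
        upper = c - 1
    return picked

print(combination_at(8, 4, 27))
```

For x = 27 and k = 4 this picks 6, 5, 2, 1, which matches the decomposition 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1) given above.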
A small and simple combinations iterator
The following two algorithms are provided for didactic purposes. They implement an iterator and (a more general) folder over all combinations.
They are as fast as possible, having the complexity O(nCk). The memory consumption is bound by k.
We will start with the iterator, which will call a user-provided function for each combination:
let iter_combs n k f =
let rec iter v s j =
if j = k then f v
else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
iter [] 0 0
A more general version will call the user provided function along with the state variable, starting from the initial state. Since we need to pass the state between different states we won't use the for-loop, but instead, use recursion,
let fold_combs n k f x =
let rec loop i s c x =
if i < n then
loop (i+1) s c @@
let c = i::c and s = s + 1 and i = i + 1 in
if s < k then loop i s c x else f c x
else x in
loop 0 0 [] x
In C#:
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
return k == 0 ? new[] { new T[0] } :
elements.SelectMany((e, i) =>
elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] {e}).Concat(c)));
}
Usage:
var result = Combinations(new[] { 1, 2, 3, 4, 5 }, 3);
Result:
123
124
125
134
135
145
234
235
245
345
Short java solution:
import java.util.Arrays;
public class Combination {
public static void main(String[] args){
String[] arr = {"A","B","C","D","E","F"};
combinations2(arr, 3, 0, new String[3]);
}
static void combinations2(String[] arr, int len, int startPosition, String[] result){
if (len == 0){
System.out.println(Arrays.toString(result));
return;
}
for (int i = startPosition; i <= arr.length-len; i++){
result[result.length - len] = arr[i];
combinations2(arr, len-1, i+1, result);
}
}
}
Result will be
[A, B, C]
[A, B, D]
[A, B, E]
[A, B, F]
[A, C, D]
[A, C, E]
[A, C, F]
[A, D, E]
[A, D, F]
[A, E, F]
[B, C, D]
[B, C, E]
[B, C, F]
[B, D, E]
[B, D, F]
[B, E, F]
[C, D, E]
[C, D, F]
[C, E, F]
[D, E, F]
May I present my recursive Python solution to this problem?
def choose_iter(elements, length):
    for i in xrange(len(elements)):
        if length == 1:
            yield (elements[i],)
        else:
            for next in choose_iter(elements[i+1:], length-1):
                yield (elements[i],) + next
def choose(l, k):
    return list(choose_iter(l, k))
Example usage:
>>> len(list(choose_iter("abcdefgh",3)))
56
I like it for its simplicity.
Let's say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word. You start with:
A B C D E F G H
^ ^ ^
i j k
First you vary k, so the next step looks like this:
A B C D E F G H
^ ^   ^
i j   k
When you have reached the end, you go on and vary j and then k again.
A B C D E F G H
^   ^ ^
i   j k
A B C D E F G H
^   ^   ^
i   j   k
Once j has reached G, you also start to vary i.
A B C D E F G H
  ^ ^ ^
  i j k
A B C D E F G H
  ^ ^   ^
  i j   k
...
Written in code, this looks something like this:
void print_combinations(const char *string)
{
int i, j, k;
int len = strlen(string);
for (i = 0; i < len - 2; i++)
{
for (j = i + 1; j < len - 1; j++)
{
for (k = j + 1; k < len; k++)
printf("%c%c%c\n", string[i], string[j], string[k]);
}
}
}
The following recursive algorithm picks all of the k-element combinations from an ordered set:
choose the first element i of your combination
combine i with each of the combinations of k-1 elements chosen recursively from the set of elements larger than i.
Iterate the above for each i in the set.
It is essential that you pick the rest of the elements as larger than i, to avoid repetition. This way [3,5] will be picked only once, as [3] combined with [5], instead of twice (the condition eliminates [5] + [3]). Without this condition you get variations instead of combinations.
Short example in Python:
def comb(sofar, rest, n):
    if n == 0:
        print sofar
    else:
        for i in range(len(rest)):
            comb(sofar + rest[i], rest[i+1:], n-1)
>>> comb("", "abcde", 3)
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
For explanation, the recursive method is described with the following example:
Example: A B C D E
All combinations of 3 would be:
A with all combinations of 2 from the rest (B C D E)
B with all combinations of 2 from the rest (C D E)
C with all combinations of 2 from the rest (D E)
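The breakdown in this example can be checked with a short Python 3 snippet that groups the 3-element combinations of "ABCDE" by their first letter. It leans on itertools rather than the recursive method itself, so treat it only as a cross-check of the counts.

```python
from collections import Counter
from itertools import combinations

# count how many 3-combinations start with each letter
counts = Counter(c[0] for c in combinations("ABCDE", 3))
print(dict(counts))
```

A leads 6 combinations (2 out of B C D E), B leads 3 (2 out of C D E), and C leads 1 (D E), which adds up to the 10 combinations in total.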
I found this thread useful and thought I would add a Javascript solution that you can pop into Firebug. Depending on your JS engine, it could take a little time if the starting string is large.
function string_recurse(active, rest) {
if (rest.length == 0) {
console.log(active);
} else {
string_recurse(active + rest.charAt(0), rest.substring(1, rest.length));
string_recurse(active, rest.substring(1, rest.length));
}
}
string_recurse("", "abc");
The output should be as follows:
abc
ab
ac
a
bc
b
c
In C++ the following routine will produce all combinations of length distance(first,k) between the range [first,last):
#include <algorithm>
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
/* Credits: Mark Nelson http://marknelson.us */
if ((first == last) || (first == k) || (last == k))
return false;
Iterator i1 = first;
Iterator i2 = last;
++i1;
if (last == i1)
return false;
i1 = last;
--i1;
i1 = k;
--i2;
while (first != i1)
{
if (*--i1 < *i2)
{
Iterator j = k;
while (!(*i1 < *j)) ++j;
std::iter_swap(i1,j);
++i1;
++j;
i2 = k;
std::rotate(i1,j,last);
while (last != j)
{
++j;
++i2;
}
std::rotate(k,i2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
It can be used like this:
#include <string>
#include <iostream>
int main()
{
std::string s = "12345";
std::size_t comb_size = 3;
do
{
std::cout << std::string(s.begin(), s.begin() + comb_size) << std::endl;
} while (next_combination(s.begin(), s.begin() + comb_size, s.end()));
return 0;
}
This will print the following:
123
124
125
134
135
145
234
235
245
345
static IEnumerable<string> Combinations(List<string> characters, int length)
{
for (int i = 0; i < characters.Count; i++)
{
// only want 1 character, just return this one
if (length == 1)
yield return characters[i];
// want more than one character, return this one plus all combinations one shorter
// only use characters after the current one for the rest of the combinations
else
foreach (string next in Combinations(characters.GetRange(i + 1, characters.Count - (i + 1)), length - 1))
yield return characters[i] + next;
}
}
Simple recursive algorithm in Haskell
import Data.List
combinations 0 lst = [[]]
combinations n lst = do
(x:xs) <- tails lst
rest <- combinations (n-1) xs
return $ x : rest
We first define the special case, i.e. selecting zero elements. It produces a single result, which is an empty list (i.e. a list that contains an empty list).
For n > 0, x goes through every element of the list and xs is every element after x.
rest picks n - 1 elements from xs using a recursive call to combinations. The final result of the function is a list where each element is x : rest (i.e. a list which has x as head and rest as tail) for every different value of x and rest.
> combinations 3 "abcde"
["abc","abd","abe","acd","ace","ade","bcd","bce","bde","cde"]
And of course, since Haskell is lazy, the list is gradually generated as needed, so you can partially evaluate exponentially large combinations.
> let c = combinations 8 "abcdefghijklmnopqrstuvwxyz"
> take 10 c
["abcdefgh","abcdefgi","abcdefgj","abcdefgk","abcdefgl","abcdefgm","abcdefgn",
"abcdefgo","abcdefgp","abcdefgq"]
And here comes granddaddy COBOL, the much maligned language.
Let's assume an array of 34 elements of 8 bytes each (purely arbitrary selection.) The idea is to enumerate all possible 4-element combinations and load them into an array.
We use 4 indices, one each for each position in the group of 4
The array is processed like this:
idx1 = 1
idx2 = 2
idx3 = 3
idx4 = 4
We vary idx4 from 4 to the end. For each idx4 we get a unique combination
of groups of four. When idx4 comes to the end of the array, we increment idx3 by 1 and set idx4 to idx3+1. Then we run idx4 to the end again. We proceed in this manner, augmenting idx3,idx2, and idx1 respectively until the position of idx1 is less than 4 from the end of the array. That finishes the algorithm.
1 --- pos.1
2 --- pos 2
3 --- pos 3
4 --- pos 4
5
6
7
etc.
First iterations:
1234
1235
1236
1237
1245
1246
1247
1256
1257
1267
etc.
A COBOL example:
01 DATA_ARAY.
05 FILLER PIC X(8) VALUE "VALUE_01".
05 FILLER PIC X(8) VALUE "VALUE_02".
etc.
01 ARAY_DATA OCCURS 34.
05 ARAY_ITEM PIC X(8).
01 OUTPUT_ARAY OCCURS 50000 PIC X(32).
01 MAX_NUM PIC 99 COMP VALUE 34.
01 INDEXXES COMP.
05 IDX1 PIC 99.
05 IDX2 PIC 99.
05 IDX3 PIC 99.
05 IDX4 PIC 99.
05 OUT_IDX PIC 9(9).
01 WHERE_TO_STOP_SEARCH PIC 99 COMP.
* Stop the search when IDX1 is on the third last array element:
COMPUTE WHERE_TO_STOP_SEARCH = MAX_NUM - 3
MOVE 1 TO IDX1
PERFORM UNTIL IDX1 > WHERE_TO_STOP_SEARCH
COMPUTE IDX2 = IDX1 + 1
PERFORM UNTIL IDX2 > MAX_NUM
COMPUTE IDX3 = IDX2 + 1
PERFORM UNTIL IDX3 > MAX_NUM
COMPUTE IDX4 = IDX3 + 1
PERFORM UNTIL IDX4 > MAX_NUM
ADD 1 TO OUT_IDX
STRING ARAY_ITEM(IDX1)
ARAY_ITEM(IDX2)
ARAY_ITEM(IDX3)
ARAY_ITEM(IDX4)
INTO OUTPUT_ARAY(OUT_IDX)
ADD 1 TO IDX4
END-PERFORM
ADD 1 TO IDX3
END-PERFORM
ADD 1 TO IDX2
END-PERFORM
ADD 1 TO IDX1
END-PERFORM.
Another C# version with lazy generation of the combination indices. This version maintains a single array of indices to define a mapping between the list of all values and the values for the current combination, i.e. constantly uses O(k) additional space during the entire runtime. The code generates individual combinations, including the first one, in O(k) time.
public static IEnumerable<T[]> Combinations<T>(this T[] values, int k)
{
if (k < 0 || values.Length < k)
yield break; // invalid parameters, no combinations possible
// generate the initial combination indices
var combIndices = new int[k];
for (var i = 0; i < k; i++)
{
combIndices[i] = i;
}
while (true)
{
// return next combination
var combination = new T[k];
for (var i = 0; i < k; i++)
{
combination[i] = values[combIndices[i]];
}
yield return combination;
// find first index to update
var indexToUpdate = k - 1;
while (indexToUpdate >= 0 && combIndices[indexToUpdate] >= values.Length - k + indexToUpdate)
{
indexToUpdate--;
}
if (indexToUpdate < 0)
yield break; // done
// update combination indices
for (var combIndex = combIndices[indexToUpdate] + 1; indexToUpdate < k; indexToUpdate++, combIndex++)
{
combIndices[indexToUpdate] = combIndex;
}
}
}
Test code:
foreach (var combination in new[] {'a', 'b', 'c', 'd', 'e'}.Combinations(3))
{
System.Console.WriteLine(String.Join(" ", combination));
}
Output:
a b c
a b d
a b e
a c d
a c e
a d e
b c d
b c e
b d e
c d e
Here is an elegant, generic implementation in Scala, as described on 99 Scala Problems.
object P26 {
def flatMapSublists[A,B](ls: List[A])(f: (List[A]) => List[B]): List[B] =
ls match {
case Nil => Nil
case sublist@(_ :: tail) => f(sublist) ::: flatMapSublists(tail)(f)
}
def combinations[A](n: Int, ls: List[A]): List[List[A]] =
if (n == 0) List(Nil)
else flatMapSublists(ls) { sl =>
combinations(n - 1, sl.tail) map {sl.head :: _}
}
}
If you can use SQL syntax - say, if you're using LINQ to access fields of a structure or array, or directly accessing a database that has a table called "Alphabet" with just one char field "Letter" - you can adapt the following code:
SELECT A.Letter, B.Letter, C.Letter
FROM Alphabet AS A, Alphabet AS B, Alphabet AS C
WHERE A.Letter<>B.Letter AND A.Letter<>C.Letter AND B.Letter<>C.Letter
AND A.Letter<B.Letter AND B.Letter<C.Letter
This will return all combinations of 3 letters, notwithstanding how many letters you have in table "Alphabet" (it can be 3, 8, 10, 27, etc.).
If what you want is all permutations, rather than combinations (i.e. you want "ACB" and "ABC" to count as different, rather than appear just once) just delete the last line (the AND one) and it's done.
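The difference between the two queries can be illustrated outside SQL as well. Below is a small Python 3 sketch using itertools on a made-up four-letter alphabet: the ordered condition corresponds to combinations, and dropping it corresponds to permutations.

```python
from itertools import combinations, permutations

letters = "ABCD"  # stand-in for the rows of the Alphabet table
combos = ["".join(c) for c in combinations(letters, 3)]  # A < B < C enforced
perms = ["".join(p) for p in permutations(letters, 3)]   # only "all different"

print(len(combos), len(perms))
print("ACB" in combos, "ACB" in perms)
```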
Post-Edit: After re-reading the question, I realise what's needed is the general algorithm, not just a specific one for the case of selecting 3 items. Adam Hughes' answer is the complete one, unfortunately I cannot vote it up (yet). This answer's simple but works only for when you want exactly 3 items.
I had a permutation algorithm I used for project euler, in python:
def missing(miss,src):
    "Returns the list of items in src not present in miss"
    return [i for i in src if i not in miss]

def permutation_gen(n,l):
    "Generates all the permutations of n items of the l list"
    for i in l:
        if n<=1:
            yield [i]
        else:
            # only recurse while more items are still needed
            r = [i]
            for j in permutation_gen(n-1,missing([i],l)): yield r+j
If n<len(l), you should get all the combinations you need without repetition. Is that what you need?
It is a generator, so you use it in something like this:
for comb in permutation_gen(3,list("ABCDEFGH")):
print comb
https://gist.github.com/3118596
There is an implementation for JavaScript. It has functions to get k-combinations and all combinations of an array of any objects. Examples:
k_combinations([1,2,3], 2)
-> [[1,2], [1,3], [2,3]]
combinations([1,2,3])
-> [[1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]]
Let's say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word. You start with:
A B C D E F G H
^ ^ ^
i j k
First you vary k, so the next step looks like this:
A B C D E F G H
^ ^   ^
i j   k
When you have reached the end, you go on and vary j and then k again.
A B C D E F G H
^   ^ ^
i   j k
A B C D E F G H
^   ^   ^
i   j   k
Once j has reached G, you also start to vary i.
A B C D E F G H
  ^ ^ ^
  i j k
A B C D E F G H
  ^ ^   ^
  i j   k
...
function initializePointers($cnt) {
$pointers = [];
for($i=0; $i<$cnt; $i++) {
$pointers[] = $i;
}
return $pointers;
}
function incrementPointers(&$pointers, &$arrLength) {
for($i=0; $i<count($pointers); $i++) {
$currentPointerIndex = count($pointers) - $i - 1;
$currentPointer = $pointers[$currentPointerIndex];
if($currentPointer < $arrLength - $i - 1) {
++$pointers[$currentPointerIndex];
for($j=1; ($currentPointerIndex+$j)<count($pointers); $j++) {
$pointers[$currentPointerIndex+$j] = $pointers[$currentPointerIndex]+$j;
}
return true;
}
}
return false;
}
function getDataByPointers(&$arr, &$pointers) {
$data = [];
for($i=0; $i<count($pointers); $i++) {
$data[] = $arr[$pointers[$i]];
}
return $data;
}
function getCombinations($arr, $cnt)
{
$len = count($arr);
$result = [];
$pointers = initializePointers($cnt);
do {
$result[] = getDataByPointers($arr, $pointers);
} while(incrementPointers($pointers, $len));
return $result;
}
$result = getCombinations([0, 1, 2, 3, 4, 5], 3);
print_r($result);
Based on https://stackoverflow.com/a/127898/2628125, but more abstract, for any size of pointers.
Here you have a lazy evaluated version of that algorithm coded in C#:
static bool nextCombination(int[] num, int n, int k)
{
bool finished, changed;
changed = finished = false;
if (k > 0)
{
for (int i = k - 1; !finished && !changed; i--)
{
if (num[i] < (n - 1) - (k - 1) + i)
{
num[i]++;
if (i < k - 1)
{
for (int j = i + 1; j < k; j++)
{
num[j] = num[j - 1] + 1;
}
}
changed = true;
}
finished = (i == 0);
}
}
return changed;
}
static IEnumerable Combinations<T>(IEnumerable<T> elements, int k)
{
T[] elem = elements.ToArray();
int size = elem.Length;
if (k <= size)
{
int[] numbers = new int[k];
for (int i = 0; i < k; i++)
{
numbers[i] = i;
}
do
{
yield return numbers.Select(n => elem[n]);
}
while (nextCombination(numbers, size, k));
}
}
And test part:
static void Main(string[] args)
{
int k = 3;
var t = new[] { "dog", "cat", "mouse", "zebra"};
foreach (IEnumerable<string> i in Combinations(t, k))
{
Console.WriteLine(string.Join(",", i));
}
}
Hope this helps!
Another version, which forces all combinations of the first k elements to appear first, then all combinations of the first k+1, then of the first k+2, etc. It means that if you have a sorted array, with the most important elements at the top, it will take those first and expand gradually to the next ones only when it must.
private static bool NextCombinationFirstsAlwaysFirst(int[] num, int n, int k)
{
if (k > 1 && NextCombinationFirstsAlwaysFirst(num, num[k - 1], k - 1))
return true;
if (num[k - 1] + 1 == n)
return false;
++num[k - 1];
for (int i = 0; i < k - 1; ++i)
num[i] = i;
return true;
}
For instance, if you run the first method ("nextCombination") on k=3, n=5 you'll get:
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
But if you'll run
int[] nums = new int[k];
for (int i = 0; i < k; ++i)
nums[i] = i;
do
{
Console.WriteLine(string.Join(" ", nums));
}
while (NextCombinationFirstsAlwaysFirst(nums, n, k));
You'll get this (I added empty lines for clarity):
0 1 2

0 1 3
0 2 3
1 2 3

0 1 4
0 2 4
1 2 4
0 3 4
1 3 4
2 3 4
It adds "4" only when it must, and after "4" has been added it adds "3" again only when it must (after doing 01, 02, 12).
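As far as I can tell, this order is what is usually called colexicographic order: combinations compared from their last element backwards. Here is a Python 3 sketch that reproduces the listing above by sorting, under that assumption. The function name firsts_first is invented, and unlike the C# method it generates everything up front instead of stepping from one combination to the next.

```python
from itertools import combinations

def firsts_first(n, k):
    """All k-subsets of range(n), ordered by their reversed tuples (colex order)."""
    return sorted(combinations(range(n), k), key=lambda c: c[::-1])

for combo in firsts_first(5, 3):
    print(" ".join(map(str, combo)))
```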
Array.prototype.combs = function(num) {
var str = this,
length = str.length,
of = Math.pow(2, length) - 1,
out, combinations = [];
while(of) {
out = [];
for(var i = 0, y; i < length; i++) {
y = (1 << i);
if(y & of && (y !== of))
out.push(str[i]);
}
if (out.length >= num) {
combinations.push(out);
}
of--;
}
return combinations;
}
Clojure version:
(defn comb [k l]
(if (= 1 k) (map vector l)
(apply concat
(map-indexed
#(map (fn [x] (conj x %2))
(comb (dec k) (drop (inc %1) l)))
l))))
Algorithm:
Count from 1 to 2^n.
Convert each number to its binary representation.
Translate each 'on' bit to elements of your set, based on position.
In C#:
void Main()
{
var set = new [] {"A", "B", "C", "D" }; //, "E", "F", "G", "H", "I", "J" };
var kElement = 2;
for(var i = 1; i < Math.Pow(2, set.Length); i++) {
var result = Convert.ToString(i, 2).PadLeft(set.Length, '0');
var cnt = Regex.Matches(Regex.Escape(result), "1").Count;
if (cnt == kElement) {
for(int j = 0; j < set.Length; j++)
if ( Char.GetNumericValue(result[j]) == 1)
Console.Write(set[j]);
Console.WriteLine();
}
}
}
Why does it work?
There is a bijection between the subsets of an n-element set and n-bit sequences.
That means we can figure out how many subsets there are by counting sequences.
e.g., the four element set below can be represented by {0,1} X {0, 1} X {0, 1} X {0, 1} (or 2^4) different sequences.
So all we have to do is count from 1 to 2^n to find all the combinations. (We ignore the empty set.) Next, translate each number to its binary representation. Then substitute elements of your set for 'on' bits.
If you want only k element results, only print when k bits are 'on'.
(If you want all subsets instead of k length subsets, remove the cnt/kElement part.)
(For proof, see MIT free courseware Mathematics for Computer Science, Lehman et al, section 11.2.2. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/readings/ )
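The counting idea can also be sketched in a few lines of Python 3. The generator name k_subsets_by_bits is made up for this example, and it maps the lowest bit to the first element, whereas the C# code above reads the padded binary string from the left, so the output order differs.

```python
def k_subsets_by_bits(items, k):
    """Yield every k-element subset of items by scanning all n-bit masks."""
    n = len(items)
    for mask in range(1, 1 << n):          # 1 .. 2^n - 1, skipping the empty set
        if bin(mask).count("1") == k:      # keep masks with exactly k bits on
            yield [items[j] for j in range(n) if mask & (1 << j)]

for subset in k_subsets_by_bits(["A", "B", "C", "D"], 2):
    print("".join(subset))
```

This still visits all 2^n masks, so it is only reasonable for small sets.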
short python code, yielding index positions
def yield_combos(n,k):
    # n is set size, k is combo size
    i = 0
    a = [0]*k
    while i > -1:
        for j in range(i+1, k):
            a[j] = a[j-1]+1
        i=j
        yield a
        while a[i] == i + n - k:
            i -= 1
        a[i] += 1
All said and done, here comes the OCaml code for that.
The algorithm is evident from the code.
let combi n lst =
let rec comb l c =
if( List.length c = n) then [c] else
match l with
[] -> []
| (h::t) -> (comb t (h::c))@(comb t c)
in
comb lst []
;;
Here is a method which gives you all combinations of a specified size from a string of arbitrary length. Similar to quinmars' solution, but works for varied input and k.
The code can be changed to wrap around, i.e. 'dab' from input 'abcd' with k=3.
public void run(String data, int howMany){
choose(data, howMany, new StringBuffer(), 0);
}
//n choose k
private void choose(String data, int k, StringBuffer result, int startIndex){
if (result.length()==k){
System.out.println(result.toString());
return;
}
for (int i=startIndex; i<data.length(); i++){
result.append(data.charAt(i));
choose(data,k,result, i+1);
result.setLength(result.length()-1);
}
}
Output for "abcde":
abc abd abe acd ace ade bcd bce bde cde
Short javascript version (ES2019, since it relies on Array.prototype.flatMap)
let combine = (list, n) =>
n == 0 ?
[[]] :
list.flatMap((e, i) =>
combine(
list.slice(i + 1),
n - 1
).map(c => [e].concat(c))
);
let res = combine([1,2,3,4], 3);
res.forEach(e => console.log(e.join()));
Another python recursive solution.
def combination_indicies(n, k, j = 0, stack = []):
    if len(stack) == k:
        yield list(stack)
        return
    for i in range(j, n):
        stack.append(i)
        for x in combination_indicies(n, k, i + 1, stack):
            yield x
        stack.pop()
list(combination_indicies(5, 3))
Output:
[[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
I created a solution in SQL Server 2005 for this, and posted it on my website: http://www.jessemclain.com/downloads/code/sql/fn_GetMChooseNCombos.sql.htm
Here is an example to show usage:
SELECT * FROM dbo.fn_GetMChooseNCombos('ABCD', 2, '')
results:
Word
----
AB
AC
AD
BC
BD
CD
(6 row(s) affected)
Here is my proposal in C++.
I tried to impose as few restrictions on the iterator type as I could, so this solution assumes just a forward iterator, and it can be a const_iterator. This should work with any standard container. In cases where the arguments don't make sense, it throws std::invalid_argument.
#include <vector>
#include <stdexcept>
template <typename Fci> // Fci - forward const iterator
std::vector<std::vector<Fci> >
enumerate_combinations(Fci begin, Fci end, unsigned int combination_size)
{
if(begin == end && combination_size > 0u)
throw std::invalid_argument("empty set and positive combination size!");
std::vector<std::vector<Fci> > result; // empty set of combinations
if(combination_size == 0u) return result; // strictly there is exactly one combination
// of size 0 (the empty set); this returns none
std::vector<Fci> current_combination;
current_combination.reserve(combination_size + 1u); // I reserve one additional slot
// in my vector to store
// the end sentinel there.
// The code is cleaner thanks to that
for(unsigned int i = 0u; i < combination_size && begin != end; ++i, ++begin)
{
current_combination.push_back(begin); // Construction of the first combination
}
// Since I assume the iterators support only incrementing, I have to iterate over
// the set to get its size, which is expensive. Here I had to iterate anyway to
// produce the first combination, so I use the loop to also check the size.
if(current_combination.size() < combination_size)
throw std::invalid_argument("combination size > set size!");
result.push_back(current_combination); // Store the first combination in the results set
current_combination.push_back(end); // Here I add the sentinel mentioned earlier to
// simplify the rest of the code. If I did it
// earlier, the previous statement would get ugly.
while(true)
{
unsigned int i = combination_size;
Fci tmp; // Thanks to the sentinel I can find the first
do // iterator to change, simply by scanning
{ // from right to left and looking for the
tmp = current_combination[--i]; // first "bubble". The fact that it's
++tmp; // a forward iterator makes it ugly but I
} // can't help it.
while(i > 0u && tmp == current_combination[i + 1u]);
// Here is probably my most obfuscated expression.
// The loop above looks for a "bubble". If there is no "bubble", that means
// current_combination is the last combination, the expression in the if statement
// below evaluates to true, and the function exits returning result.
// If the "bubble" is found, however, the statement below has the side effect of
// incrementing the first iterator to the left of the "bubble".
if(++current_combination[i] == current_combination[i + 1u])
return result;
// The rest of the code sets the positions of the remaining iterators
// (if there are any) that are to the right of the incremented one,
// to form the next combination
while(++i < combination_size)
{
current_combination[i] = current_combination[i - 1u];
++current_combination[i];
}
// Below is the ugly side of using the sentinel. Well, it had to have some
// disadvantage. Try without it.
result.push_back(std::vector<Fci>(current_combination.begin(),
current_combination.end() - 1));
}
}
Here is some code I recently wrote in Java, which calculates and returns all the combinations of "num" elements from "outOf" elements.
// author: Sourabh Bhat (heySourabh#gmail.com)
public class Testing
{
public static void main(String[] args)
{
// Test case num = 5, outOf = 8.
int num = 5;
int outOf = 8;
int[][] combinations = getCombinations(num, outOf);
for (int i = 0; i < combinations.length; i++)
{
for (int j = 0; j < combinations[i].length; j++)
{
System.out.print(combinations[i][j] + " ");
}
System.out.println();
}
}
private static int[][] getCombinations(int num, int outOf)
{
int possibilities = get_nCr(outOf, num);
int[][] combinations = new int[possibilities][num];
int arrayPointer = 0;
int[] counter = new int[num];
for (int i = 0; i < num; i++)
{
counter[i] = i;
}
breakLoop: while (true)
{
// Initializing part
for (int i = 1; i < num; i++)
{
if (counter[i] >= outOf - (num - 1 - i))
counter[i] = counter[i - 1] + 1;
}
// Testing part
for (int i = 0; i < num; i++)
{
if (counter[i] < outOf)
{
continue;
} else
{
break breakLoop;
}
}
// Innermost part
combinations[arrayPointer] = counter.clone();
arrayPointer++;
// Incrementing part
counter[num - 1]++;
for (int i = num - 1; i >= 1; i--)
{
if (counter[i] >= outOf - (num - 1 - i))
counter[i - 1]++;
}
}
return combinations;
}
private static int get_nCr(int n, int r)
{
if(r > n)
{
throw new ArithmeticException("r is greater than n");
}
long numerator = 1;
long denominator = 1;
for (int i = n; i >= r + 1; i--)
{
numerator *= i;
}
for (int i = 2; i <= n - r; i++)
{
denominator *= i;
}
return (int) (numerator / denominator);
}
}
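A note on `get_nCr` above: it multiplies out the whole numerator before dividing, so the intermediate `long` can overflow even when the final count fits in an `int` (for instance n = 30, r = 1 multiplies out all of 30!). A common way around this is to interleave the multiplications and divisions so the running value is always itself a binomial coefficient. The sketch below illustrates that ordering in Python. The name `n_choose_r` is hypothetical, Python integers do not overflow so it demonstrates the arithmetic rather than the overflow, and `math.comb` (Python 3.8+) is used only as a cross-check.

```python
import math

def n_choose_r(n, r):
    """Compute C(n, r), keeping every intermediate value a binomial coefficient."""
    if r < 0 or r > n:
        raise ValueError("r must satisfy 0 <= r <= n")
    r = min(r, n - r)               # C(n, r) == C(n, n - r); use the shorter loop
    result = 1
    for i in range(r):
        # result is C(n, i) before this line and C(n, i + 1) after it,
        # so the integer division is always exact
        result = result * (n - i) // (i + 1)
    return result

print(n_choose_r(8, 5))   # 56, the number of rows in the test case above
print(n_choose_r(30, 1))  # 30
```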

Resources