find all subsets that sum to x - using an initial code - algorithm

I am trying to build upon a problem, to solve another similar problem... given below is a code for finding the total number of subsets that sum to a particular value, and I am trying to modify the code so that I can return all subsets that sum to that value (instead of finding the count).
Code for finding the total number of suibsets that sum to 'sum':
/**
* method to return number of sets with a given sum.
**/
public static int count = 0;
public static void countSubsetSum2(int arr[], int k, int sum) {
if(sum == 0) {
count++;
return;
}
if(sum != 0 && k == 0) {
return;
}
if(sum < arr[k - 1]) {
countSubsetSum2(arr, k-1, sum);
}
countSubsetSum2(arr, k-1, sum - arr[k-1]);
countSubsetSum2(arr, k-1, sum);
}
Can someone propose some changes to this code, to make it return the subsets rather than the subset count?

Firstly, your code isn't correct.
The function, at every step, recurses with the sum excluding and including the current element 1, moving on to the next element, thanks to these lines:
countSubsetSum2(arr, k-1, sum - arr[k-1]);
countSubsetSum2(arr, k-1, sum);
But then there's also this:
if(sum < arr[k - 1]) {
countSubsetSum2(arr, k-1, sum);
}
which causes it to recurse twice with the sum excluding the current element under some circumstances (which it should never do).
Essentially you just need to remove that if-statement.
If all the elements are positive and sum - arr[k-1] < 0, we'd keep going, but we can never get a sum of 0 since the sum can't increase, thus we'd be doing a lot of unnecessary work. So, if the elements are all positive, we can add a check for if(arr[k - 1] <= sum) to the first call to improve the running time. If the elements aren't all positive, the code won't find all sums.
Now on to printing the sums
If you understand the code well, changing it to print the sums instead should be pretty easy. I suggest you work on understanding it a bit more - trace what the program will do by hand, then trace what you want the program to do.
And a hint for solving the actual problem: On noting that countSubsetSum2(arr, k-1, sum - arr[k-1]); recurses with the sum including the current element (and the other recursive call recurses with the sum excluding the current element), what you should do should become clear.
1: Well, technically it's reversed (we start with the target sum and decrease to 0 instead of starting at 0 and increasing to sum), but the same idea is there.

This is the code that works:
import java.util.LinkedList;
import java.util.Iterator;
import java.util.List;
public class subset{
public static int count = 0;
public static List list = new LinkedList();
public static void countSubsetSum2(int arr[], int k, int sum) {
if(sum <= 0 || k < 0) {
count++;
return;
}
if(sum == arr[k]) {
System.out.print(arr[k]);
for(Iterator i = list.iterator(); i.hasNext();)
System.out.print("\t" + i.next());
System.out.println();
}
list.add(arr[k]);
countSubsetSum2(arr, k-1, sum - arr[k]);
list.remove(list.size() - 1);
countSubsetSum2(arr, k-1, sum);
}
public static void main(String[] args)
{
int [] array = {1, 4, 5, 6};
countSubsetSum2(array, 3, 10);
}
}

First off, the code you have there doesn't seem to actually work (I tested it on input [1,2,3, ..., 10] with a sum of 3 and it output 128).
To get it working, first note that you implemented the algorithm in a pretty unorthodox way. Mathematical functions take input and produce output. (Arguably) the most elegant programming functions should also take input and produce output because then we can reason about them as we reason about math.
In your case you don't produce any output (the return type is void) and instead store the result in a static variable. This means it's hard to tell exactly what it means to call countSubsetSum2. In particular, what happens if you call it multiple times? It does something different each time (because the count variable will have a different starting value!) Instead, if you write countSubsetSum2 so that it returns a value then you can define its behavior to be: countSubsetSum2 returns the number of subsets of the input arr[0...k] that sum to sum. And then you can try proving why your implementation meets that specification.
I'm not doing the greatest job of explaining, but I think a more natural way to write it would be:
// Algorithm stops once k is the least element in the array
if (k == 0) {
if (sum == 0 || sum == arr[k]) {
// Either we can sum to "sum"
return 1;
}
else {
// Or we can't sum to "sum"
return 0;
}
}
// Otherwise, let's recursively see if we can sum to "sum"
// Any valid subset either includes arr[k]
return countSubsetSum2(arr, k-1, sum - arr[k]) +
// Or it doesn't
countSubsetSum2(arr, k-1, sum);
As described above, this function takes an input and outputs a value that we can define and prove to be true mathematically (caveat: it's usually not quite a proof because there are crazy edge cases in most programming languages unfortunately).
Anyways, to get back to your question. The issue with the above code is that it doesn't store any data... it just returns the count. Instead, let's generate the actual subsets while we're generating them. In particular, when I say Any valid subset either includes arr[k] I mean... the subset we're generating includes arr[k]; so add it. Below I assumed that the code you wrote above is java-ish. Hopefully it makes sense:
// Algorithm stops once k is the least element in the array
if (k == 0) {
if (sum == 0 || sum == arr[k]) {
// Either we can sum to "sum" using just arr[0]
// So return a list of all of the subsets that sum to "sum"
// There are actually a few edge cases here, so we need to be careful
List<Set<int>> ret = new List<Set<int>>();
// First consider if the singleton containing arr[k] could equal sum
if (sum == arr[k])
{
Set<int> subSet = new Subset<int>();
subSet.Add(arr[k]);
ret.Add(subSet);
}
// Now consider the empty set
if (sum == 0)
{
Set<int> subSet = new Subset<int>();
ret.Add(subSet);
}
return ret;
}
else {
// Or we can't sum to "sum" using just arr[0]
// So return a list of all of the subsets that sum to "sum". None
// (given our inputs!)
List<Set<int>> ret = new List<Set<int>>();
return ret;
}
}
// Otherwise, let's recursively generate subsets summing to "sum"
// Any valid subset either includes arr[k]
List<Set<int>> subsetsThatNeedKthElement = genSubsetSum(arr, k-1, sum - arr[k]);
// Or it doesn't
List<Set<int>> completeSubsets = genSubsetSum(arr, k-1, sum);
// Note that subsetsThatNeedKthElement only sum to "sum" - arr[k]... so we need to add
// arr[k] to each of those subsets to create subsets which sum to "sum"
// On the other hand, completeSubsets contains subsets which already sum to "sum"
// so they're "complete"
// Initialize it with the completed subsets
List<Set<int>> ret = new List<Set<int>>(completeSubsets);
// Now augment the incomplete subsets and add them to the final list
foreach (Set<int> subset in subsetsThatNeedKthElement)
{
subset.Add(arr[k]);
ret.Add(subset);
}
return ret;
The code is pretty cluttered with all the comments; but the key point is that this implementation always returns what it's specified to return (a list of sets of ints from arr[0] to arr[k] which sum to whatever sum was passed in).
FYI, there is another approach which is "bottom-up" (i.e. doesn't use recursion) which should be more performant. If you implement it that way, then you need to store extra data in static state (a "memoized table")... which is a bit ugly but practical. However, when you implement it this way you need to have a more clever way of generating the subsets. Feel free to ask that question in a separate post after giving it a try.

Based, on the comments/suggestions here, I have been able to get the solution for this problem in this way:
public static int counter = 0;
public static List<List<Integer>> lists = new ArrayList<>();
public static void getSubsetCountThatSumToTargetValue(int[] arr, int k, int targetSum, List<Integer> list) {
if(targetSum == 0) {
counter++;
lists.add(list);
return;
}
if(k <= 0) {
return;
}
getSubsetCountThatSumToTargetValue(arr, k - 1, targetSum, list);
List<Integer> appendedlist = new ArrayList<>();
appendedlist.addAll(list);
appendedlist.add(arr[k - 1]);
getSubsetCountThatSumToTargetValue(arr, k - 1, targetSum - arr[k - 1], appendedlist);
}
The main method looks like this:
public static void main(String[] args) {
int[] arr = {1, 2, 3, 4, 5};
SubSetSum.getSubsetCountThatSumToTargetValue(arr, 5, 9, new ArrayList<Integer>());
System.out.println("Result count: " + counter);
System.out.println("lists: " + lists);
}
Output:
Result: 3
lists: [[4, 3, 2], [5, 3, 1], [5, 4]]

A Python implementation with k moving from 0 to len() - 1:
import functools
def sum_of_subsets( numbers, sum_original ):
def _sum_of_subsets( list, k, sum ):
if sum < 0 or k == len( numbers ):
return
if ( sum == numbers[ k ] ):
expression = functools.reduce( lambda result, num: str( num ) if len( result ) == 0 else \
"%s + %d" % ( result, num ),
sorted( list + [ numbers[ k ]] ),
'' )
print "%d = %s" % ( sum_original, expression )
return
list.append( numbers[ k ] )
_sum_of_subsets( list, k + 1, sum - numbers[ k ])
list.pop( -1 )
_sum_of_subsets( list, k + 1, sum )
_sum_of_subsets( [], 0, sum_original )
...
sum_of_subsets( [ 8, 6, 3, 4, 2, 5, 7, 1, 9, 11, 10, 13, 12, 14, 15 ], 15 )
...
15 = 1 + 6 + 8
15 = 3 + 4 + 8
15 = 1 + 2 + 4 + 8
15 = 2 + 5 + 8
15 = 7 + 8
15 = 2 + 3 + 4 + 6
15 = 1 + 3 + 5 + 6
15 = 4 + 5 + 6
15 = 2 + 6 + 7
15 = 6 + 9
15 = 1 + 2 + 3 + 4 + 5
15 = 1 + 3 + 4 + 7
15 = 1 + 2 + 3 + 9
15 = 2 + 3 + 10
15 = 3 + 5 + 7
15 = 1 + 3 + 11
15 = 3 + 12
15 = 2 + 4 + 9
15 = 1 + 4 + 10
15 = 4 + 11
15 = 1 + 2 + 5 + 7
15 = 1 + 2 + 12
15 = 2 + 13
15 = 1 + 5 + 9
15 = 5 + 10
15 = 1 + 14
15 = 15

Related

Reconstructing input to encoder from output

I would like to understand how to solve the Codility ArrayRecovery challenge, but I don't even know what branch of knowledge to consult. Is it combinatorics, optimization, computer science, set theory, or something else?
Edit:
The branch of knowledge to consult is constraint programming, particularly constraint propagation. You also need some combinatorics to know that if you take k numbers at a time from the range [1..n], with the restriction that no number can be bigger than the one before it, that works out to be
(n+k-1)!/k!(n-1)! possible combinations
which is the same as the number of combinations with replacements of n things taken k at a time, which has the mathematical notation . You can read about why it works out like that here.
Peter Norvig provides an excellent example of how to solve this kind of problem with his Sudoku solver.
You can read the full description of the ArrayRecovery problem via the link above. The short story is that there is an encoder that takes a sequence of integers in the range 1 up to some given limit (say 100 for our purposes) and for each element of the input sequence outputs the most recently seen integer that is smaller than the current input, or 0 if none exists.
input 1, 2, 3, 4 => output 0, 1, 2, 3
input 2, 4, 3 => output 0, 2, 2
The full task is, given the output (and the range of allowable input), figure out how many possible inputs could have generated it. But before I even get to that calculation, I'm not confident about how to even approach formulating the equation. That is what I am asking for help with. (Of course a full solution would be welcome, too, if it is explained.)
I just look at some possible outputs and wonder. Here are some sample encoder outputs and the inputs I can come up with, with * meaning any valid input and something like > 4 meaning any valid input greater than 4. If needed, inputs are referred to as A1, A2, A3, ... (1-based indexing)
Edit #2
Part of the problem I was having with this challenge is that I did not manually generate the exactly correct sets of possible inputs for an output. I believe the set below is correct now. Look at this answer's edit history if you want to see my earlier mistakes.
output #1: 0, 0, 0, 4
possible inputs: [>= 4, A1 >= * >= 4, 4, > 4]
output #2: 0, 0, 0, 2, 3, 4 # A5 ↴ See more in discussion below
possible inputs: [>= 2, A1 >= * >=2, 2, 3, 4, > 4]
output #3: 0, 0, 0, 4, 3, 1
possible inputs: none # [4, 3, 1, 1 >= * > 4, 4, > 1] but there is no number 1 >= * > 4
The second input sequence is very tightly constrained compared to the first just by adding 2 more outputs. The third sequence is so constrained as to be impossible.
But the set of constraints on A5 in example #2 is a bit harder to articulate. Of course A5 > O5, that is the basic constraint on all the inputs. But any output > A4 and after O5 has to appear in the input after A4, so A5 has to be an element of the set of numbers that comes after A5 that is also > A4. Since there is only 1 such number (A6 == 4), A5 has to be it, but it gets more complicated if there is a longer string of numbers that follow. (Editor's note: actually it doesn't.)
As the output set gets longer, I worry these constraints just get more complicated and harder to get right. I cannot think of any data structures for efficiently representing these in a way that leads to efficiently calculating the number of possible combinations. I also don't quite see how to algorithmically add constraint sets together.
Here are the constraints I see so far for any given An
An > On
An <= min(Set of other possible numbers from O1 to n-1 > On). How to define the set of possible numbers greater than On?
Numbers greater than On that came after the most recent occurrence of On in the input
An >= max(Set of other possible numbers from O1 to n-1 < On). How to define the set of possible numbers less than On?
Actually this set is empty because On is, by definition, the largest possible number from the previous input sequence. (Which it not to say it is strictly the largest number from the previous input sequence.)
Any number smaller than On that came before the last occurrence of it in the input would be ineligible because of the "nearest" rule. No numbers smaller that On could have occurred after the most recent occurrence because of the "nearest" rule and because of the transitive property: if Ai < On and Aj < Ai then Aj < On
Then there is the set theory:
An must be an element of the set of unaccounted-for elements of the set of On+1 to Om, where m is the smallest m > n such that Om < On. Any output after such Om and larger than Om (which An is) would have to appear as or after Am.
An element is unaccounted-for if it is seen in the output but does not appear in the input in a position that is consistent with the rest of the output. Obviously I need a better definition than this in order to code and algorithm to calculate it.
It seems like perhaps some kind of set theory and/or combinatorics or maybe linear algebra would help with figuring out the number of possible sequences that would account for all of the unaccounted-for outputs and fit the other constraints. (Editor's note: actually, things never get that complicated.)
The code below passes all of Codility's tests. The OP added a main function to use it on the command line.
The constraints are not as complex as the OP thinks. In particular, there is never a situation where you need to add a restriction that an input be an element of some set of specific integers seen elsewhere in the output. Every input position has a well-defined minimum and maximum.
The only complication to that rule is that sometimes the maximum is "the value of the previous input" and that input itself has a range. But even then, all the values like that are consecutive and have the same range, so the number of possibilities can be calculated with basic combinatorics, and those inputs as a group are independent of the other inputs (which only serve to set the range), so the possibilities of that group can be combined with the possibilities of other input positions by simple multiplication.
Algorithm overview
The algorithm makes a single pass through the output array updating the possible numbers of input arrays after every span, which is what I am calling repetitions of numbers in the output. (You might say maximal subsequences of the output where every element is identical.) For example, for output 0,1,1,2 we have three spans: 0, 1,1 and 2. When a new span begins, the number of possibilities for the previous span is calculated.
This decision was based on a few observations:
For spans longer than 1 in length, the minimum value of the input
allowed in the first position is whatever the value is of the input
in the second position. Calculating the number of possibilities of a
span is straightforward combinatorics, but the standard formula
requires knowing the range of the numbers and the length of the span.
Every time the value of the
output changes (and a new span beings), that strongly constrains the value of the previous span:
When the output goes up, the only possible reason is that the previous input was the value of the new, higher output and the input corresponding to the position of the new, higher output, was even higher.
When an output goes down, new constraints are established, but those are a bit harder to articulate. The algorithm stores stairs (see below) in order to quantify the constraints imposed when the output goes down
The aim here was to confine the range of possible values for every span. Once we do that accurately, calculating the number of combinations is straightforward.
Because the encoder backtracks looking to output a number that relates to the input in 2 ways, both smaller and closer, we know we can throw out numbers that are larger and farther away. After a small number appears in the output, no larger number from before that position can have any influence on what follows.
So to confine these ranges of input when the output sequence decreased, we need to store stairs - a list of increasingly larger possible values for the position in the original array. E.g for 0,2,5,7,2,4 stairs build up like this: 0, 0,2, 0,2,5, 0,2,5,7, 0,2, 0,2,4.
Using these bounds we can tell for sure that the number in the position of the second 2 (next to last position in the example) must be in (2,5], because 5 is the next stair. If the input were greater than 5, a 5 would have been output in that space instead of a 2. Observe, that if the last number in the encoded array was not 4, but 6, we would exit early returning 0, because we know that the previous number couldn't be bigger than 5.
The complexity is O(n*lg(min(n,m))).
Functions
CombinationsWithReplacement - counts number of combinations with replacements of size k from n numbers. E.g. for (3, 2) it counts 3,3, 3,2, 3,1, 2,2, 2,1, 1,1, so returns 6 It is the same as choose(n - 1 + k, n - 1).
nextBigger - finds next bigger element in a range. E.g. for 4 in sub-array 1,2,3,4,5 it returns 5, and in sub-array 1,3 it returns its parameter Max.
countSpan (lambda) - counts how many different combinations a span we have just passed can have. Consider span 2,2 for 0,2,5,7,2,2,7.
When curr gets to the final position, curr is 7 and prev is the final 2 of the 2,2 span.
It computes maximum and minimum possible values of the prev span. At this point stairs consist of 2,5,7 then maximum possible value is 5 (nextBigger after 2 in the stair 2,5,7). A value of greater than 5 in this span would have output a 5, not a 2.
It computes a minimum value for the span (which is the minimum value for every element in the span), which is prev at this point, (remember curr at this moment equals to 7 and prev to 2). We know for sure that in place of the final 2 output, the original input has to have 7, so the minimum is 7. (This is a consequence of the "output goes up" rule. If we had 7,7,2 and curr would be 2 then the minimum for the previous span (the 7,7) would be 8 which is prev + 1.
It adjusts the number of combinations. For a span of length L with a range of n possibilities (1+max-min), there are possibilities, with k being either L or L-1 depending on what follows the span.
For a span followed by a larger number, like 2,2,7, k = L - 1 because the last position of the 2,2 span has to be 7 (the value of the first number after the span).
For a span followed by a smaller number, like 7,7,2, k = L because
the last element of 7,7 has no special constraints.
Finally, it calls CombinationsWithReplacement to find out the number of branches (or possibilities), computes new res partial results value (remainder values in the modulo arithmetic we are doing), and returns new res value and max for further handling.
solution - iterates over the given Encoder Output array. In the main loop, while in a span it counts the span length, and at span boundaries it updates res by calling countSpan and possibly updates the stairs.
If the current span consists of a bigger number than the previous one, then:
Check validity of the next number. E.g 0,2,5,2,7 is invalid input, becuase there is can't be 7 in the next-to-last position, only 3, or 4, or 5.
It updates the stairs. When we have seen only 0,2, the stairs are 0,2, but after the next 5, the stairs become 0,2,5.
If the current span consists of a smaller number then the previous one, then:
It updates stairs. When we have seen only 0,2,5, our stairs are 0,2,5, but after we have seen 0,2,5,2 the stairs become 0,2.
After the main loop it accounts for the last span by calling countSpan with -1 which triggers the "output goes down" branch of calculations.
normalizeMod, extendedEuclidInternal, extendedEuclid, invMod - these auxiliary functions help to deal with modulo arithmetic.
For stairs I use storage for the encoded array, as the number of stairs never exceeds current position.
#include <algorithm>
#include <cassert>
#include <vector>
#include <tuple>
const int Modulus = 1'000'000'007;
int CombinationsWithReplacement(int n, int k);
template <class It>
auto nextBigger(It begin, It end, int value, int Max) {
auto maxIt = std::upper_bound(begin, end, value);
auto max = Max;
if (maxIt != end) {
max = *maxIt;
}
return max;
}
auto solution(std::vector<int> &B, const int Max) {
auto res = 1;
const auto size = (int)B.size();
auto spanLength = 1;
auto prev = 0;
// Stairs is the list of numbers which could be smaller than number in the next position
const auto stairsBegin = B.begin();
// This includes first entry (zero) into stairs
// We need to include 0 because we can meet another zero later in encoded array
// and we need to be able to find in stairs
auto stairsEnd = stairsBegin + 1;
auto countSpan = [&](int curr) {
const auto max = nextBigger(stairsBegin, stairsEnd, prev, Max);
// At the moment when we switch from the current span to the next span
// prev is the number from previous span and curr from current.
// E.g. 1,1,7, when we move to the third position cur = 7 and prev = 1.
// Observe that, in this case minimum value possible in place of any of 1's can be at least 2=1+1=prev+1.
// But if we consider 7, then we have even more stringent condition for numbers in place of 1, it is 7
const auto min = std::max(prev + 1, curr);
const bool countLast = prev > curr;
const auto branchesCount = CombinationsWithReplacement(max - min + 1, spanLength - (countLast ? 0 : 1));
return std::make_pair(res * (long long)branchesCount % Modulus, max);
};
for (int i = 1; i < size; ++i) {
const auto curr = B[i];
if (curr == prev) {
++spanLength;
}
else {
int max;
std::tie(res, max) = countSpan(curr);
if (prev < curr) {
if (curr > max) {
// 0,1,5,1,7 - invalid because number in the fourth position lies in [2,5]
// and so in the fifth encoded position we can't something bigger than 5
return 0;
}
// It is time to possibly shrink stairs.
// E.g if we had stairs 0,2,4,9,17 and current value is 5,
// then we no more interested in 9 and 17, and we change stairs to 0,2,4,5.
// That's because any number bigger than 9 or 17 also bigger than 5.
const auto s = std::lower_bound(stairsBegin, stairsEnd, curr);
stairsEnd = s;
*stairsEnd++ = curr;
}
else {
assert(curr < prev);
auto it = std::lower_bound(stairsBegin, stairsEnd, curr);
if (it == stairsEnd || *it != curr) {
// 0,5,1 is invalid sequence because original sequence lloks like this 5,>5,>1
// and there is no 1 in any of the two first positions, so
// it can't appear in the third position of the encoded array
return 0;
}
}
spanLength = 1;
}
prev = curr;
}
res = countSpan(-1).first;
return res;
}
template <class T> T normalizeMod(T a, T m) {
if (a < 0) return a + m;
return a;
}
template <class T> std::pair<T, std::pair<T, T>> extendedEuclidInternal(T a, T b) {
T old_x = 1;
T old_y = 0;
T x = 0;
T y = 1;
while (true) {
T q = a / b;
T t = a - b * q;
if (t == 0) {
break;
}
a = b;
b = t;
t = x; x = old_x - x * q; old_x = t;
t = y; y = old_y - y * q; old_y = t;
}
return std::make_pair(b, std::make_pair(x, y));
}
// Returns gcd and Bezout's coefficients
template <class T> std::pair<T, std::pair<T, T>> extendedEuclid(T a, T b) {
if (a > b) {
if (b == 0) return std::make_pair(a, std::make_pair(1, 0));
return extendedEuclidInternal(a, b);
}
else {
if (a == 0) return std::make_pair(b, std::make_pair(0, 1));
auto p = extendedEuclidInternal(b, a);
std::swap(p.second.first, p.second.second);
return p;
}
}
template <class T> T invMod(T a, T m) {
auto p = extendedEuclid(a, m);
assert(p.first == 1);
return normalizeMod(p.second.first, m);
}
int CombinationsWithReplacement(int n, int k) {
int res = 1;
for (long long i = n; i < n + k; ++i) {
res = res * i % Modulus;
}
int denom = 1;
for (long long i = k; i > 0; --i) {
denom = denom * i % Modulus;
}
res = res * (long long)invMod(denom, Modulus) % Modulus;
return res;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: 0 1 2,3, 4 M
// Last arg is M, the max value for an input.
// Remaining args are B (the output of the encoder) separated by commas and/or spaces
// Parentheses and brackets are ignored, so you can use the same input form as Codility's tests: ([1,2,3], M)
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,[]()";
if (argc < 2 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
for (int i = 1; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
Max = B.back();
B.pop_back();
printf("%d\n", solution(B, Max));
return 0;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: M 0 1 2,3, 4
// first arg is M, the max value for an input.
// remaining args are B (the output of the encoder) separated by commas and/or spaces
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,";
if (argc < 3 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
Max = atoi(argv[1]);
for (int i = 2; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
printf("%d\n", solution(B, Max));
return 0;
}
Let's see an example:
Max = 5
Array is
0 1 3 0 1 1 3
1
1 2..5
1 3 4..5
1 3 4..5 1
1 3 4..5 1 2..5
1 3 4..5 1 2..5 >=..2 (sorry, for a cumbersome way of writing)
1 3 4..5 1 3..5 >=..3 4..5
Now count:
1 1 2 1 3 2 which amounts to 12 total.
Here's an idea. One known method to construct the output is to use a stack. We pop it while the element is greater or equal, then output the smaller element if it exists, then push the greater element onto the stack. Now what if we attempted to do this backwards from the output?
First we'll demonstrate the stack method using the c∅dility example.
[2, 5, 3, 7, 9, 6]
2: output 0, stack [2]
5: output 2, stack [2,5]
3: pop 5, output, 2, stack [2,3]
7: output 3, stack [2,3,7]
... etc.
Final output: [0, 2, 2, 3, 7, 3]
Now let's try reconstruction! We'll use stack both as the imaginary stack and as the reconstituted input:
(Input: [2, 5, 3, 7, 9, 6])
Output: [0, 2, 2, 3, 7, 3]
* Something >3 that reached 3 in the stack
stack = [3, 3 < *]
* Something >7 that reached 7 in the stack
but both of those would've popped before 3
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >3, 7 qualifies
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >2, 3 qualifies
stack = [2, 3, 7, 7 < x, 3 < * <= x]
* Something >2 and >=3 since 3 reached 2
stack = [2, 2 < *, 3, 7, 7 < x, 3 < * <= x]
Let's attempt your examples:
Example 1:
[0, 0, 0, 2, 3, 4]
* Something >4
stack = [4, 4 < *]
* Something >3, 4 qualifies
stack = [3, 4, 4 < *]
* Something >2, 3 qualifies
stack = [2, 3, 4, 4 < *]
* The rest is non-increasing with lowerbound 2
stack = [y >= x, x >= 2, 2, 3, 4, >4]
Example 2:
[0, 0, 0, 4]
* Something >4
stack [4, 4 < *]
* Non-increasing
stack = [z >= y, y >= 4, 4, 4 < *]
Calculating the number of combinations is achieved by multiplying together the possibilities for all the sections. A section is either a bounded single cell; or a bound, non-increasing subarray of one or more cells. To calculate the latter we use the multi-choose binomial, (n + k - 1) choose (k - 1). Consider that we can express the differences between the cells of a bound, non-increasing sequence of 3 cells as:
(ub - cell_3) + (cell_3 - cell_2) + (cell_2 - cell_1) + (cell_1 - lb) = ub - lb
Then the number of ways to distribute ub - lb into (x + 1) cells is
(n + k - 1) choose (k - 1)
or
(ub - lb + x) choose x
For example, the number of non-increasing sequences between
(3,4) in two cells is (4 - 3 + 2) choose 2 = 3: [3,3] [4,3] [4,4]
And the number of non-increasing sequences between
(3,4) in three cells is (4 - 3 + 3) choose 3 = 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]
(Explanation attributed to Brian M. Scott.)
Rough JavaScript sketch (the code is unreliable; it's only meant to illustrate the encoding. The encoder lists [lower_bound, upper_bound], or a non-increasing sequence as [non_inc, length, lower_bound, upper_bound]):
function f(A, M){
console.log(JSON.stringify(A), M);
let i = A.length - 1;
let last = A[i];
let s = [[last,last]];
if (A[i-1] == last){
let d = 1;
s.splice(1,0,['non_inc',d++,last,M]);
while (i > 0 && A[i-1] == last){
s.splice(1,0,['non_inc',d++,last,M]);
i--
}
} else {
s.push([last+1,M]);
i--;
}
if (i == 0)
s.splice(0,1);
for (; i>0; i--){
let x = A[i];
if (x < s[0][0])
s = [[x,x]].concat(s);
if (x > s[0][0]){
let [l, _l] = s[0];
let [lb, ub] = s[1];
s[0] = [x+1, M];
s[1] = [lb, x];
s = [[l,_l], [x,x]].concat(s);
}
if (x == s[0][0]){
let [l,_l] = s[0];
let [lb, ub] = s[1];
let d = 1;
s.splice(0,1);
while (i > 0 && A[i-1] == x){
s =
[['non_inc', d++, lb, M]].concat(s);
i--;
}
if (i > 0)
s = [[l,_l]].concat(s);
}
}
// dirty fix
if (s[0][0] == 0)
s.splice(0,1);
return s;
}
var a = [2, 5, 3, 7, 9, 6]
var b = [0, 2, 2, 3, 7, 3]
console.log(JSON.stringify(a));
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,2,3,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,2]
console.log(JSON.stringify(f(b,4)));
b = [0,3,5,6]
console.log(JSON.stringify(f(b,10)));
b = [0,0,3,0]
console.log(JSON.stringify(f(b,10)));

Maximum number achievable by converting two adjacent x to one (x+1)

Given a sequence of N integers where 1 <= N <= 500 and the numbers are between 1 and 50. In a step any two adjacent equal numbers x x can be replaced with a single x + 1. What is the maximum number achievable by such steps.
For example if given 2 3 1 1 2 2 then the maximum possible is 4:
2 3 1 1 2 2 ---> 2 3 2 2 2 ---> 2 3 3 2 ---> 2 4 2.
It is evident that I should try to do better than the maximum number available in the sequence. But I can't figure out a good algorithm.
Each substring of the input can make at most one single number (invariant: the log base two of the sum of two to the power of each entry). For every x, we can find the set of substrings that can make x. For each x, this is (1) every occurrence of x (2) the union of two contiguous substrings that can make x - 1. The resulting algorithm is O(N^2)-time.
An algorithm could work like this:
Convert the input to an array where every element has a frequency attribute, collapsing repeated consecutive values in the input into one single node. For example, this input:
1 2 2 4 3 3 3 3
Would be represented like this:
{val: 1, freq: 1} {val: 2, freq: 2} {val: 4, freq: 1} {val: 3, freq: 4}
Then find local minima nodes, like the node (3 3 3 3) in 1 (2 2) 4 (3 3 3 3) 4, i.e. nodes whose neighbours both have higher values. For those local minima that have an even frequency, "lift" those by applying the step. Repeat this until no such local minima (with even frequency) exist any more.
Start of the recursive part of the algorithm:
At both ends of the array, work inwards to "lift" values as long as the more inner neighbour has a higher value. With this rule, the following:
1 2 2 3 5 4 3 3 3 1 1
will completely resolve. First from the left side inward:
1 4 5 4 3 3 3 1 1
Then from the right side:
1 4 6 3 2
Note that when there is an odd frequency (like for the 3s above), there will be a "remainder" that cannot be incremented. The remainder should in this rule always be left on the outward side, so to maximise the potential towards the inner part of the array.
At this point the remaining local minima have odd frequencies. Applying the step to such a node will always leave a "remainder" (like above) with the original value. This remaining node can appear anywhere, but it only makes sense to look at solutions where this remainder is on the left side or the right side of the lift (not in the middle). So for example:
4 1 1 1 1 1 2 3 4
Can resolve to one of these:
4 2 2 1 2 3 4
Or:
4 1 2 2 2 3 4
The 1 in either second or fourth position, is the above mentioned "remainder". Obviously, the second way of resolving is more promising in this example. In general, the choice is obvious when on one side there is a value that is too high to merge with, like the left-most 4 is too high for five 1 values to get to. The 4 is like a wall.
When the frequency of the local minimum is one, there is nothing we can do with it. It actually separates the array in a left and right side that do not influence each other. The same is true for the remainder element discussed above: it separates the array into two parts that do not influence each other.
So the next step in the algorithm is to find such minima (where the choice is obvious), apply that kind of step and separate the problem into two distinct problems which should be solved recursively (from the top). So in the last example, the following two problems would be solved separately:
4
2 2 3 4
Then the best of both solutions will count as the overall solution. In this case that is 5.
The most challenging part of the algorithm is to deal with those local minima for which the choice of where to put the remainder is not obvious. For instance;
3 3 1 1 1 1 1 2 3
This can go to either:
3 3 2 2 1 2 3
3 3 1 2 2 2 3
In this example the end result is the same for both options, but in bigger arrays it would be less and less obvious. So here both options have to be investigated. In general you can have many of them, like 2 in this example:
3 1 1 1 2 3 1 1 1 1 1 3
Each of these two minima has two options. This seems like to explode into too many possibilities for larger arrays. But it is not that bad. The algorithm can take opposite choices in neighbouring minima, and go alternating like this through the whole array. This way alternating sections are favoured, and get the most possible value drawn into them, while the other sections are deprived of value. Now the algorithm turns the tables, and toggles all choices so that the sections that were previously favoured are now deprived, and vice versa. The solution of both these alternatives is derived by resolving each section recursively, and then comparing the two "grand" solutions to pick the best one.
Snippet
Here is a live JavaScript implementation of the above algorithm.
Comments are provided which hopefully should make it readable.
"use strict";
function Node(val, freq) {
// Immutable plain object
return Object.freeze({
val: val,
freq: freq || 1, // Default frequency is 1.
// Max attainable value when merged:
reduced: val + (freq || 1).toString(2).length - 1
});
}
function compress(a) {
// Put repeated elements in a single node
var result = [], i, j;
for (i = 0; i < a.length; i = j) {
for (j = i + 1; j < a.length && a[j] == a[i]; j++);
result.push(Node(a[i], j - i));
}
return result;
}
function decompress(a) {
// Expand nodes into separate, repeated elements
var result = [], i, j;
for (i = 0; i < a.length; i++) {
for (j = 0; j < a[i].freq; j++) {
result.push(a[i].val);
}
}
return result;
}
function str(a) {
return decompress(a).join(' ');
}
function unstr(s) {
s = s.replace(/\D+/g, ' ').trim();
return s.length ? compress(s.split(/\s+/).map(Number)) : [];
}
/*
The function merge modifies an array in-place, performing a "step" on
the indicated element.
The array will get an element with an incremented value
and decreased frequency, unless a join occurs with neighboring
elements with the same value: then the frequencies are accumulated
into one element. When the original frequency was odd there will
be a "remainder" element in the modified array as well.
*/
function merge(a, i, leftWards, stats) {
var val = a[i].val+1,
odd = a[i].freq % 2,
newFreq = a[i].freq >> 1,
last = i;
// Merge with neighbouring nodes of same value:
if ((!odd || !leftWards) && a[i+1] && a[i+1].val === val) {
newFreq += a[++last].freq;
}
if ((!odd || leftWards) && i && a[i-1].val === val) {
newFreq += a[--i].freq;
}
// Replace nodes
a.splice(i, last-i+1, Node(val, newFreq));
if (odd) a.splice(i+leftWards, 0, Node(val-1));
// Update statistics and trace: this is not essential to the algorithm
if (stats) {
stats.total_applied_merges++;
if (stats.trace) stats.trace.push(str(a));
}
return i;
}
/* Function Solve
Parameters:
a: The compressed array to be reduced via merges. It is changed in-place
and should not be relied on after the call.
stats: Optional plain object that will be populated with execution statistics.
Return value:
The array after the best merges were applied to achieve the highest
value, which is stored in the maxValue custom property of the array.
*/
function solve(a, stats) {
var maxValue, i, j, traceOrig, skipLeft, skipRight, sections, goLeft,
b, choice, alternate;
if (!a.length) return a;
if (stats && stats.trace) {
traceOrig = stats.trace;
traceOrig.push(stats.trace = [str(a)]);
}
// Look for valleys of even size, and "lift" them
for (i = 1; i < a.length - 1; i++) {
if (a[i-1].val > a[i].val && a[i].val < a[i+1].val && (a[i].freq % 2) < 1) {
// Found an even valley
i = merge(a, i, false, stats);
if (i) i--;
}
}
// Check left-side elements with always increasing values
for (i = 0; i < a.length-1 && a[i].val < a[i+1].val; i++) {
if (a[i].freq > 1) i = merge(a, i, false, stats) - 1;
};
// Check right-side elements with always increasing values, right-to-left
for (j = a.length-1; j > 0 && a[j-1].val > a[j].val; j--) {
if (a[j].freq > 1) j = merge(a, j, true, stats) + 1;
};
// All resolved?
if (i == j) {
while (a[i].freq > 1) merge(a, i, true, stats);
a.maxValue = a[i].val;
} else {
skipLeft = i;
skipRight = a.length - 1 - j;
// Look for other valleys (odd sized): they will lead to a split into sections
sections = [];
for (i = a.length - 2 - skipRight; i > skipLeft; i--) {
if (a[i-1].val > a[i].val && a[i].val < a[i+1].val) {
// Odd number of elements: if more than one, there
// are two ways to merge them, but maybe
// one of both possibilities can be excluded.
goLeft = a[i+1].val > a[i].reduced;
if (a[i-1].val > a[i].reduced || goLeft) {
if (a[i].freq > 1) i = merge(a, i, goLeft, stats) + goLeft;
// i is the index of the element which has become a 1-sized valley
// Split off the right part of the array, and store the solution
sections.push(solve(a.splice(i--), stats));
}
}
}
if (sections.length) {
// Solve last remaining section
sections.push(solve(a, stats));
sections.reverse();
// Combine the solutions of all sections into one
maxValue = sections[0].maxValue;
for (i = sections.length - 1; i >= 0; i--) {
maxValue = Math.max(sections[i].maxValue, maxValue);
}
} else {
// There is no more valley that can be resolved without branching into two
// directions. Look for the remaining valleys.
sections = [];
b = a.slice(0); // take copy
for (choice = 0; choice < 2; choice++) {
if (choice) a = b; // restore from copy on second iteration
alternate = choice;
for (i = a.length - 2 - skipRight; i > skipLeft; i--) {
if (a[i-1].val > a[i].val && a[i].val < a[i+1].val) {
// Odd number of elements
alternate = !alternate
i = merge(a, i, alternate, stats) + alternate;
sections.push(solve(a.splice(i--), stats));
}
}
// Solve last remaining section
sections.push(solve(a, stats));
}
sections.reverse(); // put in logical order
// Find best section:
maxValue = sections[0].maxValue;
for (i = sections.length - 1; i >= 0; i--) {
maxValue = Math.max(sections[i].maxValue, maxValue);
}
for (i = sections.length - 1; i >= 0 && sections[i].maxValue < maxValue; i--);
// Which choice led to the highest value (choice = 0 or 1)?
choice = (i >= sections.length / 2)
// Discard the not-chosen version
sections = sections.slice(choice * sections.length/2);
}
// Reconstruct the solution from the sections.
a = [].concat.apply([], sections);
a.maxValue = maxValue;
}
if (traceOrig) stats.trace = traceOrig;
return a;
}
function randomValues(len) {
var a = [];
for (var i = 0; i < len; i++) {
// 50% chance for a 1, 25% for a 2, ... etc.
a.push(Math.min(/\.1*/.exec(Math.random().toString(2))[0].length,5));
}
return a;
}
// I/O
var inputEl = document.querySelector('#inp');
var randEl = document.querySelector('#rand');
var lenEl = document.querySelector('#len');
var goEl = document.querySelector('#go');
var outEl = document.querySelector('#out');
goEl.onclick = function() {
// Get the input and structure it
var a = unstr(inputEl.value),
stats = {
total_applied_merges: 0,
trace: a.length < 100 ? [] : undefined
};
// Apply algorithm
a = solve(a, stats);
// Output results
var output = {
value: a.maxValue,
compact: str(a),
total_applied_merges: stats.total_applied_merges,
trace: stats.trace || 'no trace produced (input too large)'
};
outEl.textContent = JSON.stringify(output, null, 4);
}
randEl.onclick = function() {
// Get input (count of numbers to generate):
len = lenEl.value;
// Generate
var a = randomValues(len);
// Output
inputEl.value = a.join(' ');
// Simulate click to find the solution immediately.
goEl.click();
}
// Tests
var tests = [
' ', '',
'1', '1',
'1 1', '2',
'2 2 1 2 2', '3 1 3',
'3 2 1 1 2 2 3', '5',
'3 2 1 1 2 2 3 1 1 1 1 3 2 2 1 1 2', '6',
'3 1 1 1 3', '3 2 1 3',
'2 1 1 1 2 1 1 1 2 1 1 1 1 1 2', '3 1 2 1 4 1 2',
'3 1 1 2 1 1 1 2 3', '4 2 1 2 3',
'1 4 2 1 1 1 1 1 1 1', '1 5 1',
];
var res;
for (var i = 0; i < tests.length; i+=2) {
var res = str(solve(unstr(tests[i])));
if (res !== tests[i+1]) throw 'Test failed: ' + tests[i] + ' returned ' + res + ' instead of ' + tests[i+1];
}
Enter series (space separated):<br>
<input id="inp" size="60" value="2 3 1 1 2 2"><button id="go">Solve</button>
<br>
<input id="len" size="4" value="30"><button id="rand">Produce random series of this size and solve</button>
<pre id="out"></pre>
As you can see the program produces a reduced array with the maximum value included. In general there can be many derived arrays that have this maximum; only one is given.
An O(n*m) time and space algorithm is possible, where, according to your stated limits, n <= 500 and m <= 58 (consider that even for a billion elements, m need only be about 60, representing the largest element ± log2(n)). m is representing the possible numbers 50 + floor(log2(500)):
Consider the condensed sequence, s = {[x, number of x's]}.
If M[i][j] = [num_j,start_idx] where num_j represents the maximum number of contiguous js ending at index i of the condensed sequence; start_idx, the index where the sequence starts or -1 if it cannot join earlier sequences; then we have the following relationship:
M[i][j] = [s[i][1] + M[i-1][j][0], M[i-1][j][1]]
when j equals s[i][0]
j's greater than s[i][0] but smaller than or equal to s[i][0] + floor(log2(s[i][1])), represent converting pairs and merging with an earlier sequence if applicable, with a special case after the new count is odd:
When M[i][j][0] is odd, we do two things: first calculate the best so far by looking back in the matrix to a sequence that could merge with M[i][j] or its paired descendants, and then set a lower bound in the next applicable cells in the row (meaning a merge with an earlier sequence cannot happen via this cell). The reason this works is that:
if s[i + 1][0] > s[i][0], then s[i + 1] could only possibly pair with the new split section of s[i]; and
if s[i + 1][0] < s[i][0], then s[i + 1] might generate a lower j that would combine with the odd j from M[i], potentially making a longer sequence.
At the end, return the largest entry in the matrix, max(j + floor(log2(num_j))), for all j.
JavaScript code (counterexamples would be welcome; the limit on the answer is set at 7 for convenient visualization of the matrix):
function f(str){
var arr = str.split(/\s+/).map(Number);
var s = [,[arr[0],0]];
for (var i=0; i<arr.length; i++){
if (s[s.length - 1][0] == arr[i]){
s[s.length - 1][1]++;
} else {
s.push([arr[i],1]);
}
}
var M = [new Array(8).fill([0,0])],
best = 0;
for (var i=1; i<s.length; i++){
M[i] = new Array(8).fill([0,i]);
var temp = s[i][1],
temp_odd,
temp_start,
odd = false;
for (var j=s[i][0]; temp>0; j++){
var start_idx = odd ? temp_start : M[i][j-1][1];
if (start_idx != -1 && M[start_idx - 1][j][0]){
temp += M[start_idx - 1][j][0];
start_idx = M[start_idx - 1][j][1];
}
if (!odd){
M[i][j] = [temp,start_idx];
temp_odd = temp;
} else {
M[i][j] = [temp_odd,-1];
temp_start = start_idx;
}
if (!odd && temp & 1 && temp > 1){
odd = true;
temp_start = start_idx;
}
best = Math.max(best,j + Math.floor(Math.log2(temp)));
temp >>= 1;
temp_odd >>= 1;
}
}
return [arr, s, best, M];
}
// I/O
var button = document.querySelector('button');
var input = document.querySelector('input');
var pre = document.querySelector('pre');
button.onclick = function() {
var val = input.value;
var result = f(val);
var text = '';
for (var i=0; i<3; i++){
text += JSON.stringify(result[i]) + '\n\n';
}
for (var i in result[3]){
text += JSON.stringify(result[3][i]) + '\n';
}
pre.textContent = text;
}
<input value ="2 2 3 3 2 2 3 3 5">
<button>Solve</button>
<pre></pre>
Here's a brute force solution:
function findMax(array A, int currentMax)
for each pair (i, i+1) of indices for which A[i]==A[i+1] do
currentMax = max(A[i]+1, currentMax)
replace A[i],A[i+1] by a single number A[i]+1
currentMax = max(currentMax, findMax(A, currentMax))
end for
return currentMax
Given the array A, let currentMax=max(A[0], ..., A[n])
print findMax(A, currentMax)
The algorithm terminates because in each recursive call the array shrinks by 1.
It's also clear that it is correct: we try out all possible replacement sequences.
The code is extremely slow when the array is large and there's lots of options regarding replacements, but actually works reasonbly fast on arrays with small number of replaceable pairs. (I'll try to quantify the running time in terms of the number of replaceable pairs.)
A naive working code in Python:
def findMax(L, currMax):
for i in range(len(L)-1):
if L[i] == L[i+1]:
L[i] += 1
del L[i+1]
currMax = max(currMax, L[i])
currMax = max(currMax, findMax(L, currMax))
L[i] -= 1
L.insert(i+1, L[i])
return currMax
# entry point
if __name__ == '__main__':
L1 = [2, 3, 1, 1, 2, 2]
L2 = [2, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
print findMax(L1, max(L1))
print findMax(L2, max(L2))
The result of the first call is 4, as expected.
The result of the second call is 5 as expected; the sequence that gives the result: 2,3,1,1,2,2,2,2,2,2,2,2, -> 2,3,1,1,3,2,2,2,2,2,2 -> 2,3,1,1,3,3,2,2,2,2, -> 2,3,1,1,3,3,3,2,2 -> 2,3,1,1,3,3,3,3 -> 2,3,1,1,4,3, -> 2,3,1,1,4,4 -> 2,3,1,1,5

n steps with 1, 2 or 3 steps taken. How many ways to get to the top?

If we have n steps and we can go up 1 or 2 steps at a time, there is a Fibonacci relation between the number of steps and the ways to climb them. IF and ONLY if we do not count 2+1 and 1+2 as different.
However, this no longer the case, as well as having to add we add a third option, taking 3 steps. How do I do this?
What I have:
1 step = 1 way
2 steps = 2 ways: 1+1, 2
3 steps = 4 ways: 1+1+1, 2+1, 1+2, 3
I have no idea where to go from here to find out the number of ways for n stairs
I get 7 for n = 4 and 14 for n= 5 i get 14+7+4+2+1 by doing the sum of all the combinations before it. so ways for n steps = n-1 ways + n-2 ways + .... 1 ways assuming i kept all the values. DYNAMIC programming.
1 2 and 3 steps would be the base-case is that correct?
I would say that the formula will look in the following way:
K(1) = 1
K(2) = 2
k(3) = 4
K(n) = K(n-3) + K(n-2) + K(n - 1)
The formula says that in order to reach the n'th step we have to firstly reach:
n-3'th step and then take 3 steps at once i.e. K(n-3)
or n-2'th step and then take 2 steps at once i.e. K(n-2)
or n-1'th step and then take 1 steps at once i.e. K(n-1)
K(4) = 7, K(5) = 13 etc.
You can either utilize the recursive formula or use dynamic programming.
Python solutions:
Recursive O(n)
This is based on the answer by Michael. This requires O(n) CPU and O(n) memory.
import functools
#functools.lru_cache(maxsize=None)
def recursive(n):
if n < 4:
initial = [1, 2, 4]
return initial[n-1]
else:
return recursive(n-1) + recursive(n-2) + recursive(n-3)
Recursive O(log(n))
This is per a comment for this answer. This tribonacci-by-doubling solution is analogous to the fibonacci-by-doubling solution in the algorithms by Nayuki. Note that multiplication has a higher complexity than constant. This doesn't require or benefit from a cache.
def recursive_doubling(n):
def recursive_tribonacci_tuple(n):
"""Return the n, n+1, and n+2 tribonacci numbers for n>=0.
Tribonacci forward doubling identities:
T(2n) = T(n+1)^2 + T(n)*(2*T(n+2) - 2*T(n+1) - T(n))
T(2n+1) = T(n)^2 + T(n+1)*(2*T(n+2) - T(n+1))
T(2n+2) = T(n+2)^2 + T(n+1)*(2*T(n) + T(n+1))
"""
assert n >= 0
if n == 0:
return 0, 0, 1 # T(0), T(1), T(2)
a, b, c = recursive_tribonacci_tuple(n // 2)
x = b*b + a*(2*(c - b) - a)
y = a*a + b*(2*c - b)
z = c*c + b*(2*a + b)
return (x, y, z) if n % 2 == 0 else (y, z, x+y+z)
return recursive_tribonacci_tuple(n)[2] # Is offset by 2 for the steps problem.
Iterative O(n)
This is motivated by the answer by 太極者無極而生. It is a modified tribonacci extension of the iterative fibonacci solution. It is modified from tribonacci in that it returns c, not a.
def iterative(n):
a, b, c = 0, 0, 1
for _ in range(n):
a, b, c = b, c, a+b+c
return c
Iterative O(log(n)) (left to right)
This is per a comment for this answer. This modified iterative tribonacci-by-doubling solution is derived from the corresponding recursive solution. For some background, see here and here. It is modified from tribonacci in that it returns c, not a. Note that multiplication has a higher complexity than constant.
The bits of n are iterated from left to right, i.e. MSB to LSB.
def iterative_doubling_l2r(n):
"""Return the n+2 tribonacci number for n>=0.
Tribonacci forward doubling identities:
T(2n) = T(n+1)^2 + T(n)*(2*T(n+2) - 2*T(n+1) - T(n))
T(2n+1) = T(n)^2 + T(n+1)*(2*T(n+2) - T(n+1))
T(2n+2) = T(n+2)^2 + T(n+1)*(2*T(n) + T(n+1))
"""
assert n >= 0
a, b, c = 0, 0, 1 # T(0), T(1), T(2)
for i in range(n.bit_length() - 1, -1, -1): # Left (MSB) to right (LSB).
x = b*b + a*(2*(c - b) - a)
y = a*a + b*(2*c - b)
z = c*c + b*(2*a + b)
bit = (n >> i) & 1
a, b, c = (y, z, x+y+z) if bit else (x, y, z)
return c
Notes:
list(range(m - 1, -1, -1)) == list(reversed(range(m)))
If the bit is odd (1), the sequence is advanced by one iteration. This intuitively makes sense after understanding the same for the efficient integer exponentiation problem.
Iterative O(log(n)) (right to left)
This is per a comment for this answer. The bits of n are iterated from right to left, i.e. LSB to MSB. This approach is probably not prescriptive.
def iterative_doubling_r2l(n):
"""Return the n+2 tribonacci number for n>=0.
Tribonacci forward doubling identities:
T(2n) = T(n+1)^2 + T(n)*(2*T(n+2) - 2*T(n+1) - T(n))
T(2n+1) = T(n)^2 + T(n+1)*(2*T(n+2) - T(n+1))
T(2n+2) = T(n+2)^2 + T(n+1)*(2*T(n) + T(n+1))
Given Tribonacci tuples (T(n), T(n+1), T(n+2)) and (T(k), T(k+1), T(k+2)),
we can "add" them together to get (T(n+k), T(n+k+1), T(n+k+2)).
Tribonacci addition formulas:
T(n+k) = T(n)*(T(k+2) - T(k+1) - T(k)) + T(n+1)*(T(k+1) - T(k)) + T(n+2)*T(k)
T(n+k+1) = T(n)*T(k) + T(n+1)*(T(k+2) - T(k+1)) + T(n+2)*T(k+1)
T(n+k+2) = T(n)*T(k+1) + T(n+1)*(T(k) + T(k+1)) + T(n+2)*T(k+2)
When n == k, these are equivalent to the doubling formulas.
"""
assert n >= 0
a, b, c = 0, 0, 1 # T(0), T(1), T(2)
d, e, f = 0, 1, 1 # T(1), T(2), T(3)
for i in range(n.bit_length()): # Right (LSB) to left (MSB).
bit = (n >> i) & 1
if bit:
# a, b, c += d, e, f
x = a*(f - e - d) + b*(e - d) + c*d
y = a*d + b*(f - e) + c*e
z = a*e + b*(d + e) + c*f
a, b, c = x, y, z
# d, e, f += d, e, f
x = e*e + d*(2*(f - e) - d)
y = d*d + e*(2*f - e)
z = f*f + e*(2*d + e)
d, e, f = x, y, z
return c
Approximations
Approximations are of course useful mainly for very large n. The exponentiation operation is used. Note that exponentiation has a higher complexity than constant.
def approx1(n):
a_pos = (19 + 3*(33**.5))**(1./3)
a_neg = (19 - 3*(33**.5))**(1./3)
b = (586 + 102*(33**.5))**(1./3)
return round(3*b * ((1./3) * (a_pos+a_neg+1))**(n+1) / (b**2 - 2*b + 4))
The approximation above was tested to be correct till n = 53, after which it differed. It's certainly possible that using higher precision floating point arithmetic will lead to a better approximation in practice.
def approx2(n):
return round((0.618363 * 1.8392**n + \
(0.029252 + 0.014515j) * (-0.41964 - 0.60629j)**n + \
(0.029252 - 0.014515j) * (-0.41964 - 0.60629j)**n).real)
The approximation above was tested to be correct till n = 11, after which it differed.
This is my solution in Ruby:
# recursion requirement: it returns the number of way up
# a staircase of n steps, given that the number of steps
# can be 1, 2, 3
def how_many_ways(n)
# this is a bit Zen like, if 0 steps, then there is 1 way
# and we don't even need to specify f(1), because f(1) = summing them up
# and so f(1) = f(0) = 1
# Similarly, f(2) is summing them up = f(1) + f(0) = 1 + 1 = 2
# and so we have all base cases covered
return 1 if n == 0
how_many_ways_total = 0
(1..3).each do |n_steps|
if n >= n_steps
how_many_ways_total += how_many_ways(n - n_steps)
end
end
return how_many_ways_total
end
0.upto(20) {|n| puts "how_many_ways(#{n}) => #{how_many_ways(n)}"}
and a shorter version:
def how_many_ways(n)
# this is a bit Zen like, if 0 steps, then there is 1 way
# if n is negative, there is no way and therefore returns 0
return 1 if n == 0
return 0 if n < 0
return how_many_ways(n - 1) + how_many_ways(n - 2) + how_many_ways(n - 3)
end
0.upto(20) {|n| puts "how_many_ways(#{n}) => #{how_many_ways(n)}"}
and once we know it is similar to fibonacci series, we wouldn't use recursion, but use an iterative method:
#
# from 0 to 27: recursive: 4.72 second
# iterative: 0.03 second
#
def how_many_ways(n)
arr = [0, 0, 1]
n.times do
new_sum = arr.inject(:+) # sum them up
arr.push(new_sum).shift()
end
return arr[-1]
end
0.upto(27) {|n| puts "how_many_ways(#{n}) => #{how_many_ways(n)}"}
output:
how_many_ways(0) => 1
how_many_ways(1) => 1
how_many_ways(2) => 2
how_many_ways(3) => 4
how_many_ways(4) => 7
how_many_ways(5) => 13
how_many_ways(6) => 24
how_many_ways(7) => 44
how_many_ways(8) => 81
how_many_ways(9) => 149
how_many_ways(10) => 274
how_many_ways(11) => 504
how_many_ways(12) => 927
how_many_ways(13) => 1705
.
.
how_many_ways(22) => 410744
how_many_ways(23) => 755476
how_many_ways(24) => 1389537
how_many_ways(25) => 2555757
how_many_ways(26) => 4700770
how_many_ways(27) => 8646064
I like the explanation of #MichałKomorowski and the comment of #rici. Though I think if it depends on knowing K(3) = 4, then it involves counting manually.
Easily get the intuition for the problem:
Think you are climbing stairs and the possible steps you can take are 1 & 2
The total no. of ways to reach step 4 = Total no. of ways to reach step 3 + Total no of ways to reach step 2
How?
Basically, there are only two possible steps from where you can reach step 4.
Either you are in step 3 and take one step
Or you are in step 2 and take two step leap
These two are the only possibilities by which you can ever reach step 4
Similarly, there are only two possible ways to reach step 2
Either you are in step 1 and take one step
Or you are in step 0 and take two step leap
F(n) = F(n-1) + F(n-2)
F(0) = 0 and F(1) = 1 are the base cases. From here you can start building F(2), F(3) and so on. This is similar to Fibonacci series.
If the number of possible steps is increased, say [1,2,3], now for every step you have one more option i.e., you can directly leap from three steps prior to it
Hence the formula would become
F(n) = F(n-1) + F(n-2) + F(n-3)
See this video for understanding Staircase Problem Fibonacci Series
Easy understanding of code: geeksforgeeks staircase problem
Count ways to reach the nth stair using step 1, 2, 3.
We can count using simple Recursive Methods.
// Header File
#include<stdio.h>
// Function prototype for recursive Approch
int findStep(int);
int main(){
int n;
int ways=0;
ways = findStep(4);
printf("%d\n", ways);
return 0;
}
// Function Definition
int findStep(int n){
int t1, t2, t3;
if(n==1 || n==0){
return 1;
}else if(n==2){
return 2;
}
else{
t3 = findStep(n-3);
t2 = findStep(n-2);
t1 = findStep(n-1);
return t1+t2+t3;
}
}
def count(steps):
sol = []
sol.append(1)
sol.append(1 + sol[0])
sol.append(1 + sol[1] + sol[0])
if(steps > 3):
for x in range(4, steps+1):
sol[(x-1)%3] = sum(sol)
return sol[(steps-1)%3]
My solution is in java.
I decided to solve this bottom up.
I start off with having an empty array of current paths []
Each step i will add a all possible step sizes {1,2,3}
First step [] --> [[1],[2],[3]]
Second step [[1],[2],[3]] --> [[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1][3,2],[3,3]]
Iteration 0: []
Iteration 1: [ [1], [2] , [3]]
Iteration 2: [ [1,1], [1,2], [1,3], [2,1], [2,2], [2,3], [3,1], [3,2], [3,3]]
Iteration 3 [ [1,1,1], [1,1,2], [1,1,3] ....]
The sequence lengths are as follows
1
2
3
5
8
13
21
My step function is called build
public class App {
public static boolean isClimedTooHigh(List<Integer> path, int maxSteps){
int sum = 0;
for (Integer i : path){
sum+=i;
}
return sum>=maxSteps;
}
public static void modify(Integer x){
x++;
return;
}
/// 1 2 3
/// 11 12 13
/// 21 22 23
/// 31 32 33
///111 121
public static boolean build(List<List<Integer>> paths, List<Integer> steps, int maxSteps){
List<List<Integer>> next = new ArrayList<List<Integer>>();
for (List<Integer> path : paths){
if (isClimedTooHigh(path, maxSteps)){
next.add(path);
}
for (Integer step : steps){
List<Integer> p = new ArrayList<Integer>(path);
p.add(step);
next.add(p);
}
}
paths.clear();
boolean completed = true;
for (List<Integer> n : next){
if (completed && !isClimedTooHigh(n, maxSteps))
completed = false;
paths.add(n);
}
return completed;
}
public static boolean isPathEqualToMax(List<Integer> path, int maxSteps){
int sum = 0;
for (Integer i : path){
sum+=i;
}
return sum==maxSteps;
}
public static void calculate( int stepSize, int maxSteps ){
List<List<Integer>> paths = new ArrayList<List<Integer>>();
List<Integer> steps = new ArrayList<Integer>();
for (int i =1; i < stepSize; i++){
List<Integer> s = new ArrayList<Integer>(1);
s.add(i);
steps.add(i);
paths.add(s);
}
while (!build(paths,steps,maxSteps));
List<List<Integer>> finalPaths = new ArrayList<List<Integer>>();
for (List<Integer> p : paths){
if (isPathEqualToMax(p, maxSteps)){
finalPaths.add(p);
}
}
System.out.println(finalPaths.size());
}
public static void main(String[] args){
calculate(3,1);
calculate(3,2);
calculate(3,3);
calculate(3,4);
calculate(3,5);
calculate(3,6);
calculate(3,7);
return;
}
}
Count total number of ways to cover the distance with 1, 2 and 3 steps.
Recursion solution time complexity is exponential i.e. O(3n).
Since same sub problems are solved again, this problem has overlapping sub problems property. So min square sum problem has both properties of a dynamic programming problem.
public class MaxStepsCount {
/** Dynamic Programming. */
private static int getMaxWaysDP(int distance) {
int[] count = new int[distance+1];
count[0] = 1;
count[1] = 1;
count[2] = 2;
/** Memorize the Sub-problem in bottom up manner*/
for (int i=3; i<=distance; i++) {
count[i] = count[i-1] + count[i-2] + count[i-3];
}
return count[distance];
}
/** Recursion Approach. */
private static int getMaxWaysRecur(int distance) {
if(distance<0) {
return 0;
} else if(distance==0) {
return 1;
}
return getMaxWaysRecur(distance-1)+getMaxWaysRecur(distance-2)
+getMaxWaysRecur(distance-3);
}
public static void main(String[] args) {
// Steps pf 1, 2 and 3.
int distance = 10;
/** Recursion Approach. */
int ways = getMaxWaysRecur(distance);
System.out.println(ways);
/** Dynamic Programming. */
ways = getMaxWaysDP(distance);
System.out.println(ways);
}
}
My blog post on this:
http://javaexplorer03.blogspot.in/2016/10/count-number-of-ways-to-cover-distance.html
Recursive memoization based C++ solution:
You ask a stair how many ways we can go to top? If its not the topmost stair, its going to ask all its neighbors and sum it up and return you the result. If its the topmost stair its going to say 1.
vector<int> getAllStairsFromHere(vector<int>& numSteps, int& numStairs, int currentStair)
{
vector<int> res;
for(auto it : numSteps)
if(it + currentStair <= numStairs)
res.push_back(it + currentStair);
return res;
}
int numWaysToClimbUtil(vector<int>& numSteps, int& numStairs, int currentStair, map<int,int>& memT)
{
auto it = memT.find(currentStair);
if(it != memT.end())
return it->second;
if(currentStair >= numStairs)
return 1;
int numWaysToClimb = 0;
auto choices = getAllStairsFromHere(numSteps, numStairs, currentStair);
for(auto it : choices)
numWaysToClimb += numWaysToClimbUtil(numSteps, numStairs, it, memT);
memT.insert(make_pair(currentStair, numWaysToClimb));
return memT[currentStair];
}
int numWaysToClimb(vector<int>numSteps, int numStairs)
{
map<int,int> memT;
int currentStair = 0;
return numWaysToClimbUtil(numSteps, numStairs, currentStair, memT);
}
Here is an O(Nk) Java implementation using dynamic programming:
public class Sample {
public static void main(String[] args) {
System.out.println(combos(new int[]{4,3,2,1}, 100));
}
public static int combos(int[] steps, int stairs) {
int[][] table = new int[stairs+1][steps.length];
for (int i = 0; i < steps.length; i++) {
for (int n = 1; n <= stairs; n++ ) {
int count = 0;
if (n % steps[i] == 0){
if (i == 0)
count++;
else {
if (n <= steps[i])
count++;
}
}
if (i > 0 && n > steps[i]) {
count += table[n - steps[i]][i];
}
if (i > 0)
count += table[n][i-1];
table[n][i] = count;
}
}
for (int n = 1; n < stairs; n++) {
System.out.print(n + "\t");
for (int i = 0; i < steps.length; i++) {
System.out.print(table[n][i] + "\t");
}
System.out.println();
}
return table[stairs][steps.length-1];
}
}
The idea is to fill the following table 1 column at a time from left to right:
N (4) (4,3) (4,3,2) (4,3,2,1)
1 0 0 0 1
2 0 0 1 2
3 0 1 1 3
4 1 1 2 5
5 0 0 1 6
6 0 1 3 9
7 0 1 2 11
8 1 1 4 15
9 0 1 3 18
10 0 1 5 23
11 0 1 4 27
12 1 2 7 34
13 0 1 5 39
..
..
99 0 9 217 7803
100 8037
Below is the several ways to use 1 , 2 and 3 steps
1: 1
2: 11 2
3: 111 12 21 3
4: 1111 121 211 112 22 13 31
5: 11111 1112 1121 1211 2111 122 212 221 113 131 311 23 32
6: 111111 11112 11121 11211 12111 21111 1113 1131 1311 3111 123 132 312 321 213 231 33 222 1122 1221 2211 1212 2121 2112
So according to above combination the soln should be:
K(n) = K(n-3) + K(n-2) + K(n - 1)
k(6) = 24 which is k(5)+k(4)+k(3) = 13+7+4
Java recursive implementation based on Michał's answer:
public class Tribonacci {
// k(0) = 1
// k(1) = 1
// k(2) = 2
// k(3) = 4
// ...
// k(n) = k(n-3) + k(n-2) + k(n - 1)
static int get(int n) {
if (n == 0) {
return 1;
} if (n == 1) {
return 1;
} else if (n == 2) {
return 2;
//} else if (n == 3) {
// return 4;
} else {
return get(n - 3) + get(n - 2) + get(n - 1);
}
}
public static void main(String[] args) {
System.out.println("Tribonacci sequence");
System.out.println(Tribonacci.get(1));
System.out.println(Tribonacci.get(2));
System.out.println(Tribonacci.get(3));
System.out.println(Tribonacci.get(4));
System.out.println(Tribonacci.get(5));
System.out.println(Tribonacci.get(6));
}
}
As the question has got only one input which is stair numbers and simple constraints, I thought result could be equal to a simple mathematical equation which can be calculated with O(1) time complexity. Apparently, it is not as simple as i thought. But, i still could do something!
By underlining this, I found an equation for solution of same question with 1 and 2 steps taken(excluding 3). It took my 1 day to find this out. Harder work can find for 3 step version too.
So, if we were allowed to take 1 or 2 steps, results would be equal to:
First notation is not mathematically perfect, but i think it is easier to understand.
On the other hand, there must be a much simpler equation as there is one for Fibonacci series. But discovering it is out of my skills.
Maybe its just 2^(n-1) with n being the number of steps?
It makes sence for me because with 4 steps you have 8 possibilities:
4,
3+1,
1+3,
2+2,
2+1+1,
1+2+1,
1+1+2,
1+1+1+1,
I guess this is the pattern

How to calculate the index (lexicographical order) when the combination is given

I know that there is an algorithm that permits, given a combination of number (no repetitions, no order), calculates the index of the lexicographic order.
It would be very useful for my application to speedup things...
For example:
combination(10, 5)
1 - 1 2 3 4 5
2 - 1 2 3 4 6
3 - 1 2 3 4 7
....
251 - 5 7 8 9 10
252 - 6 7 8 9 10
I need that the algorithm returns the index of the given combination.
es: index( 2, 5, 7, 8, 10 ) --> index
EDIT: actually I'm using a java application that generates all combinations C(53, 5) and inserts them into a TreeMap.
My idea is to create an array that contains all combinations (and related data) that I can index with this algorithm.
Everything is to speedup combination searching.
However I tried some (not all) of your solutions and the algorithms that you proposed are slower that a get() from TreeMap.
If it helps: my needs are for a combination of 5 from 53 starting from 0 to 52.
Thank you again to all :-)
Here is a snippet that will do the work.
#include <iostream>
int main()
{
const int n = 10;
const int k = 5;
int combination[k] = {2, 5, 7, 8, 10};
int index = 0;
int j = 0;
for (int i = 0; i != k; ++i)
{
for (++j; j != combination[i]; ++j)
{
index += c(n - j, k - i - 1);
}
}
std::cout << index + 1 << std::endl;
return 0;
}
It assumes you have a function
int c(int n, int k);
that will return the number of combinations of choosing k elements out of n elements.
The loop calculates the number of combinations preceding the given combination.
By adding one at the end we get the actual index.
For the given combination there are
c(9, 4) = 126 combinations containing 1 and hence preceding it in lexicographic order.
Of the combinations containing 2 as the smallest number there are
c(7, 3) = 35 combinations having 3 as the second smallest number
c(6, 3) = 20 combinations having 4 as the second smallest number
All of these are preceding the given combination.
Of the combinations containing 2 and 5 as the two smallest numbers there are
c(4, 2) = 6 combinations having 6 as the third smallest number.
All of these are preceding the given combination.
Etc.
If you put a print statement in the inner loop you will get the numbers
126, 35, 20, 6, 1.
Hope that explains the code.
Convert your number selections to a factorial base number. This number will be the index you want. Technically this calculates the lexicographical index of all permutations, but if you only give it combinations, the indexes will still be well ordered, just with some large gaps for all the permutations that come in between each combination.
Edit: pseudocode removed, it was incorrect, but the method above should work. Too tired to come up with correct pseudocode at the moment.
Edit 2: Here's an example. Say we were choosing a combination of 5 elements from a set of 10 elements, like in your example above. If the combination was 2 3 4 6 8, you would get the related factorial base number like so:
Take the unselected elements and count how many you have to pass by to get to the one you are selecting.
1 2 3 4 5 6 7 8 9 10
2 -> 1
1 3 4 5 6 7 8 9 10
3 -> 1
1 4 5 6 7 8 9 10
4 -> 1
1 5 6 7 8 9 10
6 -> 2
1 5 7 8 9 10
8 -> 3
So the index in factorial base is 1112300000
In decimal base, it's
1*9! + 1*8! + 1*7! + 2*6! + 3*5! = 410040
This is Algorithm 2.7 kSubsetLexRank on page 44 of Combinatorial Algorithms by Kreher and Stinson.
r = 0
t[0] = 0
for i from 1 to k
if t[i - 1] + 1 <= t[i] - 1
for j from t[i - 1] to t[i] - 1
r = r + choose(n - j, k - i)
return r
The array t holds your values, for example [5 7 8 9 10]. The function choose(n, k) calculates the number "n choose k". The result value r will be the index, 251 for the example. Other inputs are n and k, for the example they would be 10 and 5.
zero-base,
# v: array of length k consisting of numbers between 0 and n-1 (ascending)
def index_of_combination(n,k,v):
idx = 0
for p in range(k-1):
if p == 0: arrg = range(1,v[p]+1)
else: arrg = range(v[p-1]+2, v[p]+1)
for a in arrg:
idx += combi[n-a, k-1-p]
idx += v[k-1] - v[k-2] - 1
return idx
Null Set has the right approach. The index corresponds to the factorial-base number of the sequence. You build a factorial-base number just like any other base number, except that the base decreases for each digit.
Now, the value of each digit in the factorial-base number is the number of elements less than it that have not yet been used. So, for combination(10, 5):
(1 2 3 4 5) == 0*9!/5! + 0*8!/5! + 0*7!/5! + 0*6!/5! + 0*5!/5!
== 0*3024 + 0*336 + 0*42 + 0*6 + 0*1
== 0
(10 9 8 7 6) == 9*3024 + 8*336 + 7*42 + 6*6 + 5*1
== 30239
It should be pretty easy to calculate the index incrementally.
If you have a set of positive integers 0<=x_1 < x_2< ... < x_k , then you could use something called the squashed order:
I = sum(j=1..k) Choose(x_j,j)
The beauty of the squashed order is that it works independent of the largest value in the parent set.
The squashed order is not the order you are looking for, but it is related.
To use the squashed order to get the lexicographic order in the set of k-subsets of {1,...,n) is by taking
1 <= x1 < ... < x_k <=n
compute
0 <= n-x_k < n-x_(k-1) ... < n-x_1
Then compute the squashed order index of (n-x_k,...,n-k_1)
Then subtract the squashed order index from Choose(n,k) to get your result, which is the lexicographic index.
If you have relatively small values of n and k, you can cache all the values Choose(a,b) with a
See Anderson, Combinatorics on Finite Sets, pp 112-119
I needed also the same for a project of mine and the fastest solution I found was (Python):
import math
def nCr(n,r):
f = math.factorial
return f(n) / f(r) / f(n-r)
def index(comb,n,k):
r=nCr(n,k)
for i in range(k):
if n-comb[i]<k-i:continue
r=r-nCr(n-comb[i],k-i)
return r
My input "comb" contained elements in increasing order You can test the code with for example:
import itertools
k=3
t=[1,2,3,4,5]
for x in itertools.combinations(t, k):
print x,index(x,len(t),k)
It is not hard to prove that if comb=(a1,a2,a3...,ak) (in increasing order) then:
index=[nCk-(n-a1+1)Ck] + [(n-a1)C(k-1)-(n-a2+1)C(k-1)] + ... =
nCk -(n-a1)Ck -(n-a2)C(k-1) - .... -(n-ak)C1
There's another way to do all this. You could generate all possible combinations and write them into a binary file where each comb is represented by it's index starting from zero. Then, when you need to find an index, and the combination is given, you apply a binary search on the file. Here's the function. It's written in VB.NET 2010 for my lotto program, it works with Israel lottery system so there's a bonus (7th) number; just ignore it.
Public Function Comb2Index( _
ByVal gAr() As Byte) As UInt32
Dim mxPntr As UInt32 = WHL.AMT.WHL_SYS_00 '(16.273.488)
Dim mdPntr As UInt32 = mxPntr \ 2
Dim eqCntr As Byte
Dim rdAr() As Byte
modBinary.OpenFile(WHL.WHL_SYS_00, _
FileMode.Open, FileAccess.Read)
Do
modBinary.ReadBlock(mdPntr, rdAr)
RP: If eqCntr = 7 Then GoTo EX
If gAr(eqCntr) = rdAr(eqCntr) Then
eqCntr += 1
GoTo RP
ElseIf gAr(eqCntr) < rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mxPntr = mdPntr
mdPntr \= 2
ElseIf gAr(eqCntr) > rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mdPntr += (mxPntr - mdPntr) \ 2
End If
Loop Until eqCntr = 7
EX: modBinary.CloseFile()
Return mdPntr
End Function
P.S. It takes 5 to 10 mins to generate 16 million combs on a Core 2 Duo. To find the index using binary search on file takes 397 milliseconds on a SATA drive.
Assuming the maximum setSize is not too large, you can simply generate a lookup table, where the inputs are encoded this way:
int index(a,b,c,...)
{
int key = 0;
key |= 1<<a;
key |= 1<<b;
key |= 1<<c;
//repeat for all arguments
return Lookup[key];
}
To generate the lookup table, look at this "banker's order" algorithm. Generate all the combinations, and also store the base index for each nItems. (For the example on p6, this would be [0,1,5,11,15]). Note that by you storing the answers in the opposite order from the example (LSBs set first) you will only need one table, sized for the largest possible set.
Populate the lookup table by walking through the combinations doing Lookup[combination[i]]=i-baseIdx[nItems]
EDIT: Never mind. This is completely wrong.
Let your combination be (a1, a2, ..., ak-1, ak) where a1 < a2 < ... < ak. Let choose(a,b) = a!/(b!*(a-b)!) if a >= b and 0 otherwise. Then, the index you are looking for is
choose(ak-1, k) + choose(ak-1-1, k-1) + choose(ak-2-1, k-2) + ... + choose (a2-1, 2) + choose (a1-1, 1) + 1
The first term counts the number of k-element combinations such that the largest element is less than ak. The second term counts the number of (k-1)-element combinations such that the largest element is less than ak-1. And, so on.
Notice that the size of the universe of elements to be chosen from (10 in your example) does not play a role in the computation of the index. Can you see why?
Sample solution:
class Program
{
static void Main(string[] args)
{
// The input
var n = 5;
var t = new[] { 2, 4, 5 };
// Helping transformations
ComputeDistances(t);
CorrectDistances(t);
// The algorithm
var r = CalculateRank(t, n);
Console.WriteLine("n = 5");
Console.WriteLine("t = {2, 4, 5}");
Console.WriteLine("r = {0}", r);
Console.ReadKey();
}
static void ComputeDistances(int[] t)
{
var k = t.Length;
while (--k >= 0)
t[k] -= (k + 1);
}
static void CorrectDistances(int[] t)
{
var k = t.Length;
while (--k > 0)
t[k] -= t[k - 1];
}
static int CalculateRank(int[] t, int n)
{
int k = t.Length - 1, r = 0;
for (var i = 0; i < t.Length; i++)
{
if (t[i] == 0)
{
n--;
k--;
continue;
}
for (var j = 0; j < t[i]; j++)
{
n--;
r += CalculateBinomialCoefficient(n, k);
}
n--;
k--;
}
return r;
}
static int CalculateBinomialCoefficient(int n, int k)
{
int i, l = 1, m, x, y;
if (n - k < k)
{
x = k;
y = n - k;
}
else
{
x = n - k;
y = k;
}
for (i = x + 1; i <= n; i++)
l *= i;
m = CalculateFactorial(y);
return l/m;
}
static int CalculateFactorial(int n)
{
int i, w = 1;
for (i = 1; i <= n; i++)
w *= i;
return w;
}
}
The idea behind the scenes is to associate a k-subset with an operation of drawing k-elements from the n-size set. It is a combination, so the overall count of possible items will be (n k). It is a clue that we could seek the solution in Pascal Triangle. After a while of comparing manually written examples with the appropriate numbers from the Pascal Triangle, we will find the pattern and hence the algorithm.
I used user515430's answer and converted to python3. Also this supports non-continuous values so you could pass in [1,3,5,7,9] as your pool instead of range(1,11)
from itertools import combinations
from scipy.special import comb
from pandas import Index
debugcombinations = False
class IndexedCombination:
def __init__(self, _setsize, _poolvalues):
self.setsize = _setsize
self.poolvals = Index(_poolvalues)
self.poolsize = len(self.poolvals)
self.totalcombinations = 1
fast_k = min(self.setsize, self.poolsize - self.setsize)
for i in range(1, fast_k + 1):
self.totalcombinations = self.totalcombinations * (self.poolsize - fast_k + i) // i
#fill the nCr cache
self.choose_cache = {}
n = self.poolsize
k = self.setsize
for i in range(k + 1):
for j in range(n + 1):
if n - j >= k - i:
self.choose_cache[n - j,k - i] = comb(n - j,k - i, exact=True)
if debugcombinations:
print('testnth = ' + str(self.testnth()))
def get_nth_combination(self,index):
n = self.poolsize
r = self.setsize
c = self.totalcombinations
#if index < 0 or index >= c:
# raise IndexError
result = []
while r:
c, n, r = c*r//n, n-1, r-1
while index >= c:
index -= c
c, n = c*(n-r)//n, n-1
result.append(self.poolvals[-1 - n])
return tuple(result)
def get_n_from_combination(self,someset):
n = self.poolsize
k = self.setsize
index = 0
j = 0
for i in range(k):
setidx = self.poolvals.get_loc(someset[i])
for j in range(j + 1, setidx + 1):
index += self.choose_cache[n - j, k - i - 1]
j += 1
return index
#just used to test whether nth_combination from the internet actually works
def testnth(self):
n = 0
_setsize = self.setsize
mainset = self.poolvals
for someset in combinations(mainset, _setsize):
nthset = self.get_nth_combination(n)
n2 = self.get_n_from_combination(nthset)
if debugcombinations:
print(str(n) + ': ' + str(someset) + ' vs ' + str(n2) + ': ' + str(nthset))
if n != n2:
return False
for x in range(_setsize):
if someset[x] != nthset[x]:
return False
n += 1
return True
setcombination = IndexedCombination(5, list(range(1,10+1)))
print( str(setcombination.get_n_from_combination([2,5,7,8,10])))
returns 188

algorithm to sum up a list of numbers for all combinations

I have a list of numbers and I want to add up all the different combinations.
For example:
number as 1,4,7 and 13
the output would be:
1+4=5
1+7=8
1+13=14
4+7=11
4+13=17
7+13=20
1+4+7=12
1+4+13=18
1+7+13=21
4+7+13=24
1+4+7+13=25
Is there a formula to calculate this with different numbers?
A simple way to do this is to create a bit set with as much bits as there are numbers.
In your example 4.
Then count from 0001 to 1111 and sum each number that has a 1 on the set:
Numbers 1,4,7,13:
0001 = 13=13
0010 = 7=7
0011 = 7+13 = 20
1111 = 1+4+7+13 = 25
Here's how a simple recursive solution would look like, in Java:
public static void main(String[] args)
{
f(new int[] {1,4,7,13}, 0, 0, "{");
}
static void f(int[] numbers, int index, int sum, String output)
{
if (index == numbers.length)
{
System.out.println(output + " } = " + sum);
return;
}
// include numbers[index]
f(numbers, index + 1, sum + numbers[index], output + " " + numbers[index]);
// exclude numbers[index]
f(numbers, index + 1, sum, output);
}
Output:
{ 1 4 7 13 } = 25
{ 1 4 7 } = 12
{ 1 4 13 } = 18
{ 1 4 } = 5
{ 1 7 13 } = 21
{ 1 7 } = 8
{ 1 13 } = 14
{ 1 } = 1
{ 4 7 13 } = 24
{ 4 7 } = 11
{ 4 13 } = 17
{ 4 } = 4
{ 7 13 } = 20
{ 7 } = 7
{ 13 } = 13
{ } = 0
The best-known algorithm requires exponential time. If there were a polynomial-time algorithm, then you would solve the subset sum problem, and thus the P=NP problem.
The algorithm here is to create bitvector of length that is equal to the cardinality of your set of numbers. Fix an enumeration (n_i) of your set of numbers. Then, enumerate over all possible values of the bitvector. For each enumeration (e_i) of the bitvector, compute the sum of e_i * n_i.
The intuition here is that you are representing the subsets of your set of numbers by a bitvector and generating all possible subsets of the set of numbers. When bit e_i is equal to one, n_i is in the subset, otherwise it is not.
The fourth volume of Knuth's TAOCP provides algorithms for generating all possible values of the bitvector.
C#:
I was trying to find something more elegant - but this should do the trick for now...
//Set up our array of integers
int[] items = { 1, 3, 5, 7 };
//Figure out how many bitmasks we need...
//4 bits have a maximum value of 15, so we need 15 masks.
//Calculated as:
// (2 ^ ItemCount) - 1
int len = items.Length;
int calcs = (int)Math.Pow(2, len) - 1;
//Create our array of bitmasks... each item in the array
//represents a unique combination from our items array
string[] masks = Enumerable.Range(1, calcs).Select(i => Convert.ToString(i, 2).PadLeft(len, '0')).ToArray();
//Spit out the corresponding calculation for each bitmask
foreach (string m in masks)
{
//Get the items from our array that correspond to
//the on bits in our mask
int[] incl = items.Where((c, i) => m[i] == '1').ToArray();
//Write out our mask, calculation and resulting sum
Console.WriteLine(
"[{0}] {1}={2}",
m,
String.Join("+", incl.Select(c => c.ToString()).ToArray()),
incl.Sum()
);
}
Outputs as:
[0001] 7=7
[0010] 5=5
[0011] 5+7=12
[0100] 3=3
[0101] 3+7=10
[0110] 3+5=8
[0111] 3+5+7=15
[1000] 1=1
[1001] 1+7=8
[1010] 1+5=6
[1011] 1+5+7=13
[1100] 1+3=4
[1101] 1+3+7=11
[1110] 1+3+5=9
[1111] 1+3+5+7=16
Here is a simple recursive Ruby implementation:
a = [1, 4, 7, 13]
def add(current, ary, idx, sum)
(idx...ary.length).each do |i|
add(current + [ary[i]], ary, i+1, sum + ary[i])
end
puts "#{current.join('+')} = #{sum}" if current.size > 1
end
add([], a, 0, 0)
Which prints
1+4+7+13 = 25
1+4+7 = 12
1+4+13 = 18
1+4 = 5
1+7+13 = 21
1+7 = 8
1+13 = 14
4+7+13 = 24
4+7 = 11
4+13 = 17
7+13 = 20
If you do not need to print the array at each step, the code can be made even simpler and much faster because no additional arrays are created:
def add(ary, idx, sum)
(idx...ary.length).each do |i|
add(ary, i+1, sum + ary[i])
end
puts sum
end
add(a, 0, 0)
I dont think you can have it much simpler than that.
Mathematica solution:
{#, Total##}& /# Subsets[{1, 4, 7, 13}] //MatrixForm
Output:
{} 0
{1} 1
{4} 4
{7} 7
{13} 13
{1,4} 5
{1,7} 8
{1,13} 14
{4,7} 11
{4,13} 17
{7,13} 20
{1,4,7} 12
{1,4,13} 18
{1,7,13} 21
{4,7,13} 24
{1,4,7,13} 25
This Perl program seems to do what you want. It goes through the different ways to choose n items from k items. It's easy to calculate how many combinations there are, but getting the sums of each combination means you have to add them eventually. I had a similar question on Perlmonks when I was asking How can I calculate the right combination of postage stamps?.
The Math::Combinatorics module can also handle many other cases. Even if you don't want to use it, the documentation has a lot of pointers to other information about the problem. Other people might be able to suggest the appropriate library for the language you'd like to you.
#!/usr/bin/perl
use List::Util qw(sum);
use Math::Combinatorics;
my #n = qw(1 4 7 13);
foreach my $count ( 2 .. #n ) {
my $c = Math::Combinatorics->new(
count => $count, # number to choose
data => [#n],
);
print "combinations of $count from: [" . join(" ",#n) . "]\n";
while( my #combo = $c->next_combination ){
print join( ' ', #combo ), " = ", sum( #combo ) , "\n";
}
}
You can enumerate all subsets using a bitvector.
In a for loop, go from 0 to 2 to the Nth power minus 1 (or start with 1 if you don't care about the empty set).
On each iteration, determine which bits are set. The Nth bit represents the Nth element of the set. For each set bit, dereference the appropriate element of the set and add to an accumulated value.
ETA: Because the nature of this problem involves exponential complexity, there's a practical limit to size of the set you can enumerate on. If it turns out you don't need all subsets, you can look up "n choose k" for ways of enumerating subsets of k elements.
PHP: Here's a non-recursive implementation. I'm not saying this is the most efficient way to do it (this is indeed exponential 2^N - see JasonTrue's response and comments), but it works for a small set of elements. I just wanted to write something quick to obtain results. I based the algorithm off Toon's answer.
$set = array(3, 5, 8, 13, 19);
$additions = array();
for($i = 0; $i < pow(2, count($set)); $i++){
$sum = 0;
$addends = array();
for($j = count($set)-1; $j >= 0; $j--) {
if(pow(2, $j) & $i) {
$sum += $set[$j];
$addends[] = $set[$j];
}
}
$additions[] = array($sum, $addends);
}
sort($additions);
foreach($additions as $addition){
printf("%d\t%s\n", $addition[0], implode('+', $addition[1]));
}
Which will output:
0
3 3
5 5
8 8
8 5+3
11 8+3
13 13
13 8+5
16 13+3
16 8+5+3
18 13+5
19 19
21 13+8
21 13+5+3
22 19+3
24 19+5
24 13+8+3
26 13+8+5
27 19+8
27 19+5+3
29 13+8+5+3
30 19+8+3
32 19+13
32 19+8+5
35 19+13+3
35 19+8+5+3
37 19+13+5
40 19+13+8
40 19+13+5+3
43 19+13+8+3
45 19+13+8+5
48 19+13+8+5+3
For example, a case for this could be a set of resistance bands for working out. Say you get 5 bands each having different resistances represented in pounds and you can combine bands to sum up the total resistance. The bands resistances are 3, 5, 8, 13 and 19 pounds. This set gives you 32 (2^5) possible configurations, minus the zero. In this example, the algorithm returns the data sorted by ascending total resistance favoring efficient band configurations first, and for each configuration the bands are sorted by descending resistance.
This is not the code to generate the sums, but it generates the permutations. In your case:
1; 1,4; 1,7; 4,7; 1,4,7; ...
If I have a moment over the weekend, and if it's interesting, I can modify this to come up with the sums.
It's just a fun chunk of LINQ code from Igor Ostrovsky's blog titled "7 tricks to simplify your programs with LINQ" (http://igoro.com/archive/7-tricks-to-simplify-your-programs-with-linq/).
T[] arr = …;
var subsets = from m in Enumerable.Range(0, 1 << arr.Length)
select
from i in Enumerable.Range(0, arr.Length)
where (m & (1 << i)) != 0
select arr[i];
You might be interested in checking out the GNU Scientific Library if you want to avoid maintenance costs. The actual process of summing longer sequences will become very expensive (more-so than generating a single permutation on a step basis), most architectures have SIMD/vector instructions that can provide rather impressive speed-up (I would provide examples of such implementations but I cannot post URLs yet).
Thanks Zach,
I am creating a Bank Reconciliation solution. I dropped your code into jsbin.com to do some quick testing and produced this in Javascript:
function f(numbers,ids, index, sum, output, outputid, find )
{
if (index == numbers.length){
var x ="";
if (find == sum) {
y= output + " } = " + sum + " " + outputid + " }<br/>" ;
}
return;
}
f(numbers,ids, index + 1, sum + numbers[index], output + " " + numbers[index], outputid + " " + ids[index], find);
f(numbers,ids, index + 1, sum, output, outputid,find);
}
var y;
f( [1.2,4,7,13,45,325,23,245,78,432,1,2,6],[1,2,3,4,5,6,7,8,9,10,11,12,13], 0, 0, '{','{', 24.2);
if (document.getElementById('hello')) {
document.getElementById('hello').innerHTML = y;
}
I need it to produce a list of ID's to exclude from the next matching number.
I will post back my final solution using vb.net
v=[1,2,3,4]#variables to sum
i=0
clis=[]#check list for solution excluding the variables itself
def iterate(lis,a,b):
global i
global clis
while len(b)!=0 and i<len(lis):
a=lis[i]
b=lis[i+1:]
if len(b)>1:
t=a+sum(b)
clis.append(t)
for j in b:
clis.append(a+j)
i+=1
iterate(lis,a,b)
iterate(v,0,v)
its written in python. the idea is to break the list in a single integer and a list for eg. [1,2,3,4] into 1,[2,3,4]. we append the total sum now by adding the integer and sum of remaining list.also we take each individual sum i.e 1,2;1,3;1,4. checklist shall now be [1+2+3+4,1+2,1+3,1+4] then we call the new list recursively i.e now int=2,list=[3,4]. checklist will now append [2+3+4,2+3,2+4] accordingly we append the checklist till list is empty.
set is the set of sums and list is the list of the original numbers.
Its Java.
public void subSums() {
Set<Long> resultSet = new HashSet<Long>();
for(long l: list) {
for(long s: set) {
resultSet.add(s);
resultSet.add(l + s);
}
resultSet.add(l);
set.addAll(resultSet);
resultSet.clear();
}
}
public static void main(String[] args) {
// this is an example number
long number = 245L;
int sum = 0;
if (number > 0) {
do {
int last = (int) (number % 10);
sum = (sum + last) % 9;
} while ((number /= 10) > 0);
System.err.println("s = " + (sum==0 ? 9:sum);
} else {
System.err.println("0");
}
}

Resources