Calculating the Number of Possible Permutations that Meet a Requirement - algorithm

This question has been bugging me for days now, and I am at a loss as to how to solve it. I have tried very hard to solve it on my own, but at this point I would very much appreciate some help and a pointer in the right direction.
Problem:
Given a set of numbers, and limits on how much each number may be greater than or less than the number that follows it, determine the number of valid orderings of the numbers under those limits.
Example:
Numbers: 20, 30, 36, 40
Max amount that a number can be greater than the following number: 16
Max amount that a number can be less than the following number: 8
Here there would be 3 valid orderings:
36 40 30 20
40 36 30 20
40 30 36 20
I have devised a way to generate all valid permutations using recursion and trees, but unfortunately it takes far too long in cases where there are many valid orderings of the list (the running time approaches n!, I believe). I feel as if there is a quicker, more mathematical way of solving this using combinatorics that I am just not seeing. Any advice would be greatly appreciated, thank you!
EDIT:
Here's the code for the permutation algorithm I came up with. The last part of the code tests it out with the sample I gave above. It is written in Python 3.6.
class block:
    def __init__(self, val, children):
        self.val = val
        self.children = children

# Gets all the possible children of the current head within the limits
def get_children(head, nums, b, visited, u, d):
    global total
    if all(visited):
        total += 1
        return
    for i in range(b):
        if not visited[i]:
            if head.val - nums[i] <= d and nums[i] - head.val <= u:
                head.children.append(block(nums[i], []))
                visited[i] = True
                get_children(head.children[-1], nums, b, visited, u, d)
                visited[i] = False

# Display all the valid permutations of the current head
def show(head, vals, b):
    vals.append(head.val)
    if head.children == [] and len(vals) == b:
        print(*vals)
        return
    for child in head.children:
        show(child, vals[:], b)

# Test it out with the sample
b, nums, u, d = 4, [20, 30, 36, 40], 8, 16
visited = [False for x in range(b)]
total = 0
heads = []
for i in range(b):
    heads.append(block(nums[i], []))
    visited[i] = True
    get_children(heads[-1], nums, b, visited, u, d)
    visited[i] = False
    show(heads[-1], [], b)
print(total)
This prints:
36 40 30 20
40 30 36 20
40 36 30 20
3

Trying your approach with 10 equal numbers resulted in a run-time of 35 seconds.
The first thing I noticed is that the function only needs the value of the current head, so the function can be simplified to take an integer instead of a block object. The following code has three simplifications:
Pass in an integer for head instead of a block
Change total to be a return value instead of a global
Avoid storing the children (as only the count of orderings is required)
The simplified code looks like:
def get_children(head, nums, b, visited, u, d):
    if all(visited):
        return 1
    t = 0
    for i in range(b):
        if not visited[i]:
            if head - nums[i] <= d and nums[i] - head <= u:
                head2 = nums[i]
                visited[i] = True
                t += get_children(head2, nums, b, visited, u, d)
                visited[i] = False
    return t

# Test it out with the sample
nums, u, d = [20, 30, 36, 40], 8, 16
b = len(nums)
visited = [False for x in range(b)]
total = 0
for i in range(b):
    head = nums[i]
    visited[i] = True
    total += get_children(head, nums, b, visited, u, d)
    visited[i] = False
print(total)
This takes 7 seconds for a list of 10 equal numbers.
The second thing I noticed is that (for a particular test case) the return value of get_children only depends on the things that are True in visited and the value of head.
Therefore we can cache the results to avoid recomputing them:
cache = {}

# Gets all the possible children of the current head within the limits
def get_children(head, nums, b, visited, u, d):
    if all(visited):
        return 1
    key = head, sum(1 << i for i, v in enumerate(visited) if v)
    result = cache.get(key, None)
    if result is not None:
        return result
    t = 0
    for i in range(b):
        if not visited[i]:
            if head - nums[i] <= d and nums[i] - head <= u:
                head2 = nums[i]
                visited[i] = True
                t += get_children(head2, nums, b, visited, u, d)
                visited[i] = False
    cache[key] = t
    return t
This version only takes 0.03 seconds for a list of 10 equal numbers (i.e. about 1000 times faster than the original).
If you are doing multiple test cases with different values of b/u/d you should reset the cache at the start of each testcase (i.e. cache={}).
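Equivalently, the cache bookkeeping can be handed to functools.lru_cache if the visited list is passed as an integer bitmask; a minimal sketch of the same idea (the names are mine, not from the code above):

from functools import lru_cache

def count_orderings(nums, u, d):
    n = len(nums)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def rec(head, mask):
        # head = value of the last number placed, mask = bitmask of used indices
        if mask == full:
            return 1
        total = 0
        for i in range(n):
            if not mask & (1 << i) and head - nums[i] <= d and nums[i] - head <= u:
                total += rec(nums[i], mask | (1 << i))
        return total

    return sum(rec(nums[i], 1 << i) for i in range(n))

print(count_orderings([20, 30, 36, 40], 8, 16))  # 3

Because the cache lives inside count_orderings, it is automatically fresh for each test case.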

As has been noted in the comments, finding all valid permutations here is equivalent to identifying all Hamiltonian paths in the directed graph that has your numbers as vertices and edges corresponding to each pair of numbers that are permitted to follow one another.
Here's a very simple Java (IDEOne) program to find such paths. Whether this makes your problem tractable depends on the size of your graph and the branching factor.
public static void main(String[] args)
{
int[] values = {20, 30, 36, 40};
Vertex[] g = new Vertex[values.length];
for(int i=0; i<g.length; i++)
g[i] = new Vertex(values[i]);
for(int i=0; i<g.length; i++)
for(int j=0; j<g.length; j++)
if(i != j && g[j].id >= g[i].id-16 && g[j].id <= g[i].id+8)
g[i].adj.add(g[j]);
Set<Vertex> toVisit = new HashSet<>(Arrays.asList(g));
LinkedList<Vertex> path = new LinkedList<>();
for(int i=0; i<g.length; i++)
{
path.addLast(g[i]);
toVisit.remove(g[i]);
findPaths(g[i], path, toVisit);
toVisit.add(g[i]);
path.removeLast();
}
}
static void findPaths(Vertex v, LinkedList<Vertex> path, Set<Vertex> toVisit)
{
if(toVisit.isEmpty())
{
System.out.println(path);
return;
}
for(Vertex av : v.adj)
{
if(toVisit.contains(av))
{
toVisit.remove(av);
path.addLast(av);
findPaths(av, path, toVisit);
path.removeLast();
toVisit.add(av);
}
}
}
static class Vertex
{
int id;
List<Vertex> adj;
Vertex(int id)
{
this.id = id;
adj = new ArrayList<>();
}
public String toString()
{
return String.valueOf(id);
}
}
Output:
[36, 40, 30, 20]
[40, 30, 36, 20]
[40, 36, 30, 20]
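If only the number of valid orderings is needed rather than the paths themselves, the Hamiltonian paths can also be counted with a bitmask DP over (visited set, last vertex) in O(2^n * n^2) time; a minimal Python sketch along those lines (my addition, not part of the Java answer above):

def count_hamiltonian_paths(values, u, d):
    n = len(values)
    # adj[i][j] is True when values[j] may directly follow values[i]
    adj = [[i != j and values[i] - values[j] <= d and values[j] - values[i] <= u
            for j in range(n)] for i in range(n)]
    # dp[mask][last] = number of paths visiting exactly the vertices in mask, ending at last
    dp = [[0] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 1
    for mask in range(1 << n):
        for last in range(n):
            ways = dp[mask][last]
            if not ways:
                continue
            for nxt in range(n):
                if not mask & (1 << nxt) and adj[last][nxt]:
                    dp[mask | (1 << nxt)][nxt] += ways
    return sum(dp[(1 << n) - 1])

print(count_hamiltonian_paths([20, 30, 36, 40], 8, 16))  # 3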

Related

Algorithms question: Largest contiguous subarray selection

Q. Given two arrays, A and B, of equal length, find the largest possible contiguous subarray of indices [i,j] such that max(A[i: j]) < min(B[i: j]).
Example: A = [10, 21, 5, 1, 3], B = [3, 1, 4, 23, 56]
Explanation: A[4] = 1, A[5] = 3, B[4] = 23, B[5] = 56, max(A[4], A[5]) < min(B[4], B[5])
The indices are [4,5] (inclusive), and the largest contiguous subarray has length 2
I can do this with an O(n²) brute-force method but cannot seem to reduce the time complexity. Any ideas?
O(n) solution:
Move index j from left to right and drag i behind so that the window from i to j is valid. So always increase j by 1, and then increase i as much as needed for the window to be valid.
To do that, keep a queue Aq of indexes of non-increasing A-values. Then A[Aq[0]] tells you the max A-value in the window. Similarly, keep a queue for non-decreasing B-values.
For each new j, first update Aq and Bq according to the new A-value and new B-value. Then, while the window is invalid, increase i, dropping Aq[0] and Bq[0] if they equal i. When both j and i are updated, update the result with the window size j - i + 1.
Python implementation:
from collections import deque

def solution(A, B):
    Aq = deque()
    Bq = deque()
    i = 0
    maxlen = 0
    for j in range(len(A)):
        while Aq and A[Aq[-1]] < A[j]:
            Aq.pop()
        Aq.append(j)
        while Bq and B[Bq[-1]] > B[j]:
            Bq.pop()
        Bq.append(j)
        while Aq and A[Aq[0]] >= B[Bq[0]]:
            if Aq[0] == i:
                Aq.popleft()
            if Bq[0] == i:
                Bq.popleft()
            i += 1
        maxlen = max(maxlen, j - i + 1)
    return maxlen
Test results from comparing against a naive brute force reference solution:
expect: 83 result: 83 same: True
expect: 147 result: 147 same: True
expect: 105 result: 105 same: True
expect: 71 result: 71 same: True
expect: 110 result: 110 same: True
expect: 56 result: 56 same: True
expect: 140 result: 140 same: True
expect: 109 result: 109 same: True
expect: 86 result: 86 same: True
expect: 166 result: 166 same: True
Testing code (Try it online!)
from timeit import timeit
from random import choices
from collections import deque
from itertools import combinations
def solution(A, B):
    Aq = deque()
    Bq = deque()
    i = 0
    maxlen = 0
    for j in range(len(A)):
        while Aq and A[Aq[-1]] < A[j]:
            Aq.pop()
        Aq.append(j)
        while Bq and B[Bq[-1]] > B[j]:
            Bq.pop()
        Bq.append(j)
        while Aq and A[Aq[0]] >= B[Bq[0]]:
            if Aq[0] == i:
                Aq.popleft()
            if Bq[0] == i:
                Bq.popleft()
            i += 1
        maxlen = max(maxlen, j - i + 1)
    return maxlen

def naive(A, B):
    return max((j - i + 1
                for i, j in combinations(range(len(A)), 2)
                if max(A[i:j+1]) < min(B[i:j+1])),
               default=0)

for _ in range(10):
    n = 500
    A = choices(range(42), k=n)
    B = choices(range(1234), k=n)
    expect = naive(A, B)
    result = solution(A, B)
    print(f'expect: {expect:3} result: {result:3} same: {result == expect}')
Based on the problem, I can see that we have 2 conditions:
max(A[i,j-1]) < min(B[i,j-1])
max(A[i,j]) >= min(B[i,j])
Let maxA be the index of the max item of A in [i,j], minB the index of the min item of B in [i,j], and anchor = min(maxA, minB).
Then we will have max(A[k,anchor]) >= min(B[k,anchor]) ∀ k in [i+1,anchor].
So I come up with a simple algorithm like below:
int extractLongestRange(int n, int[] A, int[] B) {
// n is size of both array
int size = 0;
for(int i = 0; i < n; i++){
int maxAIndex = i;
int minBIndex = i;
for(int j = i; j < n; j++){
if(A[maxAIndex] < A[j]){
maxAIndex = j;
}
if(B[minBIndex] > B[j]){
minBIndex = j;
}
if(A[maxAIndex] >= B[minBIndex]){
if(size < j - i){
size = j - i;
}
// here, need to jump back to min of maxAIndex and minBIndex.
i = Math.min(maxAIndex, minBIndex);
break;
}
// this case, if j reach the end of array
if(j == n - 1){
if(size < j - i + 1){
size = j - i + 1;
i = j;
}
}
}
}
return size;
}
With this approach, the complexity is O(n).
We can change the output to pick-up the other information if needed.
This can be solved in O(n log(n)).
Here is an outline of my technique.
My idea looks like a "rising water level" of maximum in A, keeping track of the "islands" emerging in A, and the "islands" submerging in B. And recording the maximum intersections after they appear from A emerging, or before they disappear from B sinking.
We will need 2 balanced binary trees of intervals in A and B, and a priority queue of events.
The tree of intervals need to support a logarithmic "find and return interval containing i if it exists". It also needs to support a logarithmic "add i to tree, extending/merging intervals as appropriate, and return new interval". And likewise a logarithmic "remove i from tree, shortening, splitting, or deleting its interval as appropriate".
Events can be of the form "remove B[i]" or "add A[i]". The priority queue is sorted first by the value of the element being added/removed, and then by putting B before A. (So we do not add elements of size x to A until all of the elements of size x have been removed from B.)
With that in mind, here is pseudocode for how to do it.
Put all possible events in the priority queue
Initialize an empty tree of intervals A_tree
Initialize a tree of intervals B_tree containing the entire range
Initialize max interval to (0, -1) (length 0)
For each event (type, i) in queue:
if event.type = A:
new_interval = A_tree.add(event.i)
if event.i in B_tree:
Intersect event.i's A_tree interval with event.i's B_tree interval
if intersection is longer than max_interval:
update to new and better max_interval
else:
if event.i in A_tree:
Intersect event.i's A_tree interval with event.i's B_tree interval
if intersection is longer than max_interval:
update to new and better max_interval
B_tree.remove(event.i)
Processing any event is O(log(n)). There are 2n = O(n) events, so the overall time is O(n log(n)).
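For concreteness, the events and their ordering might be set up like this (a small sketch of my own, not the answer's code):

def build_events(A, B):
    # Tuples sort by value first, then by type: 0 = "remove B[i]" comes before
    # 1 = "add A[i]", so no value is added to A until equal values have left B.
    events = [(B[i], 0, i) for i in range(len(B))] + \
             [(A[i], 1, i) for i in range(len(A))]
    events.sort()
    return events

print(build_events([10, 21, 5, 1, 3], [3, 1, 4, 23, 56]))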

Reconstructing input to encoder from output

I would like to understand how to solve the Codility ArrayRecovery challenge, but I don't even know what branch of knowledge to consult. Is it combinatorics, optimization, computer science, set theory, or something else?
Edit:
The branch of knowledge to consult is constraint programming, particularly constraint propagation. You also need some combinatorics to know that if you take k numbers at a time from the range [1..n], with the restriction that no number can be bigger than the one before it, that works out to be
(n + k - 1)! / (k! (n - 1)!) possible combinations,
which is the same as the number of combinations with replacement of n things taken k at a time, usually written as the multiset coefficient ("n multichoose k"). You can read about why it works out like that here.
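For instance, that count can be checked directly with Python's standard library (a small sketch, not part of the original post):

from math import comb, factorial

def multichoose(n, k):
    # number of k-element combinations with replacement drawn from n values,
    # i.e. (n + k - 1)! / (k! * (n - 1)!)
    return comb(n + k - 1, k)

print(multichoose(100, 3))                              # 171700
print(factorial(102) // (factorial(3) * factorial(99))) # same value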
Peter Norvig provides an excellent example of how to solve this kind of problem with his Sudoku solver.
You can read the full description of the ArrayRecovery problem via the link above. The short story is that there is an encoder that takes a sequence of integers in the range 1 up to some given limit (say 100 for our purposes) and for each element of the input sequence outputs the most recently seen integer that is smaller than the current input, or 0 if none exists.
input 1, 2, 3, 4 => output 0, 1, 2, 3
input 2, 4, 3 => output 0, 2, 2
The full task is, given the output (and the range of allowable input), figure out how many possible inputs could have generated it. But before I even get to that calculation, I'm not confident about how to even approach formulating the equation. That is what I am asking for help with. (Of course a full solution would be welcome, too, if it is explained.)
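To make the encoder concrete, here is a small sketch written straight from the definition (my code, not part of the challenge): for each element, scan backwards for the nearest previous element that is strictly smaller, and output 0 if there is none.

def encode_naive(A):
    out = []
    for i, x in enumerate(A):
        prev_smaller = 0
        for j in range(i - 1, -1, -1):   # walk back to the most recent smaller value
            if A[j] < x:
                prev_smaller = A[j]
                break
        out.append(prev_smaller)
    return out

print(encode_naive([1, 2, 3, 4]))  # [0, 1, 2, 3]
print(encode_naive([2, 4, 3]))     # [0, 2, 2]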
I just look at some possible outputs and wonder. Here are some sample encoder outputs and the inputs I can come up with, with * meaning any valid input and something like > 4 meaning any valid input greater than 4. If needed, inputs are referred to as A1, A2, A3, ... (1-based indexing)
Edit #2
Part of the problem I was having with this challenge is that I did not manually generate the exactly correct sets of possible inputs for an output. I believe the set below is correct now. Look at this answer's edit history if you want to see my earlier mistakes.
output #1: 0, 0, 0, 4
possible inputs: [>= 4, A1 >= * >= 4, 4, > 4]
output #2: 0, 0, 0, 2, 3, 4 # A5 ↴ See more in discussion below
possible inputs: [>= 2, A1 >= * >=2, 2, 3, 4, > 4]
output #3: 0, 0, 0, 4, 3, 1
possible inputs: none # [4, 3, 1, 1 >= * > 4, 4, > 1] but there is no number 1 >= * > 4
The second input sequence is very tightly constrained compared to the first just by adding 2 more outputs. The third sequence is so constrained as to be impossible.
But the set of constraints on A5 in example #2 is a bit harder to articulate. Of course A5 > O5, that is the basic constraint on all the inputs. But any output > A4 and after O5 has to appear in the input after A4, so A5 has to be an element of the set of numbers that comes after A5 that is also > A4. Since there is only 1 such number (A6 == 4), A5 has to be it, but it gets more complicated if there is a longer string of numbers that follow. (Editor's note: actually it doesn't.)
As the output set gets longer, I worry these constraints just get more complicated and harder to get right. I cannot think of any data structures for efficiently representing these in a way that leads to efficiently calculating the number of possible combinations. I also don't quite see how to algorithmically add constraint sets together.
Here are the constraints I see so far for any given An
An > On
An <= min(Set of other possible numbers from O1 to n-1 > On). How to define the set of possible numbers greater than On?
Numbers greater than On that came after the most recent occurrence of On in the input
An >= max(Set of other possible numbers from O1 to n-1 < On). How to define the set of possible numbers less than On?
Actually this set is empty because On is, by definition, the largest possible number from the previous input sequence. (Which is not to say it is strictly the largest number from the previous input sequence.)
Any number smaller than On that came before the last occurrence of it in the input would be ineligible because of the "nearest" rule. No numbers smaller that On could have occurred after the most recent occurrence because of the "nearest" rule and because of the transitive property: if Ai < On and Aj < Ai then Aj < On
Then there is the set theory:
An must be an element of the set of unaccounted-for elements of the set of On+1 to Om, where m is the smallest m > n such that Om < On. Any output after such Om and larger than Om (which An is) would have to appear as or after Am.
An element is unaccounted-for if it is seen in the output but does not appear in the input in a position that is consistent with the rest of the output. Obviously I need a better definition than this in order to code and algorithm to calculate it.
It seems like perhaps some kind of set theory and/or combinatorics or maybe linear algebra would help with figuring out the number of possible sequences that would account for all of the unaccounted-for outputs and fit the other constraints. (Editor's note: actually, things never get that complicated.)
The code below passes all of Codility's tests. The OP added a main function to use it on the command line.
The constraints are not as complex as the OP thinks. In particular, there is never a situation where you need to add a restriction that an input be an element of some set of specific integers seen elsewhere in the output. Every input position has a well-defined minimum and maximum.
The only complication to that rule is that sometimes the maximum is "the value of the previous input" and that input itself has a range. But even then, all the values like that are consecutive and have the same range, so the number of possibilities can be calculated with basic combinatorics, and those inputs as a group are independent of the other inputs (which only serve to set the range), so the possibilities of that group can be combined with the possibilities of other input positions by simple multiplication.
Algorithm overview
The algorithm makes a single pass through the output array updating the possible numbers of input arrays after every span, which is what I am calling repetitions of numbers in the output. (You might say maximal subsequences of the output where every element is identical.) For example, for output 0,1,1,2 we have three spans: 0, 1,1 and 2. When a new span begins, the number of possibilities for the previous span is calculated.
This decision was based on a few observations:
For spans longer than 1, the minimum value allowed for the input in the first position is whatever the value of the input in the second position is. Calculating the number of possibilities of a span is straightforward combinatorics, but the standard formula requires knowing the range of the numbers and the length of the span.
Every time the value of the output changes (and a new span begins), that strongly constrains the value of the previous span:
When the output goes up, the only possible reason is that the previous input was the value of the new, higher output, and the input corresponding to the position of the new, higher output was even higher.
When the output goes down, new constraints are established, but those are a bit harder to articulate. The algorithm stores stairs (see below) in order to quantify the constraints imposed when the output goes down.
The aim here was to confine the range of possible values for every span. Once we do that accurately, calculating the number of combinations is straightforward.
Because the encoder backtracks looking to output a number that relates to the input in 2 ways, both smaller and closer, we know we can throw out numbers that are larger and farther away. After a small number appears in the output, no larger number from before that position can have any influence on what follows.
So to confine these ranges of input when the output sequence decreases, we need to store stairs - a list of increasingly larger possible values for the position in the original array. E.g. for 0,2,5,7,2,4 the stairs build up like this: 0; 0,2; 0,2,5; 0,2,5,7; 0,2; 0,2,4.
Using these bounds we can tell for sure that the number in the position of the second 2 (next to last position in the example) must be in (2,5], because 5 is the next stair. If the input were greater than 5, a 5 would have been output in that space instead of a 2. Observe that if the last number in the encoded array were not 4 but 6, we would exit early, returning 0, because we know that the previous number couldn't be bigger than 5.
The complexity is O(n*lg(min(n,m))).
Functions
CombinationsWithReplacement - counts the number of combinations with replacement of size k from n numbers. E.g. for (3, 2) it counts 3,3, 3,2, 3,1, 2,2, 2,1, 1,1, so it returns 6. It is the same as choose(n - 1 + k, n - 1).
nextBigger - finds next bigger element in a range. E.g. for 4 in sub-array 1,2,3,4,5 it returns 5, and in sub-array 1,3 it returns its parameter Max.
countSpan (lambda) - counts how many different combinations a span we have just passed can have. Consider span 2,2 for 0,2,5,7,2,2,7.
When curr gets to the final position, curr is 7 and prev is the final 2 of the 2,2 span.
It computes maximum and minimum possible values of the prev span. At this point stairs consist of 2,5,7 then maximum possible value is 5 (nextBigger after 2 in the stair 2,5,7). A value of greater than 5 in this span would have output a 5, not a 2.
It computes a minimum value for the span (which is the minimum value for every element in the span). Remember that at this moment curr equals 7 and prev equals 2. We know for sure that in place of the final 2 in the output, the original input has to have 7, so the minimum is 7. (This is a consequence of the "output goes up" rule. If we had 7,7,2 and curr were 2, then the minimum for the previous span (the 7,7) would be 8, which is prev + 1.)
It adjusts the number of combinations. For a span of length L with a range of n possibilities (1+max-min), there are CombinationsWithReplacement(n, k) possibilities, with k being either L or L-1 depending on what follows the span.
For a span followed by a larger number, like 2,2,7, k = L - 1 because the last position of the 2,2 span has to be 7 (the value of the first number after the span).
For a span followed by a smaller number, like 7,7,2, k = L because
the last element of 7,7 has no special constraints.
Finally, it calls CombinationsWithReplacement to find out the number of branches (or possibilities), computes the new partial result res (a remainder in the modulo arithmetic we are doing), and returns the new res value and max for further handling.
solution - iterates over the given Encoder Output array. In the main loop, while in a span it counts the span length, and at span boundaries it updates res by calling countSpan and possibly updates the stairs.
If the current span consists of a bigger number than the previous one, then:
Check the validity of the next number. E.g. 0,2,5,2,7 is invalid input, because the next-to-last input can only be 3, 4, or 5, so the last encoded value can't be 7.
It updates the stairs. When we have seen only 0,2, the stairs are 0,2, but after the next 5, the stairs become 0,2,5.
If the current span consists of a smaller number than the previous one, then:
It updates stairs. When we have seen only 0,2,5, our stairs are 0,2,5, but after we have seen 0,2,5,2 the stairs become 0,2.
After the main loop it accounts for the last span by calling countSpan with -1 which triggers the "output goes down" branch of calculations.
normalizeMod, extendedEuclidInternal, extendedEuclid, invMod - these auxiliary functions help to deal with modulo arithmetic.
For the stairs I reuse the storage of the encoded array, as the number of stairs never exceeds the current position.
#include <algorithm>
#include <cassert>
#include <vector>
#include <tuple>
const int Modulus = 1'000'000'007;
int CombinationsWithReplacement(int n, int k);
template <class It>
auto nextBigger(It begin, It end, int value, int Max) {
auto maxIt = std::upper_bound(begin, end, value);
auto max = Max;
if (maxIt != end) {
max = *maxIt;
}
return max;
}
auto solution(std::vector<int> &B, const int Max) {
auto res = 1;
const auto size = (int)B.size();
auto spanLength = 1;
auto prev = 0;
// Stairs is the list of numbers which could be smaller than number in the next position
const auto stairsBegin = B.begin();
// This includes first entry (zero) into stairs
// We need to include 0 because we can meet another zero later in encoded array
// and we need to be able to find in stairs
auto stairsEnd = stairsBegin + 1;
auto countSpan = [&](int curr) {
const auto max = nextBigger(stairsBegin, stairsEnd, prev, Max);
// At the moment when we switch from the current span to the next span
// prev is the number from previous span and curr from current.
// E.g. 1,1,7, when we move to the third position cur = 7 and prev = 1.
// Observe that, in this case minimum value possible in place of any of 1's can be at least 2=1+1=prev+1.
// But if we consider 7, then we have even more stringent condition for numbers in place of 1, it is 7
const auto min = std::max(prev + 1, curr);
const bool countLast = prev > curr;
const auto branchesCount = CombinationsWithReplacement(max - min + 1, spanLength - (countLast ? 0 : 1));
return std::make_pair(res * (long long)branchesCount % Modulus, max);
};
for (int i = 1; i < size; ++i) {
const auto curr = B[i];
if (curr == prev) {
++spanLength;
}
else {
int max;
std::tie(res, max) = countSpan(curr);
if (prev < curr) {
if (curr > max) {
// 0,1,5,1,7 - invalid because number in the fourth position lies in [2,5]
// and so in the fifth encoded position we can't have anything bigger than 5
return 0;
}
// It is time to possibly shrink stairs.
// E.g if we had stairs 0,2,4,9,17 and current value is 5,
// then we no more interested in 9 and 17, and we change stairs to 0,2,4,5.
// That's because any number bigger than 9 or 17 also bigger than 5.
const auto s = std::lower_bound(stairsBegin, stairsEnd, curr);
stairsEnd = s;
*stairsEnd++ = curr;
}
else {
assert(curr < prev);
auto it = std::lower_bound(stairsBegin, stairsEnd, curr);
if (it == stairsEnd || *it != curr) {
// 0,5,1 is an invalid sequence because the original sequence looks like this: 5,>5,>1
// and there is no 1 in any of the two first positions, so
// it can't appear in the third position of the encoded array
return 0;
}
}
spanLength = 1;
}
prev = curr;
}
res = countSpan(-1).first;
return res;
}
template <class T> T normalizeMod(T a, T m) {
if (a < 0) return a + m;
return a;
}
template <class T> std::pair<T, std::pair<T, T>> extendedEuclidInternal(T a, T b) {
T old_x = 1;
T old_y = 0;
T x = 0;
T y = 1;
while (true) {
T q = a / b;
T t = a - b * q;
if (t == 0) {
break;
}
a = b;
b = t;
t = x; x = old_x - x * q; old_x = t;
t = y; y = old_y - y * q; old_y = t;
}
return std::make_pair(b, std::make_pair(x, y));
}
// Returns gcd and Bezout's coefficients
template <class T> std::pair<T, std::pair<T, T>> extendedEuclid(T a, T b) {
if (a > b) {
if (b == 0) return std::make_pair(a, std::make_pair(1, 0));
return extendedEuclidInternal(a, b);
}
else {
if (a == 0) return std::make_pair(b, std::make_pair(0, 1));
auto p = extendedEuclidInternal(b, a);
std::swap(p.second.first, p.second.second);
return p;
}
}
template <class T> T invMod(T a, T m) {
auto p = extendedEuclid(a, m);
assert(p.first == 1);
return normalizeMod(p.second.first, m);
}
int CombinationsWithReplacement(int n, int k) {
int res = 1;
for (long long i = n; i < n + k; ++i) {
res = res * i % Modulus;
}
int denom = 1;
for (long long i = k; i > 0; --i) {
denom = denom * i % Modulus;
}
res = res * (long long)invMod(denom, Modulus) % Modulus;
return res;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: 0 1 2,3, 4 M
// Last arg is M, the max value for an input.
// Remaining args are B (the output of the encoder) separated by commas and/or spaces
// Parentheses and brackets are ignored, so you can use the same input form as Codility's tests: ([1,2,3], M)
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,[]()";
if (argc < 2 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
for (int i = 1; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
Max = B.back();
B.pop_back();
printf("%d\n", solution(B, Max));
return 0;
}
Let's see an example:
Max = 5
Array is
0 1 3 0 1 1 3
1
1 2..5
1 3 4..5
1 3 4..5 1
1 3 4..5 1 2..5
1 3 4..5 1 2..5 >=..2 (sorry, for a cumbersome way of writing)
1 3 4..5 1 3..5 >=..3 4..5
Now count:
1 1 2 1 3 2 which amounts to 12 total.
Here's an idea. One known method to construct the output is to use a stack. We pop the stack while its top element is greater than or equal to the current element, then output the remaining top element (the nearest smaller value) if it exists, or 0 otherwise, then push the current element onto the stack. Now what if we attempted to do this backwards from the output?
First we'll demonstrate the stack method using the Codility example.
[2, 5, 3, 7, 9, 6]
2: output 0, stack [2]
5: output 2, stack [2,5]
3: pop 5, output 2, stack [2,3]
7: output 3, stack [2,3,7]
... etc.
Final output: [0, 2, 2, 3, 7, 3]
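A short Python sketch of that stack method, to make the trace above concrete (my code, not part of the original answer):

def encode_stack(A):
    out, stack = [], []
    for x in A:
        # pop everything >= x; what remains on top (if anything) is the
        # most recent smaller element
        while stack and stack[-1] >= x:
            stack.pop()
        out.append(stack[-1] if stack else 0)
        stack.append(x)
    return out

print(encode_stack([2, 5, 3, 7, 9, 6]))  # [0, 2, 2, 3, 7, 3]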
Now let's try reconstruction! We'll use stack both as the imaginary stack and as the reconstituted input:
(Input: [2, 5, 3, 7, 9, 6])
Output: [0, 2, 2, 3, 7, 3]
* Something >3 that reached 3 in the stack
stack = [3, 3 < *]
* Something >7 that reached 7 in the stack
but both of those would've popped before 3
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >3, 7 qualifies
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >2, 3 qualifies
stack = [2, 3, 7, 7 < x, 3 < * <= x]
* Something >2 and >=3 since 3 reached 2
stack = [2, 2 < *, 3, 7, 7 < x, 3 < * <= x]
Let's attempt your examples:
Example 1:
[0, 0, 0, 2, 3, 4]
* Something >4
stack = [4, 4 < *]
* Something >3, 4 qualifies
stack = [3, 4, 4 < *]
* Something >2, 3 qualifies
stack = [2, 3, 4, 4 < *]
* The rest is non-increasing with lowerbound 2
stack = [y >= x, x >= 2, 2, 3, 4, >4]
Example 2:
[0, 0, 0, 4]
* Something >4
stack [4, 4 < *]
* Non-increasing
stack = [z >= y, y >= 4, 4, 4 < *]
Calculating the number of combinations is achieved by multiplying together the possibilities for all the sections. A section is either a bounded single cell; or a bound, non-increasing subarray of one or more cells. To calculate the latter we use the multi-choose binomial, (n + k - 1) choose (k - 1). Consider that we can express the differences between the cells of a bound, non-increasing sequence of 3 cells as:
(ub - cell_3) + (cell_3 - cell_2) + (cell_2 - cell_1) + (cell_1 - lb) = ub - lb
Then the number of ways to distribute ub - lb into (x + 1) cells is
(n + k - 1) choose (k - 1)
or
(ub - lb + x) choose x
For example, the number of non-increasing sequences between
(3,4) in two cells is (4 - 3 + 2) choose 2 = 3: [3,3] [4,3] [4,4]
And the number of non-increasing sequences between
(3,4) in three cells is (4 - 3 + 3) choose 3 = 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]
(Explanation attributed to Brian M. Scott.)
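The two counts above can be checked with the same formula (a quick sketch in Python; the helper name is mine):

from math import comb

def bounded_non_increasing(lb, ub, cells):
    # number of non-increasing sequences of the given length with values in [lb, ub]:
    # (ub - lb + cells) choose cells
    return comb(ub - lb + cells, cells)

print(bounded_non_increasing(3, 4, 2))  # 3: [3,3] [4,3] [4,4]
print(bounded_non_increasing(3, 4, 3))  # 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]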
Rough JavaScript sketch (the code is unreliable; it's only meant to illustrate the encoding. The encoder lists [lower_bound, upper_bound], or a non-increasing sequence as [non_inc, length, lower_bound, upper_bound]):
function f(A, M){
console.log(JSON.stringify(A), M);
let i = A.length - 1;
let last = A[i];
let s = [[last,last]];
if (A[i-1] == last){
let d = 1;
s.splice(1,0,['non_inc',d++,last,M]);
while (i > 0 && A[i-1] == last){
s.splice(1,0,['non_inc',d++,last,M]);
i--
}
} else {
s.push([last+1,M]);
i--;
}
if (i == 0)
s.splice(0,1);
for (; i>0; i--){
let x = A[i];
if (x < s[0][0])
s = [[x,x]].concat(s);
if (x > s[0][0]){
let [l, _l] = s[0];
let [lb, ub] = s[1];
s[0] = [x+1, M];
s[1] = [lb, x];
s = [[l,_l], [x,x]].concat(s);
}
if (x == s[0][0]){
let [l,_l] = s[0];
let [lb, ub] = s[1];
let d = 1;
s.splice(0,1);
while (i > 0 && A[i-1] == x){
s =
[['non_inc', d++, lb, M]].concat(s);
i--;
}
if (i > 0)
s = [[l,_l]].concat(s);
}
}
// dirty fix
if (s[0][0] == 0)
s.splice(0,1);
return s;
}
var a = [2, 5, 3, 7, 9, 6]
var b = [0, 2, 2, 3, 7, 3]
console.log(JSON.stringify(a));
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,2,3,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,2]
console.log(JSON.stringify(f(b,4)));
b = [0,3,5,6]
console.log(JSON.stringify(f(b,10)));
b = [0,0,3,0]
console.log(JSON.stringify(f(b,10)));

Display all the possible numbers having its digits in ascending order

Write a program that can display all the possible numbers in between given two numbers, having its digits in ascending order.
For Example:-
Input: 5000 to 6000
Output: 5678 5679 5689 5789
Input: 90 to 124
Output: 123 124
A brute-force approach would count through all the numbers and check the digits of each one. But I want an approach that can skip some numbers and bring the complexity below O(n). Does any such solution exist that gives a better approach for this problem?
I offer a solution in Python. It is efficient as it considers only the relevant numbers. The basic idea is to count upwards, but handle overflow somewhat differently. While we normally set overflowing digits to 0, here we set them to the previous digit +1. Please check the inline comments for further details. You can play with it here: http://ideone.com/ePvVsQ
def ascending( na, nb ):
    assert nb >= na
    # split each number into a list of digits
    a = list( int(x) for x in str(na))
    b = list( int(x) for x in str(nb))
    d = len(b) - len(a)
    # if both numbers have different length add leading zeros
    if d > 0:
        a = [0]*d + a  # add leading zeros
    assert len(a) == len(b)
    n = len(a)
    # check if the initial value has increasing digits as required,
    # and fix if necessary
    for x in range(d+1, n):
        if a[x] <= a[x-1]:
            for y in range(x, n):
                a[y] = a[y-1] + 1
            break
    res = []  # result set
    while a <= b:
        # if we found a value, add it to the result list:
        # turn the list of digits back into an integer
        if max(a) < 10:
            res.append( int( ''.join( str(k) for k in a ) ) )
        # in order to increase the number we look for the
        # least significant digit that can be increased
        for x in range( n-1, -1, -1):  # count down from n-1 to 0
            if a[x] < 10+x-n:
                break
        # digit x is to be increased
        a[x] += 1
        # all subsequent digits must be increased accordingly
        for y in range( x+1, n ):
            a[y] = a[y-1] + 1
    return res
print( ascending( 5000, 9000 ) )
Sounds like a task from Project Euler. Here is the solution in C++. It is not short, but it is straightforward and effective. Oh, and hey, it uses backtracking.
// Higher order digits at the back
typedef std::vector<int> Digits;
// Extract decimal digits of a number
Digits ExtractDigits(int n)
{
Digits digits;
while (n > 0)
{
digits.push_back(n % 10);
n /= 10;
}
if (digits.empty())
{
digits.push_back(0);
}
return digits;
}
// Main function
void PrintNumsRec(
const Digits& minDigits, // digits of the min value
const Digits& maxDigits, // digits of the max value
Digits& digits, // digits of current value
int pos, // current digits with index greater than pos are already filled
bool minEq, // currently filled digits are the same as of min value
bool maxEq) // currently filled digits are the same as of max value
{
if (pos < 0)
{
// Print current value. Handle leading zeros by yourself, if need
for (auto pDigit = digits.rbegin(); pDigit != digits.rend(); ++pDigit)
{
if (*pDigit >= 0)
{
std::cout << *pDigit;
}
}
std::cout << std::endl;
return;
}
// Compute iteration boundaries for current position
int first = minEq ? minDigits[pos] : 0;
int last = maxEq ? maxDigits[pos] : 9;
// The last filled digit
int prev = digits[pos + 1];
// Make sure generated number has increasing digits
int firstInc = std::max(first, prev + 1);
// Iterate through possible cases for current digit
for (int d = firstInc; d <= last; ++d)
{
digits[pos] = d;
if (d == 0 && prev == -1)
{
// Mark leading zeros with -1
digits[pos] = -1;
}
PrintNumsRec(minDigits, maxDigits, digits, pos - 1, minEq && (d == first), maxEq && (d == last));
}
}
// High-level function
void PrintNums(int min, int max)
{
auto minDigits = ExtractDigits(min);
auto maxDigits = ExtractDigits(max);
// Make digits array of the same size
while (minDigits.size() < maxDigits.size())
{
minDigits.push_back(0);
}
Digits digits(minDigits.size());
int pos = digits.size() - 1;
// Placeholder for leading zero
digits.push_back(-1);
PrintNumsRec(minDigits, maxDigits, digits, pos, true, true);
}
int main()
{
PrintNums(53, 297);
}
It uses recursion to handle an arbitrary number of digits, but it is essentially the same as the nested loops approach. Here is the output for (53, 297):
056
057
058
059
067
068
069
078
079
089
123
124
125
126
127
128
129
134
135
136
137
138
139
145
146
147
148
149
156
157
158
159
167
168
169
178
179
189
234
235
236
237
238
239
245
246
247
248
249
256
257
258
259
267
268
269
278
279
289
A much more interesting problem would be to count all these numbers without explicitly generating them. One would use dynamic programming for that.
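For what it's worth, there are only 2^9 - 1 = 511 numbers with strictly increasing digits in total (each corresponds to a non-empty subset of the digits 1-9), so the count for a range can also be obtained by enumerating those subsets; a small Python sketch (my addition, not the dynamic programming hinted at above):

from itertools import combinations

def count_ascending(lo, hi):
    # every number with strictly increasing digits corresponds to a
    # non-empty subset of {1..9} read in increasing order
    count = 0
    for r in range(1, 10):
        for digits in combinations(range(1, 10), r):
            n = int(''.join(map(str, digits)))
            if lo <= n <= hi:
                count += 1
    return count

print(count_ascending(5000, 6000))  # 4  (5678 5679 5689 5789)
print(count_ascending(90, 124))     # 2  (123 124)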
There is only a very limited number of numbers which can match your definition (with 9 digits max) and these can be generated very fast. But if you really need speed, just cache the tree or the generated list and do a lookup when you need your result.
using System;
using System.Collections.Generic;
namespace so_ascending_digits
{
class Program
{
class Node
{
int digit;
int value;
List<Node> children;
public Node(int val = 0, int dig = 0)
{
digit = dig;
value = (val * 10) + digit;
children = new List<Node>();
for (int i = digit + 1; i < 10; i++)
{
children.Add(new Node(value, i));
}
}
public void Collect(ref List<int> collection, int min = 0, int max = Int16.MaxValue)
{
if ((value >= min) && (value <= max)) collection.Add(value);
foreach (Node n in children) if (value * 10 < max) n.Collect(ref collection, min, max);
}
}
static void Main(string[] args)
{
Node root = new Node();
List<int> numbers = new List<int>();
root.Collect(ref numbers, 5000, 6000);
numbers.Sort();
Console.WriteLine(String.Join("\n", numbers));
}
}
}
Why the brute force algorithm may be very inefficient.
One efficient way of encoding the input is to provide two numbers: the lower end of the range, a, and the number of values in the range, b-a-1. This can be encoded in O(lg a + lg (b - a)) bits, since the number of bits needed to represent a number in base-2 is roughly equal to the base-2 logarithm of the number. We can simplify this to O(lg b), because intuitively if b - a is small, then a = O(b), and if b - a is large, then b - a = O(b). Either way, the total input size is O(2 lg b) = O(lg b).
Now the brute force algorithm just checks each number from a to b, and outputs the numbers whose digits in base 10 are in increasing order. There are b - a + 1 possible numbers in that range. However, when you represent this in terms of the input size, you find that b - a + 1 = 2^(lg(b - a + 1)) = 2^(O(lg b)) for a large enough interval.
This means that for an input size n = O(lg b), you may need to check in the worst case O(2^n) values.
A better algorithm
Instead of checking every possible number in the interval, you can simply generate the valid numbers directly. Here's a rough overview of how. A number n can be thought of as a sequence of digits n1 ... nk, where k is again roughly log10 n.
For a lower bound a and a four-digit upper bound b, the iteration would look something like:
for w in a1 .. 9:
    for x in w+1 .. 9:
        for y in x+1 .. 9:
            for z in y+1 .. 9:
                m = 1000 * w + 100 * x + 10 * y + z
                if m < a:
                    next
                if m > b:
                    exit
                output w ++ x ++ y ++ z    (++ is just string concatenation)
where a1 can be considered 0 if a has fewer digits than b.
For larger numbers, you can imagine just adding more nested for loops. In general, if b has d digits, you need d = O(lg b) loops, each of which iterates at most 10 times. The running time is thus O(10 lg b) = O(lg b), which is far better than the O(2^(lg b)) running time you get by checking whether every number is sorted or not.
One other detail that I have glossed over actually does affect the running time. As written, the algorithm needs to consider the time it takes to generate m. Without going into the details, you could assume that this adds at worst a factor of O(lg b) to the running time, resulting in an O(lg^2 b) algorithm. However, using a little extra space at the top of each for loop to store partial products would save lots of redundant multiplication, allowing us to preserve the originally stated O(lg b) running time.
One way (pseudo-code):
for (digit3 = '5'; digit3 <= '6'; digit3++)
for (digit2 = digit3+1; digit2 <= '9'; digit2++)
for (digit1 = digit2+1; digit1 <= '9'; digit1++)
for (digit0 = digit1+1; digit0 <= '9'; digit0++)
output = digit3 + digit2 + digit1 + digit0; // concatenation

find all subsets that sum to x - using an initial code

I am trying to build upon a problem, to solve another similar problem... given below is a code for finding the total number of subsets that sum to a particular value, and I am trying to modify the code so that I can return all subsets that sum to that value (instead of finding the count).
Code for finding the total number of subsets that sum to 'sum':
/**
* method to return number of sets with a given sum.
**/
public static int count = 0;
public static void countSubsetSum2(int arr[], int k, int sum) {
    if(sum == 0) {
        count++;
        return;
    }
    if(sum != 0 && k == 0) {
        return;
    }
    if(sum < arr[k - 1]) {
        countSubsetSum2(arr, k-1, sum);
    }
    countSubsetSum2(arr, k-1, sum - arr[k-1]);
    countSubsetSum2(arr, k-1, sum);
}
Can someone propose some changes to this code, to make it return the subsets rather than the subset count?
Firstly, your code isn't correct.
The function, at every step, recurses with the sum excluding and including the current element 1, moving on to the next element, thanks to these lines:
countSubsetSum2(arr, k-1, sum - arr[k-1]);
countSubsetSum2(arr, k-1, sum);
But then there's also this:
if(sum < arr[k - 1]) {
countSubsetSum2(arr, k-1, sum);
}
which causes it to recurse twice with the sum excluding the current element under some circumstances (which it should never do).
Essentially you just need to remove that if-statement.
If all the elements are positive and sum - arr[k-1] < 0, we'd keep going, but we can never get a sum of 0 since the sum can't increase, thus we'd be doing a lot of unnecessary work. So, if the elements are all positive, we can add a check for if(arr[k - 1] <= sum) to the first call to improve the running time. If the elements aren't all positive, the code won't find all sums.
Now on to printing the sums
If you understand the code well, changing it to print the sums instead should be pretty easy. I suggest you work on understanding it a bit more - trace what the program will do by hand, then trace what you want the program to do.
And a hint for solving the actual problem: On noting that countSubsetSum2(arr, k-1, sum - arr[k-1]); recurses with the sum including the current element (and the other recursive call recurses with the sum excluding the current element), what you should do should become clear.
1: Well, technically it's reversed (we start with the target sum and decrease to 0 instead of starting at 0 and increasing to sum), but the same idea is there.
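For reference, here is a minimal corrected sketch of the counting version with that stray branch removed and the positive-element check applied (written in Python for brevity; it assumes all elements are positive):

def count_subset_sum(arr, k, total):
    # counts subsets of arr[0:k] that sum to total; assumes positive elements
    if total == 0:
        return 1
    if k == 0:
        return 0
    count = count_subset_sum(arr, k - 1, total)                # exclude arr[k-1]
    if arr[k - 1] <= total:                                    # include arr[k-1] only if it cannot overshoot
        count += count_subset_sum(arr, k - 1, total - arr[k - 1])
    return count

print(count_subset_sum([1, 2, 3, 4, 5], 5, 9))  # 3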
This is the code that works:
import java.util.LinkedList;
import java.util.Iterator;
import java.util.List;

public class subset {
    public static int count = 0;
    public static List list = new LinkedList();

    public static void countSubsetSum2(int arr[], int k, int sum) {
        if(sum <= 0 || k < 0) {
            count++;
            return;
        }
        if(sum == arr[k]) {
            System.out.print(arr[k]);
            for(Iterator i = list.iterator(); i.hasNext();)
                System.out.print("\t" + i.next());
            System.out.println();
        }
        list.add(arr[k]);
        countSubsetSum2(arr, k-1, sum - arr[k]);
        list.remove(list.size() - 1);
        countSubsetSum2(arr, k-1, sum);
    }

    public static void main(String[] args) {
        int[] array = {1, 4, 5, 6};
        countSubsetSum2(array, 3, 10);
    }
}
First off, the code you have there doesn't seem to actually work (I tested it on input [1,2,3, ..., 10] with a sum of 3 and it output 128).
To get it working, first note that you implemented the algorithm in a pretty unorthodox way. Mathematical functions take input and produce output. (Arguably) the most elegant programming functions should also take input and produce output because then we can reason about them as we reason about math.
In your case you don't produce any output (the return type is void) and instead store the result in a static variable. This means it's hard to tell exactly what it means to call countSubsetSum2. In particular, what happens if you call it multiple times? It does something different each time (because the count variable will have a different starting value!) Instead, if you write countSubsetSum2 so that it returns a value then you can define its behavior to be: countSubsetSum2 returns the number of subsets of the input arr[0...k] that sum to sum. And then you can try proving why your implementation meets that specification.
I'm not doing the greatest job of explaining, but I think a more natural way to write it would be:
// Algorithm stops once k is the least element in the array
if (k == 0) {
    if (sum == 0 || sum == arr[k]) {
        // Either we can sum to "sum"
        return 1;
    }
    else {
        // Or we can't sum to "sum"
        return 0;
    }
}
// Otherwise, let's recursively see if we can sum to "sum"
// Any valid subset either includes arr[k]
return countSubsetSum2(arr, k-1, sum - arr[k]) +
       // Or it doesn't
       countSubsetSum2(arr, k-1, sum);
As described above, this function takes an input and outputs a value that we can define and prove to be true mathematically (caveat: it's usually not quite a proof because there are crazy edge cases in most programming languages unfortunately).
Anyways, to get back to your question. The issue with the above code is that it doesn't store any data... it just returns the count. Instead, let's generate the actual subsets while we're generating them. In particular, when I say Any valid subset either includes arr[k] I mean... the subset we're generating includes arr[k]; so add it. Below I assumed that the code you wrote above is java-ish. Hopefully it makes sense:
// Algorithm stops once k is the least element in the array
if (k == 0) {
if (sum == 0 || sum == arr[k]) {
// Either we can sum to "sum" using just arr[0]
// So return a list of all of the subsets that sum to "sum"
// There are actually a few edge cases here, so we need to be careful
List<Set<int>> ret = new List<Set<int>>();
// First consider if the singleton containing arr[k] could equal sum
if (sum == arr[k])
{
Set<int> subSet = new Subset<int>();
subSet.Add(arr[k]);
ret.Add(subSet);
}
// Now consider the empty set
if (sum == 0)
{
Set<int> subSet = new Subset<int>();
ret.Add(subSet);
}
return ret;
}
else {
// Or we can't sum to "sum" using just arr[0]
// So return a list of all of the subsets that sum to "sum". None
// (given our inputs!)
List<Set<int>> ret = new List<Set<int>>();
return ret;
}
}
// Otherwise, let's recursively generate subsets summing to "sum"
// Any valid subset either includes arr[k]
List<Set<int>> subsetsThatNeedKthElement = genSubsetSum(arr, k-1, sum - arr[k]);
// Or it doesn't
List<Set<int>> completeSubsets = genSubsetSum(arr, k-1, sum);
// Note that subsetsThatNeedKthElement only sum to "sum" - arr[k]... so we need to add
// arr[k] to each of those subsets to create subsets which sum to "sum"
// On the other hand, completeSubsets contains subsets which already sum to "sum"
// so they're "complete"
// Initialize it with the completed subsets
List<Set<int>> ret = new List<Set<int>>(completeSubsets);
// Now augment the incomplete subsets and add them to the final list
foreach (Set<int> subset in subsetsThatNeedKthElement)
{
subset.Add(arr[k]);
ret.Add(subset);
}
return ret;
The code is pretty cluttered with all the comments; but the key point is that this implementation always returns what it's specified to return (a list of sets of ints from arr[0] to arr[k] which sum to whatever sum was passed in).
FYI, there is another approach which is "bottom-up" (i.e. doesn't use recursion) which should be more performant. If you implement it that way, then you need to store extra data in static state (a "memoized table")... which is a bit ugly but practical. However, when you implement it this way you need to have a more clever way of generating the subsets. Feel free to ask that question in a separate post after giving it a try.
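A rough sketch of that bottom-up counting table (again in Python for brevity; it assumes positive integers, and recovering the subsets themselves from the table is the part left as an exercise):

def count_subsets_bottom_up(arr, target):
    # dp[s] = number of subsets of the elements processed so far that sum to s
    dp = [0] * (target + 1)
    dp[0] = 1
    for x in arr:
        for s in range(target, x - 1, -1):  # downwards so each element is used at most once
            dp[s] += dp[s - x]
    return dp[target]

print(count_subsets_bottom_up([1, 2, 3, 4, 5], 9))  # 3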
Based on the comments/suggestions here, I have been able to get the solution for this problem in this way:
public static int counter = 0;
public static List<List<Integer>> lists = new ArrayList<>();

public static void getSubsetCountThatSumToTargetValue(int[] arr, int k, int targetSum, List<Integer> list) {
    if(targetSum == 0) {
        counter++;
        lists.add(list);
        return;
    }
    if(k <= 0) {
        return;
    }
    getSubsetCountThatSumToTargetValue(arr, k - 1, targetSum, list);
    List<Integer> appendedlist = new ArrayList<>();
    appendedlist.addAll(list);
    appendedlist.add(arr[k - 1]);
    getSubsetCountThatSumToTargetValue(arr, k - 1, targetSum - arr[k - 1], appendedlist);
}
The main method looks like this:
public static void main(String[] args) {
    int[] arr = {1, 2, 3, 4, 5};
    SubSetSum.getSubsetCountThatSumToTargetValue(arr, 5, 9, new ArrayList<Integer>());
    System.out.println("Result count: " + counter);
    System.out.println("lists: " + lists);
}
Output:
Result count: 3
lists: [[4, 3, 2], [5, 3, 1], [5, 4]]
A Python implementation with k moving from 0 to len(numbers) - 1:
import functools

def sum_of_subsets( numbers, sum_original ):
    def _sum_of_subsets( list, k, sum ):
        if sum < 0 or k == len( numbers ):
            return
        if ( sum == numbers[ k ] ):
            expression = functools.reduce( lambda result, num: str( num ) if len( result ) == 0 else \
                                           "%s + %d" % ( result, num ),
                                           sorted( list + [ numbers[ k ]] ),
                                           '' )
            print "%d = %s" % ( sum_original, expression )
            return
        list.append( numbers[ k ] )
        _sum_of_subsets( list, k + 1, sum - numbers[ k ])
        list.pop( -1 )
        _sum_of_subsets( list, k + 1, sum )
    _sum_of_subsets( [], 0, sum_original )
...
sum_of_subsets( [ 8, 6, 3, 4, 2, 5, 7, 1, 9, 11, 10, 13, 12, 14, 15 ], 15 )
...
15 = 1 + 6 + 8
15 = 3 + 4 + 8
15 = 1 + 2 + 4 + 8
15 = 2 + 5 + 8
15 = 7 + 8
15 = 2 + 3 + 4 + 6
15 = 1 + 3 + 5 + 6
15 = 4 + 5 + 6
15 = 2 + 6 + 7
15 = 6 + 9
15 = 1 + 2 + 3 + 4 + 5
15 = 1 + 3 + 4 + 7
15 = 1 + 2 + 3 + 9
15 = 2 + 3 + 10
15 = 3 + 5 + 7
15 = 1 + 3 + 11
15 = 3 + 12
15 = 2 + 4 + 9
15 = 1 + 4 + 10
15 = 4 + 11
15 = 1 + 2 + 5 + 7
15 = 1 + 2 + 12
15 = 2 + 13
15 = 1 + 5 + 9
15 = 5 + 10
15 = 1 + 14
15 = 15

Algorithm: Determine if a combination of min/max values fall within a given range

Imagine you have 3 buckets, but each of them has a hole in it. I'm trying to fill a bath tub. The bath tub has a minimum level of water it needs and a maximum level of water it can contain. By the time you reach the tub with the bucket it is not clear how much water will be in the bucket, but you have a range of possible values.
Is it possible to adequately fill the tub with water?
Pretty much you have 3 ranges (min,max), is there some sum of them that will fall within a 4th range?
For example:
Bucket 1 : 5-10L
Bucket 2 : 15-25L
Bucket 3 : 10-50L
Bathtub 100-150L
Is there some guaranteed combination of 1 2 and 3 that will fill the bathtub within the requisite range? Multiples of each bucket can be used.
EDIT: Now imagine there are 50 different buckets?
If the capacity of the tub is not very large (not greater than 10^6, for example), we can solve it using dynamic programming.
Approach:
Initialization: memo[X][Y] is an array to memoize the result. X = number of buckets, Y = maximum capacity of the tub. Initialize memo[][] with -1.
Code:
bool dp(int bucketNum, int curVolume){
    if(curVolume > maxCap) return false; // pruning extra branches
    if(curVolume >= minCap && curVolume <= maxCap){ // base case on success
        return true;
    }
    int &ret = memo[bucketNum][curVolume];
    if(ret != -1){ // this state has been visited earlier
        return false;
    }
    ret = false;
    for(int i = minC[bucketNum]; i <= maxC[bucketNum]; i++){
        int newVolume = curVolume + i;
        for(int j = bucketNum; j <= 3; j++){
            ret |= dp(j, newVolume);
            if(ret == true) return ret;
        }
    }
    return ret;
}
Warning: Code not tested
Here's a naïve recursive solution in python that works just fine (although it doesn't find an optimal solution):
def match_helper(lower, upper, units, least_difference, fail = dict()):
    if upper < lower + least_difference:
        return None
    if fail.get((lower,upper)):
        return None
    exact_match = [ u for u in units if u['lower'] >= lower and u['upper'] <= upper ]
    if exact_match:
        return [ exact_match[0] ]
    for unit in units:
        if unit['upper'] > upper:
            continue
        recursive_match = match_helper(lower - unit['lower'], upper - unit['upper'], units, least_difference)
        if recursive_match:
            return [unit] + recursive_match
        else:
            fail[(lower,upper)] = 1
    return None

def match(lower, upper):
    units = [
        { 'name': 'Bucket 1', 'lower': 5, 'upper': 10 },
        { 'name': 'Bucket 2', 'lower': 15, 'upper': 25 },
        { 'name': 'Bucket 3', 'lower': 10, 'upper': 50 }
    ]
    least_difference = min([ u['upper'] - u['lower'] for u in units ])
    return match_helper(
        lower = lower,
        upper = upper,
        units = sorted(units, key = lambda u: u['upper']),
        least_difference = min([ u['upper'] - u['lower'] for u in units ]),
    )

result = match(100, 175)
if result:
    lower = sum([ u['lower'] for u in result ])
    upper = sum([ u['upper'] for u in result ])
    names = [ u['name'] for u in result ]
    print lower, "-", upper
    print names
else:
    print "No solution"
It prints "No solution" for 100-150, but for 100-175 it comes up with a solution of 5x bucket 1, 5x bucket 2.
Assuming you are saying that the "range" for each bucket is the amount of water that it may have when it reaches the tub, and all you care about is if they could possibly fill the tub...
Just take the "max" of each bucket and sum them. If that is in the range of what you consider the tub to be "filled" then it can.
Updated:
Given that buckets can be used multiple times, this seems to me like we're looking for solutions to a pair of equations.
Given buckets x, y and z we want to find a, b and c:
a*x.min + b*y.min + c*z.min >= bathtub.min
and
a*x.max + b*y.max + c*z.max <= bathtub.max
Re: http://en.wikipedia.org/wiki/Diophantine_equation
If bathtub.min and bathtub.max are both multiples of the greatest common divisor of the bucket capacities x, y and z, then there are infinitely many solutions (i.e. we can fill the tub), otherwise there are no solutions (i.e. we can never fill the tub).
This can be solved with multiple applications of the change making problem.
Each Bucket.Min value is a currency denomination, and Bathtub.Min is the target value.
When you find a solution via a change-making algorithm, then apply one more constraint:
sum(each Bucket.Max in your solution) <= Bathtub.max
If this constraint is not met, throw out this solution and look for another. This will probably require a change to a standard change-making algorithm that allows you to try other solutions when one is found to not be suitable.
Initially, your target range is Bathtub.Range.
Each time you add an instance of a bucket to the solution, you reduce the target range for the remaining buckets.
For example, using your example buckets and tub:
Target Range = 100..150
Let's say we want to add a Bucket1 to the candidate solution. That then gives us
Target Range = 95..140
because if the rest of the buckets in the solution total < 95, then this Bucket1 might not be sufficient to fill the tub to 100, and if the rest of the buckets in the solution total > 140, then this Bucket1 might fill the tub over 150.
So, this gives you a quick way to check if a candidate solution is valid:
TargetRange = Bathtub.Range
foreach Bucket in CandidateSolution
TargetRange.Min -= Bucket.Min
TargetRange.Max -= Bucket.Max
if TargetRange.Min == 0 AND TargetRange.Max >= 0 then solution found
if TargetRange.Min < 0 or TargetRange.Max < 0 then solution is invalid
This still leaves the question - How do you come up with the set of candidate solutions?
Brute force would try all possible combinations of buckets.
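A direct translation of that validity check into code (a sketch in Python; the search that generates candidate multisets of buckets is omitted):

def is_valid(candidate, buckets, tub_min, tub_max):
    # candidate maps a bucket index to the number of times that bucket is used
    lo = sum(buckets[i][0] * n for i, n in candidate.items())  # guaranteed minimum poured
    hi = sum(buckets[i][1] * n for i, n in candidate.items())  # possible maximum poured
    return lo >= tub_min and hi <= tub_max

buckets = [(5, 10), (15, 25), (10, 50)]
print(is_valid({0: 5, 1: 5}, buckets, 100, 175))  # True: 5x Bucket 1 + 5x Bucket 2 gives 100..175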
Here is my solution for finding the optimal solution (least number of buckets). It compares the ratio of the maximums to the ratio of the minimums, to figure out the optimal number of buckets to fill the tub.
private static void BucketProblem()
{
Range bathTub = new Range(100, 175);
List<Range> buckets = new List<Range> {new Range(5, 10), new Range(15, 25), new Range(10, 50)};
Dictionary<Range, int> result;
bool canBeFilled = SolveBuckets(bathTub, buckets, out result);
}
private static bool BucketHelper(Range tub, List<Range> buckets, Dictionary<Range, int> results)
{
Range bucket;
int startBucket = -1;
int fills = -1;
for (int i = buckets.Count - 1; i >=0 ; i--)
{
bucket = buckets[i];
double maxRatio = (double)tub.Maximum / bucket.Maximum;
double minRatio = (double)tub.Minimum / bucket.Minimum;
if (maxRatio >= minRatio)
{
startBucket = i;
if (maxRatio - minRatio > 1)
fills = (int) minRatio + 1;
else
fills = (int) maxRatio;
break;
}
}
if (startBucket < 0)
return false;
bucket = buckets[startBucket];
tub.Maximum -= bucket.Maximum * fills;
tub.Minimum -= bucket.Minimum * fills;
results.Add(bucket, fills);
return tub.Maximum == 0 || tub.Minimum <= 0 || startBucket == 0 || BucketHelper(tub, buckets.GetRange(0, startBucket), results);
}
public static bool SolveBuckets(Range tub, List<Range> buckets, out Dictionary<Range, int> results)
{
results = new Dictionary<Range, int>();
buckets = buckets.OrderBy(b => b.Minimum).ToList();
return BucketHelper(new Range(tub.Minimum, tub.Maximum), buckets, results);
}
