Rust sorting uses surprisingly few comparaisons - sorting

I am currently learning Rust (using the Rust book), and one page mentions counting the number of times the sorting key was used while sorting an array. I modified the code in order to count this for arbitrary sizes, and here is the code :
fn main() {
const MAX: i32 = 10000;
for n in 1..MAX {
let mut v: Vec<i32> = (1..n).collect();
let mut ops = 0;
v.sort_by(|x, y| {
ops += 1;
x.cmp(y)
});
if n-2 >= 0 {
assert_eq!(n-2, ops);
}
// println!("A list of {n} elements is sorted in {ops} operations");
}
}
However, it seems that in order to sort a vector of n elements, Rust only needs n-2 comparaisons (the code above runs without panicking).
How can this be possible ? Aren't sorts supposed to be in O(n*log(n)) ?
Is it because Rust somehow "noticed" that my input vector was already sorted ?
Even in that case, how can a vector of length 2 can be sorted without any comparaisons ? Shouldn't it at least be n-1 ?

The biggest misconseption you have, I think, is:
fn main() {
const SIZE: i32 = 5;
let v: Vec<i32> = (1..SIZE).collect();
println!("{}", v.len());
}
4
The range 1..SIZE does not include SIZE and contains SIZE-1 elements.
Further, it will already be sorted, so it's as simple as iterating through it once.
See here:
fn main() {
const SIZE: i32 = 5;
let mut v: Vec<i32> = (1..SIZE).collect();
let mut ops = 0;
v.sort_by(|x, y| {
ops += 1;
let result = x.cmp(y);
println!(" - cmp {} vs {} => {:?}", x, y, result);
result
});
println!("Total comparisons: {}", ops);
}
- cmp 4 vs 3 => Greater
- cmp 3 vs 2 => Greater
- cmp 2 vs 1 => Greater
Total comparisons: 3
it seems that in order to sort a vector of n elements, Rust only needs n-2 comparaisons
That is incorrect. In order to sort an already sorted vector (which yours are), Rust needs n-1 comparisons. It doesn't detect that, that's just an inherent property of the mergesort implementation that Rust uses.
If it isn't already sorted, it will be more:
fn main() {
let mut v: Vec<i32> = vec![2, 4, 1, 3];
let mut ops = 0;
v.sort_by(|x, y| {
ops += 1;
let result = x.cmp(y);
println!(" - cmp {} vs {} => {:?}", x, y, result);
result
});
println!("Total comparisons: {}", ops);
}
- cmp 3 vs 1 => Greater
- cmp 1 vs 4 => Less
- cmp 3 vs 4 => Less
- cmp 1 vs 2 => Less
- cmp 3 vs 2 => Greater
Total comparisons: 5

FYI sort_by:
pub fn sort_by<F>(&mut self, mut compare: F)
where
F: FnMut(&T, &T) -> Ordering,
{
merge_sort(self, |a, b| compare(a, b) == Less);
}
and it actually invokes merge_sort:
/// This merge sort borrows some (but not all) ideas from TimSort, which is described in detail
/// [here](https://github.com/python/cpython/blob/main/Objects/listsort.txt).
///
/// The algorithm identifies strictly descending and non-descending subsequences, which are called
/// natural runs. There is a stack of pending runs yet to be merged. Each newly found run is pushed
/// onto the stack, and then some pairs of adjacent runs are merged until these two invariants are
/// satisfied:
///
/// 1. for every `i` in `1..runs.len()`: `runs[i - 1].len > runs[i].len`
/// 2. for every `i` in `2..runs.len()`: `runs[i - 2].len > runs[i - 1].len + runs[i].len`
///
/// The invariants ensure that the total running time is *O*(*n* \* log(*n*)) worst-case.
#[cfg(not(no_global_oom_handling))]
fn merge_sort<T, F>(v: &mut [T], mut is_less: F)
how can a vector of length 2 be sorted without any comparisons? Shouldn't it at least be n-1?
(1..2) returns a slice of length 1 (start from 1, but less than 2). So, when n == 2 in your code, please note that the length of the vector is one.
Let me demonstrate how it will actually go in the merge_sort if the input is a slice shorter than or equal to 2.
// MAX_INSERTION: 20
if len <= MAX_INSERTION {
// if the len is less than 1, it won't use `is_less` closure to let you count the cmp.
if len >= 2 {
for i in (0..len - 1).rev() {
insert_head(&mut v[i..], &mut is_less); // <- go into `insert_head`.
}
}
return;
}
fn insert_head<T, F>(v: &mut [T], is_less: &mut F)
where
F: FnMut(&T, &T) -> bool,
{
if v.len() >= 2 && is_less(&v[1], &v[0]) // <- here it uses the closure to make comparison.
So if your input is less than 21, short arrays will get sorted in place via insertion sort to avoid allocations.

Related

Reconstructing input to encoder from output

I would like to understand how to solve the Codility ArrayRecovery challenge, but I don't even know what branch of knowledge to consult. Is it combinatorics, optimization, computer science, set theory, or something else?
Edit:
The branch of knowledge to consult is constraint programming, particularly constraint propagation. You also need some combinatorics to know that if you take k numbers at a time from the range [1..n], with the restriction that no number can be bigger than the one before it, that works out to be
(n+k-1)!/k!(n-1)! possible combinations
which is the same as the number of combinations with replacements of n things taken k at a time, which has the mathematical notation . You can read about why it works out like that here.
Peter Norvig provides an excellent example of how to solve this kind of problem with his Sudoku solver.
You can read the full description of the ArrayRecovery problem via the link above. The short story is that there is an encoder that takes a sequence of integers in the range 1 up to some given limit (say 100 for our purposes) and for each element of the input sequence outputs the most recently seen integer that is smaller than the current input, or 0 if none exists.
input 1, 2, 3, 4 => output 0, 1, 2, 3
input 2, 4, 3 => output 0, 2, 2
The full task is, given the output (and the range of allowable input), figure out how many possible inputs could have generated it. But before I even get to that calculation, I'm not confident about how to even approach formulating the equation. That is what I am asking for help with. (Of course a full solution would be welcome, too, if it is explained.)
I just look at some possible outputs and wonder. Here are some sample encoder outputs and the inputs I can come up with, with * meaning any valid input and something like > 4 meaning any valid input greater than 4. If needed, inputs are referred to as A1, A2, A3, ... (1-based indexing)
Edit #2
Part of the problem I was having with this challenge is that I did not manually generate the exactly correct sets of possible inputs for an output. I believe the set below is correct now. Look at this answer's edit history if you want to see my earlier mistakes.
output #1: 0, 0, 0, 4
possible inputs: [>= 4, A1 >= * >= 4, 4, > 4]
output #2: 0, 0, 0, 2, 3, 4 # A5 ↴ See more in discussion below
possible inputs: [>= 2, A1 >= * >=2, 2, 3, 4, > 4]
output #3: 0, 0, 0, 4, 3, 1
possible inputs: none # [4, 3, 1, 1 >= * > 4, 4, > 1] but there is no number 1 >= * > 4
The second input sequence is very tightly constrained compared to the first just by adding 2 more outputs. The third sequence is so constrained as to be impossible.
But the set of constraints on A5 in example #2 is a bit harder to articulate. Of course A5 > O5, that is the basic constraint on all the inputs. But any output > A4 and after O5 has to appear in the input after A4, so A5 has to be an element of the set of numbers that comes after A5 that is also > A4. Since there is only 1 such number (A6 == 4), A5 has to be it, but it gets more complicated if there is a longer string of numbers that follow. (Editor's note: actually it doesn't.)
As the output set gets longer, I worry these constraints just get more complicated and harder to get right. I cannot think of any data structures for efficiently representing these in a way that leads to efficiently calculating the number of possible combinations. I also don't quite see how to algorithmically add constraint sets together.
Here are the constraints I see so far for any given An
An > On
An <= min(Set of other possible numbers from O1 to n-1 > On). How to define the set of possible numbers greater than On?
Numbers greater than On that came after the most recent occurrence of On in the input
An >= max(Set of other possible numbers from O1 to n-1 < On). How to define the set of possible numbers less than On?
Actually this set is empty because On is, by definition, the largest possible number from the previous input sequence. (Which it not to say it is strictly the largest number from the previous input sequence.)
Any number smaller than On that came before the last occurrence of it in the input would be ineligible because of the "nearest" rule. No numbers smaller that On could have occurred after the most recent occurrence because of the "nearest" rule and because of the transitive property: if Ai < On and Aj < Ai then Aj < On
Then there is the set theory:
An must be an element of the set of unaccounted-for elements of the set of On+1 to Om, where m is the smallest m > n such that Om < On. Any output after such Om and larger than Om (which An is) would have to appear as or after Am.
An element is unaccounted-for if it is seen in the output but does not appear in the input in a position that is consistent with the rest of the output. Obviously I need a better definition than this in order to code and algorithm to calculate it.
It seems like perhaps some kind of set theory and/or combinatorics or maybe linear algebra would help with figuring out the number of possible sequences that would account for all of the unaccounted-for outputs and fit the other constraints. (Editor's note: actually, things never get that complicated.)
The code below passes all of Codility's tests. The OP added a main function to use it on the command line.
The constraints are not as complex as the OP thinks. In particular, there is never a situation where you need to add a restriction that an input be an element of some set of specific integers seen elsewhere in the output. Every input position has a well-defined minimum and maximum.
The only complication to that rule is that sometimes the maximum is "the value of the previous input" and that input itself has a range. But even then, all the values like that are consecutive and have the same range, so the number of possibilities can be calculated with basic combinatorics, and those inputs as a group are independent of the other inputs (which only serve to set the range), so the possibilities of that group can be combined with the possibilities of other input positions by simple multiplication.
Algorithm overview
The algorithm makes a single pass through the output array updating the possible numbers of input arrays after every span, which is what I am calling repetitions of numbers in the output. (You might say maximal subsequences of the output where every element is identical.) For example, for output 0,1,1,2 we have three spans: 0, 1,1 and 2. When a new span begins, the number of possibilities for the previous span is calculated.
This decision was based on a few observations:
For spans longer than 1 in length, the minimum value of the input
allowed in the first position is whatever the value is of the input
in the second position. Calculating the number of possibilities of a
span is straightforward combinatorics, but the standard formula
requires knowing the range of the numbers and the length of the span.
Every time the value of the
output changes (and a new span beings), that strongly constrains the value of the previous span:
When the output goes up, the only possible reason is that the previous input was the value of the new, higher output and the input corresponding to the position of the new, higher output, was even higher.
When an output goes down, new constraints are established, but those are a bit harder to articulate. The algorithm stores stairs (see below) in order to quantify the constraints imposed when the output goes down
The aim here was to confine the range of possible values for every span. Once we do that accurately, calculating the number of combinations is straightforward.
Because the encoder backtracks looking to output a number that relates to the input in 2 ways, both smaller and closer, we know we can throw out numbers that are larger and farther away. After a small number appears in the output, no larger number from before that position can have any influence on what follows.
So to confine these ranges of input when the output sequence decreased, we need to store stairs - a list of increasingly larger possible values for the position in the original array. E.g for 0,2,5,7,2,4 stairs build up like this: 0, 0,2, 0,2,5, 0,2,5,7, 0,2, 0,2,4.
Using these bounds we can tell for sure that the number in the position of the second 2 (next to last position in the example) must be in (2,5], because 5 is the next stair. If the input were greater than 5, a 5 would have been output in that space instead of a 2. Observe, that if the last number in the encoded array was not 4, but 6, we would exit early returning 0, because we know that the previous number couldn't be bigger than 5.
The complexity is O(n*lg(min(n,m))).
Functions
CombinationsWithReplacement - counts number of combinations with replacements of size k from n numbers. E.g. for (3, 2) it counts 3,3, 3,2, 3,1, 2,2, 2,1, 1,1, so returns 6 It is the same as choose(n - 1 + k, n - 1).
nextBigger - finds next bigger element in a range. E.g. for 4 in sub-array 1,2,3,4,5 it returns 5, and in sub-array 1,3 it returns its parameter Max.
countSpan (lambda) - counts how many different combinations a span we have just passed can have. Consider span 2,2 for 0,2,5,7,2,2,7.
When curr gets to the final position, curr is 7 and prev is the final 2 of the 2,2 span.
It computes maximum and minimum possible values of the prev span. At this point stairs consist of 2,5,7 then maximum possible value is 5 (nextBigger after 2 in the stair 2,5,7). A value of greater than 5 in this span would have output a 5, not a 2.
It computes a minimum value for the span (which is the minimum value for every element in the span), which is prev at this point, (remember curr at this moment equals to 7 and prev to 2). We know for sure that in place of the final 2 output, the original input has to have 7, so the minimum is 7. (This is a consequence of the "output goes up" rule. If we had 7,7,2 and curr would be 2 then the minimum for the previous span (the 7,7) would be 8 which is prev + 1.
It adjusts the number of combinations. For a span of length L with a range of n possibilities (1+max-min), there are possibilities, with k being either L or L-1 depending on what follows the span.
For a span followed by a larger number, like 2,2,7, k = L - 1 because the last position of the 2,2 span has to be 7 (the value of the first number after the span).
For a span followed by a smaller number, like 7,7,2, k = L because
the last element of 7,7 has no special constraints.
Finally, it calls CombinationsWithReplacement to find out the number of branches (or possibilities), computes new res partial results value (remainder values in the modulo arithmetic we are doing), and returns new res value and max for further handling.
solution - iterates over the given Encoder Output array. In the main loop, while in a span it counts the span length, and at span boundaries it updates res by calling countSpan and possibly updates the stairs.
If the current span consists of a bigger number than the previous one, then:
Check validity of the next number. E.g 0,2,5,2,7 is invalid input, becuase there is can't be 7 in the next-to-last position, only 3, or 4, or 5.
It updates the stairs. When we have seen only 0,2, the stairs are 0,2, but after the next 5, the stairs become 0,2,5.
If the current span consists of a smaller number then the previous one, then:
It updates stairs. When we have seen only 0,2,5, our stairs are 0,2,5, but after we have seen 0,2,5,2 the stairs become 0,2.
After the main loop it accounts for the last span by calling countSpan with -1 which triggers the "output goes down" branch of calculations.
normalizeMod, extendedEuclidInternal, extendedEuclid, invMod - these auxiliary functions help to deal with modulo arithmetic.
For stairs I use storage for the encoded array, as the number of stairs never exceeds current position.
#include <algorithm>
#include <cassert>
#include <vector>
#include <tuple>
const int Modulus = 1'000'000'007;
int CombinationsWithReplacement(int n, int k);
template <class It>
auto nextBigger(It begin, It end, int value, int Max) {
auto maxIt = std::upper_bound(begin, end, value);
auto max = Max;
if (maxIt != end) {
max = *maxIt;
}
return max;
}
auto solution(std::vector<int> &B, const int Max) {
auto res = 1;
const auto size = (int)B.size();
auto spanLength = 1;
auto prev = 0;
// Stairs is the list of numbers which could be smaller than number in the next position
const auto stairsBegin = B.begin();
// This includes first entry (zero) into stairs
// We need to include 0 because we can meet another zero later in encoded array
// and we need to be able to find in stairs
auto stairsEnd = stairsBegin + 1;
auto countSpan = [&](int curr) {
const auto max = nextBigger(stairsBegin, stairsEnd, prev, Max);
// At the moment when we switch from the current span to the next span
// prev is the number from previous span and curr from current.
// E.g. 1,1,7, when we move to the third position cur = 7 and prev = 1.
// Observe that, in this case minimum value possible in place of any of 1's can be at least 2=1+1=prev+1.
// But if we consider 7, then we have even more stringent condition for numbers in place of 1, it is 7
const auto min = std::max(prev + 1, curr);
const bool countLast = prev > curr;
const auto branchesCount = CombinationsWithReplacement(max - min + 1, spanLength - (countLast ? 0 : 1));
return std::make_pair(res * (long long)branchesCount % Modulus, max);
};
for (int i = 1; i < size; ++i) {
const auto curr = B[i];
if (curr == prev) {
++spanLength;
}
else {
int max;
std::tie(res, max) = countSpan(curr);
if (prev < curr) {
if (curr > max) {
// 0,1,5,1,7 - invalid because number in the fourth position lies in [2,5]
// and so in the fifth encoded position we can't something bigger than 5
return 0;
}
// It is time to possibly shrink stairs.
// E.g if we had stairs 0,2,4,9,17 and current value is 5,
// then we no more interested in 9 and 17, and we change stairs to 0,2,4,5.
// That's because any number bigger than 9 or 17 also bigger than 5.
const auto s = std::lower_bound(stairsBegin, stairsEnd, curr);
stairsEnd = s;
*stairsEnd++ = curr;
}
else {
assert(curr < prev);
auto it = std::lower_bound(stairsBegin, stairsEnd, curr);
if (it == stairsEnd || *it != curr) {
// 0,5,1 is invalid sequence because original sequence lloks like this 5,>5,>1
// and there is no 1 in any of the two first positions, so
// it can't appear in the third position of the encoded array
return 0;
}
}
spanLength = 1;
}
prev = curr;
}
res = countSpan(-1).first;
return res;
}
template <class T> T normalizeMod(T a, T m) {
if (a < 0) return a + m;
return a;
}
template <class T> std::pair<T, std::pair<T, T>> extendedEuclidInternal(T a, T b) {
T old_x = 1;
T old_y = 0;
T x = 0;
T y = 1;
while (true) {
T q = a / b;
T t = a - b * q;
if (t == 0) {
break;
}
a = b;
b = t;
t = x; x = old_x - x * q; old_x = t;
t = y; y = old_y - y * q; old_y = t;
}
return std::make_pair(b, std::make_pair(x, y));
}
// Returns gcd and Bezout's coefficients
template <class T> std::pair<T, std::pair<T, T>> extendedEuclid(T a, T b) {
if (a > b) {
if (b == 0) return std::make_pair(a, std::make_pair(1, 0));
return extendedEuclidInternal(a, b);
}
else {
if (a == 0) return std::make_pair(b, std::make_pair(0, 1));
auto p = extendedEuclidInternal(b, a);
std::swap(p.second.first, p.second.second);
return p;
}
}
template <class T> T invMod(T a, T m) {
auto p = extendedEuclid(a, m);
assert(p.first == 1);
return normalizeMod(p.second.first, m);
}
int CombinationsWithReplacement(int n, int k) {
int res = 1;
for (long long i = n; i < n + k; ++i) {
res = res * i % Modulus;
}
int denom = 1;
for (long long i = k; i > 0; --i) {
denom = denom * i % Modulus;
}
res = res * (long long)invMod(denom, Modulus) % Modulus;
return res;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: 0 1 2,3, 4 M
// Last arg is M, the max value for an input.
// Remaining args are B (the output of the encoder) separated by commas and/or spaces
// Parentheses and brackets are ignored, so you can use the same input form as Codility's tests: ([1,2,3], M)
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,[]()";
if (argc < 2 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
for (int i = 1; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
Max = B.back();
B.pop_back();
printf("%d\n", solution(B, Max));
return 0;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: M 0 1 2,3, 4
// first arg is M, the max value for an input.
// remaining args are B (the output of the encoder) separated by commas and/or spaces
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,";
if (argc < 3 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
Max = atoi(argv[1]);
for (int i = 2; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
printf("%d\n", solution(B, Max));
return 0;
}
Let's see an example:
Max = 5
Array is
0 1 3 0 1 1 3
1
1 2..5
1 3 4..5
1 3 4..5 1
1 3 4..5 1 2..5
1 3 4..5 1 2..5 >=..2 (sorry, for a cumbersome way of writing)
1 3 4..5 1 3..5 >=..3 4..5
Now count:
1 1 2 1 3 2 which amounts to 12 total.
Here's an idea. One known method to construct the output is to use a stack. We pop it while the element is greater or equal, then output the smaller element if it exists, then push the greater element onto the stack. Now what if we attempted to do this backwards from the output?
First we'll demonstrate the stack method using the c∅dility example.
[2, 5, 3, 7, 9, 6]
2: output 0, stack [2]
5: output 2, stack [2,5]
3: pop 5, output, 2, stack [2,3]
7: output 3, stack [2,3,7]
... etc.
Final output: [0, 2, 2, 3, 7, 3]
Now let's try reconstruction! We'll use stack both as the imaginary stack and as the reconstituted input:
(Input: [2, 5, 3, 7, 9, 6])
Output: [0, 2, 2, 3, 7, 3]
* Something >3 that reached 3 in the stack
stack = [3, 3 < *]
* Something >7 that reached 7 in the stack
but both of those would've popped before 3
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >3, 7 qualifies
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >2, 3 qualifies
stack = [2, 3, 7, 7 < x, 3 < * <= x]
* Something >2 and >=3 since 3 reached 2
stack = [2, 2 < *, 3, 7, 7 < x, 3 < * <= x]
Let's attempt your examples:
Example 1:
[0, 0, 0, 2, 3, 4]
* Something >4
stack = [4, 4 < *]
* Something >3, 4 qualifies
stack = [3, 4, 4 < *]
* Something >2, 3 qualifies
stack = [2, 3, 4, 4 < *]
* The rest is non-increasing with lowerbound 2
stack = [y >= x, x >= 2, 2, 3, 4, >4]
Example 2:
[0, 0, 0, 4]
* Something >4
stack [4, 4 < *]
* Non-increasing
stack = [z >= y, y >= 4, 4, 4 < *]
Calculating the number of combinations is achieved by multiplying together the possibilities for all the sections. A section is either a bounded single cell; or a bound, non-increasing subarray of one or more cells. To calculate the latter we use the multi-choose binomial, (n + k - 1) choose (k - 1). Consider that we can express the differences between the cells of a bound, non-increasing sequence of 3 cells as:
(ub - cell_3) + (cell_3 - cell_2) + (cell_2 - cell_1) + (cell_1 - lb) = ub - lb
Then the number of ways to distribute ub - lb into (x + 1) cells is
(n + k - 1) choose (k - 1)
or
(ub - lb + x) choose x
For example, the number of non-increasing sequences between
(3,4) in two cells is (4 - 3 + 2) choose 2 = 3: [3,3] [4,3] [4,4]
And the number of non-increasing sequences between
(3,4) in three cells is (4 - 3 + 3) choose 3 = 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]
(Explanation attributed to Brian M. Scott.)
Rough JavaScript sketch (the code is unreliable; it's only meant to illustrate the encoding. The encoder lists [lower_bound, upper_bound], or a non-increasing sequence as [non_inc, length, lower_bound, upper_bound]):
function f(A, M){
console.log(JSON.stringify(A), M);
let i = A.length - 1;
let last = A[i];
let s = [[last,last]];
if (A[i-1] == last){
let d = 1;
s.splice(1,0,['non_inc',d++,last,M]);
while (i > 0 && A[i-1] == last){
s.splice(1,0,['non_inc',d++,last,M]);
i--
}
} else {
s.push([last+1,M]);
i--;
}
if (i == 0)
s.splice(0,1);
for (; i>0; i--){
let x = A[i];
if (x < s[0][0])
s = [[x,x]].concat(s);
if (x > s[0][0]){
let [l, _l] = s[0];
let [lb, ub] = s[1];
s[0] = [x+1, M];
s[1] = [lb, x];
s = [[l,_l], [x,x]].concat(s);
}
if (x == s[0][0]){
let [l,_l] = s[0];
let [lb, ub] = s[1];
let d = 1;
s.splice(0,1);
while (i > 0 && A[i-1] == x){
s =
[['non_inc', d++, lb, M]].concat(s);
i--;
}
if (i > 0)
s = [[l,_l]].concat(s);
}
}
// dirty fix
if (s[0][0] == 0)
s.splice(0,1);
return s;
}
var a = [2, 5, 3, 7, 9, 6]
var b = [0, 2, 2, 3, 7, 3]
console.log(JSON.stringify(a));
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,2,3,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,2]
console.log(JSON.stringify(f(b,4)));
b = [0,3,5,6]
console.log(JSON.stringify(f(b,10)));
b = [0,0,3,0]
console.log(JSON.stringify(f(b,10)));

Why does a range that starts at a negative number not iterate?

I have just started to learn Rust. During my first steps with this language, I found a strange behaviour, when an iteration is performed inside main or in another function as in following example:
fn myfunc(x: &Vec<f64>) {
let n = x.len();
println!(" n: {:?}", n);
for i in -1 .. n {
println!(" i: {}", i);
}
}
fn main() {
for j in -1 .. 6 {
println!("j: {}", j);
}
let field = vec![1.; 6];
myfunc(&field);
}
While the loop in main is correctly displayed, nothing is printed for the loop inside myfunc and I get following output:
j: -1
j: 0
j: 1
j: 2
j: 3
j: 4
j: 5
n: 6
What is the cause of this behaviour?
Type inference is causing both of the numbers in your range to be usize, which cannot represent negative numbers. Thus, the range is from usize::MAX to n, which never has any members.
To find this out, I used a trick to print out the types of things:
let () = -1 .. x.len();
Which has this error:
error: mismatched types:
expected `core::ops::Range<usize>`,
found `()`
(expected struct `core::ops::Range`,
found ()) [E0308]
let () = -1 .. x.len();
^~
Diving into the details, slice::len returns a usize. Your -1 is an untyped integral value, which will conform to fit whatever context it needs (if there's nothing for it to conform to, it will fall back to an i32).
In this case, it's as if you actually typed (-1 as usize)..x.len().
The good news is that you probably don't want to start at -1 anyway. Slices are zero-indexed:
fn myfunc(x: &[f64]) {
let n = x.len();
println!(" n: {:?}", n);
for i in 0..n {
println!(" i: {}", i);
}
}
Extra good news is that this annoyance was fixed in the newest versions of Rust. It will cause a warning and then eventually an error:
warning: unary negation of unsigned integers will be feature gated in the future
for i in -1 .. n {
^~
Also note that you should never accept a &Vec<T> as a parameter. Always use a &[T] as it's more flexible and you lose nothing.

Split a random value into four that sum up to it

I have one value like 24, and I have four textboxes. How can I dynamically generate four values that add up to 24?
All the values must be integers and can't be negative, and the result cannot be 6, 6, 6, 6; they must be different like: 8, 2, 10, 4. (But 5, 6, 6, 7 would be okay.)
For your stated problem, it is possible to generate an array of all possible solutions and then pick one randomly. There are in fact 1,770 possible solutions.
var solutions = [[Int]]()
for i in 1...21 {
for j in 1...21 {
for k in 1...21 {
let l = 24 - (i + j + k)
if l > 0 && !(i == 6 && j == 6 && k == 6) {
solutions.append([i, j, k, l])
}
}
}
}
// Now generate 20 solutions
for _ in 1...20 {
let rval = Int(arc4random_uniform(UInt32(solutions.count)))
println(solutions[rval])
}
This avoids any bias at the cost of initial setup time and storage.
This could be improved by:
Reducing storage space by only storing the first 3 numbers. The 4th one is always 24 - (sum of first 3)
Reducing storage space by storing each solution as a single integer: (i * 10000 + j * 100 + k)
Speeding up the generation of solutions by realizing that each loop doesn't need to go to 21.
Here is the solution that stores each solution as a single integer and optimizes the loops:
var solutions = [Int]()
for i in 1...21 {
for j in 1...22-i {
for k in 1...23-i-j {
if !(i == 6 && j == 6 && k == 6) {
solutions.append(i * 10000 + j * 100 + k)
}
}
}
}
// Now generate 20 solutions
for _ in 1...20 {
let rval = Int(arc4random_uniform(UInt32(solutions.count)))
let solution = solutions[rval]
// unpack the values
let i = solution / 10000
let j = (solution % 10000) / 100
let k = solution % 100
let l = 24 - (i + j + k)
// print the solution
println("\([i, j, k, l])")
}
Here is a Swift implementation of the algorithm given in https://stackoverflow.com/a/8064754/1187415, with a slight
modification because all numbers are required to be positive.
The method to producing N positive random integers with sum M is
Build an array containing the number 0, followed by N-1 different
random numbers in the range 1 .. M-1, and finally the number M.
Compute the differences of subsequent array elements.
In the first step, we need a random subset of N-1 elements out of
the set { 1, ..., M-1 }. This can be achieved by iterating over this
set and choosing each element with probability n/m, where
m is the remaining number of elements we can choose from and
n is the remaining number of elements to choose.
Instead of storing the chosen random numbers in an array, the
difference to the previously chosen number is computed immediately
and stored.
This gives the following function:
func randomNumbers(#count : Int, withSum sum : Int) -> [Int] {
precondition(sum >= count, "`sum` must not be less than `count`")
var diffs : [Int] = []
var last = 0 // last number chosen
var m = UInt32(sum - 1) // remaining # of elements to choose from
var n = UInt32(count - 1) // remaining # of elements to choose
for i in 1 ..< sum {
// Choose this number `i` with probability n/m:
if arc4random_uniform(m) < n {
diffs.append(i - last)
last = i
n--
}
m--
}
diffs.append(sum - last)
return diffs
}
println(randomNumbers(count: 4, withSum: 24))
If a solution with all elements equal (e.g 6+6+6+6=24) is not
allowed, you can repeat the method until a valid solution is found:
func differentRandomNumbers(#count : Int, withSum sum : Int) -> [Int] {
precondition(count >= 2, "`count` must be at least 2")
var v : [Int]
do {
v = randomNumbers(count: count, withSum: sum)
} while (!contains(v, { $0 != v[0]} ))
return v
}
Here is a simple test. It computes 1,000,000 random representations
of 7 as the sum of 3 positive integers, and counts the distribution
of the results.
let set = NSCountedSet()
for i in 1 ... 1_000_000 {
let v = randomNumbers(count: 3, withSum: 7)
set.addObject(v)
}
for (_, v) in enumerate(set) {
let count = set.countForObject(v)
println("\(v as! [Int]) \(count)")
}
Result:
[1, 4, 2] 66786
[1, 5, 1] 67082
[3, 1, 3] 66273
[2, 2, 3] 66808
[2, 3, 2] 66966
[5, 1, 1] 66545
[2, 1, 4] 66381
[1, 3, 3] 67153
[3, 3, 1] 67034
[4, 1, 2] 66423
[3, 2, 2] 66674
[2, 4, 1] 66418
[4, 2, 1] 66292
[1, 1, 5] 66414
[1, 2, 4] 66751
Update for Swift 3:
func randomNumbers(count : Int, withSum sum : Int) -> [Int] {
precondition(sum >= count, "`sum` must not be less than `count`")
var diffs : [Int] = []
var last = 0 // last number chosen
var m = UInt32(sum - 1) // remaining # of elements to choose from
var n = UInt32(count - 1) // remaining # of elements to choose
for i in 1 ..< sum {
// Choose this number `i` with probability n/m:
if arc4random_uniform(m) < n {
diffs.append(i - last)
last = i
n -= 1
}
m -= 1
}
diffs.append(sum - last)
return diffs
}
print(randomNumbers(count: 4, withSum: 24))
Update for Swift 4.2 (and later), using the unified random API:
func randomNumbers(count : Int, withSum sum : Int) -> [Int] {
precondition(sum >= count, "`sum` must not be less than `count`")
var diffs : [Int] = []
var last = 0 // last number chosen
var m = sum - 1 // remaining # of elements to choose from
var n = count - 1 // remaining # of elements to choose
for i in 1 ..< sum {
// Choose this number `i` with probability n/m:
if Int.random(in: 0..<m) < n {
diffs.append(i - last)
last = i
n -= 1
}
m -= 1
}
diffs.append(sum - last)
return diffs
}
func getRandomValues(amountOfValues:Int, totalAmount:Int) -> [Int]?{
if amountOfValues < 1{
return nil
}
if totalAmount < 1{
return nil
}
if totalAmount < amountOfValues{
return nil
}
var values:[Int] = []
var valueLeft = totalAmount
for i in 0..<amountOfValues{
if i == amountOfValues - 1{
values.append(valueLeft)
break
}
var value = Int(arc4random_uniform(UInt32(valueLeft - (amountOfValues - i))) + 1)
valueLeft -= value
values.append(value)
}
var shuffledArray:[Int] = []
for i in 0..<values.count {
var rnd = Int(arc4random_uniform(UInt32(values.count)))
shuffledArray.append(values[rnd])
values.removeAtIndex(rnd)
}
return shuffledArray
}
getRandomValues(4, 24)
This is not a final answer, but it should be a (good) starting point.
How it works: It takes 2 parameters. The amount of random values (4 in your case) and the total amount (24 in your case).
It takes a random value between the total Amount and 0, stores this in an array and it subtracts this from a variable which stores the amount that is left and stores the new value.
Than it takes a new random value between the amount that is left and 0, stores this in an array and it again subtracts this from the amount that is left and stores the new value.
When it is the last number needed, it sees what amount is left and adds that to the array
EDIT:
Adding a +1 to the random value removes the problem of having 0 in your array.
EDIT 2:
Shuffling the array does remove the increased chance of having a high value as the first value.
One solution that is unfortunatly non-deterministic but completely random is as follows:
For a total of 24 in 4 numbers:
pick four random numbers between 1 and 21
repeat until the total of the numbers equals 24 and they are not all 6.
This will, on average, loop about 100 times before finding a solution.
Here's a solution which should have significantly* less bias than some of the other methods. It works by generating the requested number of random floating point numbers, multiplying or dividing all of them until they add up to the target total, and then rounding them into integers. The rounding process changes the total, so we need to correct for that by adding or subtracting from random terms until they add up to the right amount.
func getRandomDoubles(#count: Int, #total: Double) -> [Double] {
var nonNormalized = [Double]()
nonNormalized.reserveCapacity(count)
for i in 0..<count {
nonNormalized.append(Double(arc4random()) / 0xFFFFFFFF)
}
let nonNormalizedSum = reduce(nonNormalized, 0) { $0 + $1 }
let normalized = nonNormalized.map { $0 * total / nonNormalizedSum }
return normalized
}
func getRandomInts(#count: Int, #total: Int) -> [Int] {
let doubles = getRandomDoubles(count: count, total: Double(total))
var ints = [Int]()
ints.reserveCapacity(count)
for double in doubles {
if double < 1 || double % 1 >= 0.5 {
// round up
ints.append(Int(ceil(double)))
} else {
// round down
ints.append(Int(floor(double)))
}
}
let roundingErrors = total - (reduce(ints, 0) { $0 + $1 })
let directionToAdjust: Int = roundingErrors > 0 ? 1 : -1
var corrections = abs(roundingErrors)
while corrections > 0 {
let index = Int(arc4random_uniform(UInt32(count)))
if directionToAdjust == -1 && ints[index] <= 1 { continue }
ints[index] += directionToAdjust
corrections--
}
return ints
}
*EDIT: Martin R has correctly pointed out that this is not nearly as uniform as one might expect, and is in fact highly biased towards numbers in the middle of the 1-24 range. I would not recommend using this solution, but I'm leaving it up so that others can know not to make the same mistake.
As a recursive function the algorithm is very nice:
func getRandomValues(amount: Int, total: Int) -> [Int] {
if amount == 1 { return [total] }
if amount == total { return Array(count: amount, repeatedValue: 1) }
let number = Int(arc4random()) % (total - amount + 1) + 1
return [number] + getRandomValues(amount - 1, total - number)
}
And with safety check:
func getRandomValues(amount: Int, total: Int) -> [Int]? {
if !(1...total ~= amount) { return nil }
if amount == 1 { return [total] }
if amount == total { return Array(count: amount, repeatedValue: 1) }
let number = Int(arc4random()) % (total - amount + 1) + 1
return [number] + getRandomValues(amount - 1, total - number)!
}
As #MartinR pointed out the code above is extremely biased. So in order to have a uniform distribution of the output values you should use this piece of code:
func getRandomValues(amount: Int, total: Int) -> [Int] {
var numberSet = Set<Int>()
// add splitting points to numberSet
for _ in 1...amount - 1 {
var number = Int(arc4random()) % (total - 1) + 1
while numberSet.contains(number) {
number = Int(arc4random()) % (total - 1) + 1
}
numberSet.insert(number)
}
// sort numberSet and return the differences between the splitting points
let sortedArray = (Array(numberSet) + [0, total]).sort()
return sortedArray.enumerate().flatMap{
indexElement in
if indexElement.index == amount { return nil }
return sortedArray[indexElement.index + 1] - indexElement.element
}
}
A javascript implementation for those who may be looking for such case:
const numbersSumTo = (length, value) => {
const fourRandomNumbers = Array.from({ length: length }, () => Math.floor(Math.random() * 6) + 1);
const res = fourRandomNumbers.map(num => (num / fourRandomNumbers.reduce((a, b) => a + b, 0)) * value).map(num => Math.trunc(num));
res[0] += Math.abs(res.reduce((a, b) => a + b, 0) - value);
return res;
}
// Gets an array with 4 items which sum to 100
const res = numbersSumTo(4, 100);
const resSum = res.reduce((a, b) => a + b, 0);
console.log({
res,
resSum
});
Also plenty of different methods of approach can be found here on this question: https://math.stackexchange.com/questions/1276206/method-of-generating-random-numbers-that-sum-to-100-is-this-truly-random

How to design an algorithm to calculate countdown style maths number puzzle

I have always wanted to do this but every time I start thinking about the problem it blows my mind because of its exponential nature.
The problem solver I want to be able to understand and code is for the countdown maths problem:
Given set of number X1 to X5 calculate how they can be combined using mathematical operations to make Y.
You can apply multiplication, division, addition and subtraction.
So how does 1,3,7,6,8,3 make 348?
Answer: (((8 * 7) + 3) -1) *6 = 348.
How to write an algorithm that can solve this problem? Where do you begin when trying to solve a problem like this? What important considerations do you have to think about when designing such an algorithm?
Very quick and dirty solution in Java:
public class JavaApplication1
{
public static void main(String[] args)
{
List<Integer> list = Arrays.asList(1, 3, 7, 6, 8, 3);
for (Integer integer : list) {
List<Integer> runList = new ArrayList<>(list);
runList.remove(integer);
Result result = getOperations(runList, integer, 348);
if (result.success) {
System.out.println(integer + result.output);
return;
}
}
}
public static class Result
{
public String output;
public boolean success;
}
public static Result getOperations(List<Integer> numbers, int midNumber, int target)
{
Result midResult = new Result();
if (midNumber == target) {
midResult.success = true;
midResult.output = "";
return midResult;
}
for (Integer number : numbers) {
List<Integer> newList = new ArrayList<Integer>(numbers);
newList.remove(number);
if (newList.isEmpty()) {
if (midNumber - number == target) {
midResult.success = true;
midResult.output = "-" + number;
return midResult;
}
if (midNumber + number == target) {
midResult.success = true;
midResult.output = "+" + number;
return midResult;
}
if (midNumber * number == target) {
midResult.success = true;
midResult.output = "*" + number;
return midResult;
}
if (midNumber / number == target) {
midResult.success = true;
midResult.output = "/" + number;
return midResult;
}
midResult.success = false;
midResult.output = "f" + number;
return midResult;
} else {
midResult = getOperations(newList, midNumber - number, target);
if (midResult.success) {
midResult.output = "-" + number + midResult.output;
return midResult;
}
midResult = getOperations(newList, midNumber + number, target);
if (midResult.success) {
midResult.output = "+" + number + midResult.output;
return midResult;
}
midResult = getOperations(newList, midNumber * number, target);
if (midResult.success) {
midResult.output = "*" + number + midResult.output;
return midResult;
}
midResult = getOperations(newList, midNumber / number, target);
if (midResult.success) {
midResult.output = "/" + number + midResult.output;
return midResult
}
}
}
return midResult;
}
}
UPDATE
It's basically just simple brute force algorithm with exponential complexity.
However you can gain some improvemens by leveraging some heuristic function which will help you to order sequence of numbers or(and) operations you will process in each level of getOperatiosn() function recursion.
Example of such heuristic function is for example difference between mid result and total target result.
This way however only best-case and average-case complexities get improved. Worst case complexity remains untouched.
Worst case complexity can be improved by some kind of branch cutting. I'm not sure if it's possible in this case.
Sure it's exponential but it's tiny so a good (enough) naive implementation would be a good start. I suggest you drop the usual infix notation with bracketing, and use postfix, it's easier to program. You can always prettify the outputs as a separate stage.
Start by listing and evaluating all the (valid) sequences of numbers and operators. For example (in postfix):
1 3 7 6 8 3 + + + + + -> 28
1 3 7 6 8 3 + + + + - -> 26
My Java is laughable, I don't come here to be laughed at so I'll leave coding this up to you.
To all the smart people reading this: yes, I know that for even a small problem like this there are smarter approaches which are likely to be faster, I'm just pointing OP towards an initial working solution. Someone else can write the answer with the smarter solution(s).
So, to answer your questions:
I begin with an algorithm that I think will lead me quickly to a working solution. In this case the obvious (to me) choice is exhaustive enumeration and testing of all possible calculations.
If the obvious algorithm looks unappealing for performance reasons I'll start thinking more deeply about it, recalling other algorithms that I know about which are likely to deliver better performance. I may start coding one of those first instead.
If I stick with the exhaustive algorithm and find that the run-time is, in practice, too long, then I might go back to the previous step and code again. But it has to be worth my while, there's a cost/benefit assessment to be made -- as long as my code can outperform Rachel Riley I'd be satisfied.
Important considerations include my time vs computer time, mine costs a helluva lot more.
A working solution in c++11 below.
The basic idea is to use a stack-based evaluation (see RPN) and convert the viable solutions to infix notation for display purposes only.
If we have N input digits, we'll use (N-1) operators, as each operator is binary.
First we create valid permutations of operands and operators (the selector_ array). A valid permutation is one that can be evaluated without stack underflow and which ends with exactly one value (the result) on the stack. Thus 1 1 + is valid, but 1 + 1 is not.
We test each such operand-operator permutation with every permutation of operands (the values_ array) and every combination of operators (the ops_ array). Matching results are pretty-printed.
Arguments are taken from command line as [-s] <target> <digit>[ <digit>...]. The -s switch prevents exhaustive search, only the first matching result is printed.
(use ./mathpuzzle 348 1 3 7 6 8 3 to get the answer for the original question)
This solution doesn't allow concatenating the input digits to form numbers. That could be added as an additional outer loop.
The working code can be downloaded from here. (Note: I updated that code with support for concatenating input digits to form a solution)
See code comments for additional explanation.
#include <iostream>
#include <vector>
#include <algorithm>
#include <stack>
#include <iterator>
#include <string>
namespace {
enum class Op {
Add,
Sub,
Mul,
Div,
};
const std::size_t NumOps = static_cast<std::size_t>(Op::Div) + 1;
const Op FirstOp = Op::Add;
using Number = int;
class Evaluator {
std::vector<Number> values_; // stores our digits/number we can use
std::vector<Op> ops_; // stores the operators
std::vector<char> selector_; // used to select digit (0) or operator (1) when evaluating. should be std::vector<bool>, but that's broken
template <typename T>
using Stack = std::stack<T, std::vector<T>>;
// checks if a given number/operator order can be evaluated or not
bool isSelectorValid() const {
int numValues = 0;
for (auto s : selector_) {
if (s) {
if (--numValues <= 0) {
return false;
}
}
else {
++numValues;
}
}
return (numValues == 1);
}
// evaluates the current values_ and ops_ based on selector_
Number eval(Stack<Number> &stack) const {
auto vi = values_.cbegin();
auto oi = ops_.cbegin();
for (auto s : selector_) {
if (!s) {
stack.push(*(vi++));
continue;
}
Number top = stack.top();
stack.pop();
switch (*(oi++)) {
case Op::Add:
stack.top() += top;
break;
case Op::Sub:
stack.top() -= top;
break;
case Op::Mul:
stack.top() *= top;
break;
case Op::Div:
if (top == 0) {
return std::numeric_limits<Number>::max();
}
Number res = stack.top() / top;
if (res * top != stack.top()) {
return std::numeric_limits<Number>::max();
}
stack.top() = res;
break;
}
}
Number res = stack.top();
stack.pop();
return res;
}
bool nextValuesPermutation() {
return std::next_permutation(values_.begin(), values_.end());
}
bool nextOps() {
for (auto i = ops_.rbegin(), end = ops_.rend(); i != end; ++i) {
std::size_t next = static_cast<std::size_t>(*i) + 1;
if (next < NumOps) {
*i = static_cast<Op>(next);
return true;
}
*i = FirstOp;
}
return false;
}
bool nextSelectorPermutation() {
// the start permutation is always valid
do {
if (!std::next_permutation(selector_.begin(), selector_.end())) {
return false;
}
} while (!isSelectorValid());
return true;
}
static std::string buildExpr(const std::string& left, char op, const std::string &right) {
return std::string("(") + left + ' ' + op + ' ' + right + ')';
}
std::string toString() const {
Stack<std::string> stack;
auto vi = values_.cbegin();
auto oi = ops_.cbegin();
for (auto s : selector_) {
if (!s) {
stack.push(std::to_string(*(vi++)));
continue;
}
std::string top = stack.top();
stack.pop();
switch (*(oi++)) {
case Op::Add:
stack.top() = buildExpr(stack.top(), '+', top);
break;
case Op::Sub:
stack.top() = buildExpr(stack.top(), '-', top);
break;
case Op::Mul:
stack.top() = buildExpr(stack.top(), '*', top);
break;
case Op::Div:
stack.top() = buildExpr(stack.top(), '/', top);
break;
}
}
return stack.top();
}
public:
Evaluator(const std::vector<Number>& values) :
values_(values),
ops_(values.size() - 1, FirstOp),
selector_(2 * values.size() - 1, 0) {
std::fill(selector_.begin() + values_.size(), selector_.end(), 1);
std::sort(values_.begin(), values_.end());
}
// check for solutions
// 1) we create valid permutations of our selector_ array (eg: "1 1 + 1 +",
// "1 1 1 + +", but skip "1 + 1 1 +" as that cannot be evaluated
// 2) for each evaluation order, we permutate our values
// 3) for each value permutation we check with each combination of
// operators
//
// In the first version I used a local stack in eval() (see toString()) but
// it turned out to be a performance bottleneck, so now I use a cached
// stack. Reusing the stack gives an order of magnitude speed-up (from
// 4.3sec to 0.7sec) due to avoiding repeated allocations. Using
// std::vector as a backing store also gives a slight performance boost
// over the default std::deque.
std::size_t check(Number target, bool singleResult = false) {
Stack<Number> stack;
std::size_t res = 0;
do {
do {
do {
Number value = eval(stack);
if (value == target) {
++res;
std::cout << target << " = " << toString() << "\n";
if (singleResult) {
return res;
}
}
} while (nextOps());
} while (nextValuesPermutation());
} while (nextSelectorPermutation());
return res;
}
};
} // namespace
int main(int argc, const char **argv) {
int i = 1;
bool singleResult = false;
if (argc > 1 && std::string("-s") == argv[1]) {
singleResult = true;
++i;
}
if (argc < i + 2) {
std::cerr << argv[0] << " [-s] <target> <digit>[ <digit>]...\n";
std::exit(1);
}
Number target = std::stoi(argv[i]);
std::vector<Number> values;
while (++i < argc) {
values.push_back(std::stoi(argv[i]));
}
Evaluator evaluator{values};
std::size_t res = evaluator.check(target, singleResult);
if (!singleResult) {
std::cout << "Number of solutions: " << res << "\n";
}
return 0;
}
Input is obviously a set of digits and operators: D={1,3,3,6,7,8,3} and Op={+,-,*,/}. The most straight forward algorithm would be a brute force solver, which enumerates all possible combinations of these sets. Where the elements of set Op can be used as often as wanted, but elements from set D are used exactly once. Pseudo code:
D={1,3,3,6,7,8,3}
Op={+,-,*,/}
Solution=348
for each permutation D_ of D:
for each binary tree T with D_ as its leafs:
for each sequence of operators Op_ from Op with length |D_|-1:
label each inner tree node with operators from Op_
result = compute T using infix traversal
if result==Solution
return T
return nil
Other than that: read jedrus07's and HPM's answers.
By far the easiest approach is to intelligently brute force it. There is only a finite amount of expressions you can build out of 6 numbers and 4 operators, simply go through all of them.
How many? Since you don't have to use all numbers and may use the same operator multiple times, This problem is equivalent to "how many labeled strictly binary trees (aka full binary trees) can you make with at most 6 leaves, and four possible labels for each non-leaf node?".
The amount of full binary trees with n leaves is equal to catalan(n-1). You can see this as follows:
Every full binary tree with n leaves has n-1 internal nodes and corresponds to a non-full binary tree with n-1 nodes in a unique way (just delete all the leaves from the full one to get it). There happen to be catalan(n) possible binary trees with n nodes, so we can say that a strictly binary tree with n leaves has catalan(n-1) possible different structures.
There are 4 possible operators for each non-leaf node: 4^(n-1) possibilities
The leaves can be numbered in n! * (6 choose (n-1)) different ways. (Divide this by k! for each number that occurs k times, or just make sure all numbers are different)
So for 6 different numbers and 4 possible operators you get Sum(n=1...6) [ Catalan(n-1) * 6!/(6-n)! * 4^(n-1) ] possible expressions for a total of 33,665,406. Not a lot.
How do you enumerate these trees?
Given a collection of all trees with n-1 or less nodes, you can create all trees with n nodes by systematically pairing all of the n-1 trees with the empty tree, all n-2 trees with the 1 node tree, all n-3 trees with all 2 node tree etc. and using them as the left and right sub trees of a newly formed tree.
So starting with an empty set you first generate the tree that has just a root node, then from a new root you can use that either as a left or right sub tree which yields the two trees that look like this: / and . And so on.
You can turn them into a set of expressions on the fly (just loop over the operators and numbers) and evaluate them as you go until one yields the target number.
I've written my own countdown solver, in Python.
Here's the code; it is also available on GitHub:
#!/usr/bin/env python3
import sys
from itertools import combinations, product, zip_longest
from functools import lru_cache
assert sys.version_info >= (3, 6)
class Solutions:
def __init__(self, numbers):
self.all_numbers = numbers
self.size = len(numbers)
self.all_groups = self.unique_groups()
def unique_groups(self):
all_groups = {}
all_numbers, size = self.all_numbers, self.size
for m in range(1, size+1):
for numbers in combinations(all_numbers, m):
if numbers in all_groups:
continue
all_groups[numbers] = Group(numbers, all_groups)
return all_groups
def walk(self):
for group in self.all_groups.values():
yield from group.calculations
class Group:
def __init__(self, numbers, all_groups):
self.numbers = numbers
self.size = len(numbers)
self.partitions = list(self.partition_into_unique_pairs(all_groups))
self.calculations = list(self.perform_calculations())
def __repr__(self):
return str(self.numbers)
def partition_into_unique_pairs(self, all_groups):
# The pairs are unordered: a pair (a, b) is equivalent to (b, a).
# Therefore, for pairs of equal length only half of all combinations
# need to be generated to obtain all pairs; this is set by the limit.
if self.size == 1:
return
numbers, size = self.numbers, self.size
limits = (self.halfbinom(size, size//2), )
unique_numbers = set()
for m, limit in zip_longest(range((size+1)//2, size), limits):
for numbers1, numbers2 in self.paired_combinations(numbers, m, limit):
if numbers1 in unique_numbers:
continue
unique_numbers.add(numbers1)
group1, group2 = all_groups[numbers1], all_groups[numbers2]
yield (group1, group2)
def perform_calculations(self):
if self.size == 1:
yield Calculation.singleton(self.numbers[0])
return
for group1, group2 in self.partitions:
for calc1, calc2 in product(group1.calculations, group2.calculations):
yield from Calculation.generate(calc1, calc2)
#classmethod
def paired_combinations(cls, numbers, m, limit):
for cnt, numbers1 in enumerate(combinations(numbers, m), 1):
numbers2 = tuple(cls.filtering(numbers, numbers1))
yield (numbers1, numbers2)
if cnt == limit:
return
#staticmethod
def filtering(iterable, elements):
# filter elements out of an iterable, return the remaining elements
elems = iter(elements)
k = next(elems, None)
for n in iterable:
if n == k:
k = next(elems, None)
else:
yield n
#staticmethod
#lru_cache()
def halfbinom(n, k):
if n % 2 == 1:
return None
prod = 1
for m, l in zip(reversed(range(n+1-k, n+1)), range(1, k+1)):
prod = (prod*m)//l
return prod//2
class Calculation:
def __init__(self, expression, result, is_singleton=False):
self.expr = expression
self.result = result
self.is_singleton = is_singleton
def __repr__(self):
return self.expr
#classmethod
def singleton(cls, n):
return cls(f"{n}", n, is_singleton=True)
#classmethod
def generate(cls, calca, calcb):
if calca.result < calcb.result:
calca, calcb = calcb, calca
for result, op in cls.operations(calca.result, calcb.result):
expr1 = f"{calca.expr}" if calca.is_singleton else f"({calca.expr})"
expr2 = f"{calcb.expr}" if calcb.is_singleton else f"({calcb.expr})"
yield cls(f"{expr1} {op} {expr2}", result)
#staticmethod
def operations(x, y):
yield (x + y, '+')
if x > y: # exclude non-positive results
yield (x - y, '-')
if y > 1 and x > 1: # exclude trivial results
yield (x * y, 'x')
if y > 1 and x % y == 0: # exclude trivial and non-integer results
yield (x // y, '/')
def countdown_solver():
# input: target and numbers. If you want to play with more or less than
# 6 numbers, use the second version of 'unsorted_numbers'.
try:
target = int(sys.argv[1])
unsorted_numbers = (int(sys.argv[n+2]) for n in range(6)) # for 6 numbers
# unsorted_numbers = (int(n) for n in sys.argv[2:]) # for any numbers
numbers = tuple(sorted(unsorted_numbers, reverse=True))
except (IndexError, ValueError):
print("You must provide a target and numbers!")
return
solutions = Solutions(numbers)
smallest_difference = target
bestresults = []
for calculation in solutions.walk():
diff = abs(calculation.result - target)
if diff <= smallest_difference:
if diff < smallest_difference:
bestresults = [calculation]
smallest_difference = diff
else:
bestresults.append(calculation)
output(target, smallest_difference, bestresults)
def output(target, diff, results):
print(f"\nThe closest results differ from {target} by {diff}. They are:\n")
for calculation in results:
print(f"{calculation.result} = {calculation.expr}")
if __name__ == "__main__":
countdown_solver()
The algorithm works as follows:
The numbers are put into a tuple of length 6 in descending order. Then, all unique subgroups of lengths 1 to 6 are created, the smallest groups first.
Example: (75, 50, 5, 9, 1, 1) -> {(75), (50), (9), (5), (1), (75, 50), (75, 9), (75, 5), ..., (75, 50, 9, 5, 1, 1)}.
Next, the groups are organised into a hierarchical tree: every group is partitioned into all unique unordered pairs of its non-empty subgroups.
Example: (9, 5, 1, 1) -> [(9, 5, 1) + (1), (9, 1, 1) + (5), (5, 1, 1) + (9), (9, 5) + (1, 1), (9, 1) + (5, 1)].
Within each group of numbers, the calculations are performed and the results are stored. For groups of length 1, the result is simply the number itself. For larger groups, the calculations are carried out on every pair of subgroups: in each pair, all results of the first subgroup are combined with all results of the second subgroup using +, -, x and /, and the valid outcomes are stored.
Example: (75, 5) consists of the pair ((75), (5)). The result of (75) is 75; the result of (5) is 5; the results of (75, 5) are [75+5=80, 75-5=70, 75*5=375, 75/5=15].
In this manner, all results are generated, from the smallest groups to the largest. Finally, the algorithm iterates through all results and selects the ones that are the closest match to the target number.
For a group of m numbers, the maximum number of arithmetic computations is
comps[m] = 4*sum(binom(m, k)*comps[k]*comps[m-k]//(1 + (2*k)//m) for k in range(1, m//2+1))
For all groups of length 1 to 6, the maximum total number of computations is then
total = sum(binom(n, m)*comps[m] for m in range(1, n+1))
which is 1144386. In practice, it will be much less, because the algorithm reuses the results of duplicate groups, ignores trivial operations (adding 0, multiplying by 1, etc), and because the rules of the game dictate that intermediate results must be positive integers (which limits the use of the division operator).
I think, you need to strictly define the problem first. What you are allowed to do and what you are not. You can start by making it simple and only allowing multiplication, division, substraction and addition.
Now you know your problem space- set of inputs, set of available operations and desired input. If you have only 4 operations and x inputs, the number of combinations is less than:
The number of order in which you can carry out operations (x!) times the possible choices of operations on every step: 4^x. As you can see for 6 numbers it gives reasonable 2949120 operations. This means that this may be your limit for brute force algorithm.
Once you have brute force and you know it works, you can start improving your algorithm with some sort of A* algorithm which would require you to define heuristic functions.
In my opinion the best way to think about it is as the search problem. The main difficulty will be finding good heuristics, or ways to reduce your problem space (if you have numbers that do not add up to the answer, you will need at least one multiplication etc.). Start small, build on that and ask follow up questions once you have some code.
I wrote a terminal application to do this:
https://github.com/pg328/CountdownNumbersGame/tree/main
Inside, I've included an illustration of the calculation of the size of the solution space (it's n*((n-1)!^2)*(2^n-1), so: n=6 -> 2,764,800. I know, gross), and more importantly why that is. My implementation is there if you care to check it out, but in case you don't I'll explain here.
Essentially, at worst it is brute force because as far as I know it's impossible to determine whether any specific branch will result in a valid answer without explicitly checking. Having said that, the average case is some fraction of that; it's {that number} divided by the number of valid solutions (I tend to see around 1000 on my program, where 10 or so are unique and the rest are permutations fo those 10). If I handwaved a number, I'd say roughly 2,765 branches to check which takes like no time. (Yes, even in Python.)
TL;DR: Even though the solution space is huge and it takes a couple million operations to find all solutions, only one answer is needed. Best route is brute force til you find one and spit it out.
I wrote a slightly simpler version:
for every combination of 2 (distinct) elements from the list and combine them using +,-,*,/ (note that since a>b then only a-b is needed and only a/b if a%b=0)
if combination is target then record solution
recursively call on the reduced lists
import sys
def driver():
try:
target = int(sys.argv[1])
nums = list((int(sys.argv[i+2]) for i in range(6)))
except (IndexError, ValueError):
print("Provide a list of 7 numbers")
return
solutions = list()
solve(target, nums, list(), solutions)
unique = set()
final = list()
for s in solutions:
a = '-'.join(sorted(s))
if not a in unique:
unique.add(a)
final.append(s)
for s in final: #print them out
print(s)
def solve(target, nums, path, solutions):
if len(nums) == 1:
return
distinct = sorted(list(set(nums)), reverse = True)
rem1 = list(distinct)
for n1 in distinct: #reduce list by combining a pair
rem1.remove(n1)
for n2 in rem1:
rem2 = list(nums) # in case of duplicates we need to start with full list and take out the n1,n2 pair of elements
rem2.remove(n1)
rem2.remove(n2)
combine(target, solutions, path, rem2, n1, n2, '+')
combine(target, solutions, path, rem2, n1, n2, '-')
if n2 > 1:
combine(target, solutions, path, rem2, n1, n2, '*')
if not n1 % n2:
combine(target, solutions, path, rem2, n1, n2, '//')
def combine(target, solutions, path, rem2, n1, n2, symb):
lst = list(rem2)
ans = eval("{0}{2}{1}".format(n1, n2, symb))
newpath = path + ["{0}{3}{1}={2}".format(n1, n2, ans, symb[0])]
if ans == target:
solutions += [newpath]
else:
lst.append(ans)
solve(target, lst, newpath, solutions)
if __name__ == "__main__":
driver()

MergeSort in scala

I came across another codechef problem which I am attempting to solve in Scala. The problem statement is as follows:
Stepford Street was a dead end street. The houses on Stepford Street
were bought by wealthy millionaires. They had them extensively altered
so that as one progressed along the street, the height of the
buildings increased rapidly. However, not all millionaires were
created equal. Some refused to follow this trend and kept their houses
at their original heights. The resulting progression of heights was
thus disturbed. A contest to locate the most ordered street was
announced by the Beverly Hills Municipal Corporation. The criteria for
the most ordered street was set as follows: If there exists a house
with a lower height later in the street than the house under
consideration, then the pair (current house, later house) counts as 1
point towards the disorderliness index of the street. It is not
necessary that the later house be adjacent to the current house. Note:
No two houses on a street will be of the same height For example, for
the input: 1 2 4 5 3 6 The pairs (4,3), (5,3) form disordered pairs.
Thus the disorderliness index of this array is 2. As the criteria for
determining the disorderliness is complex, the BHMC has requested your
help to automate the process. You need to write an efficient program
that calculates the disorderliness index of a street.
A sample input output provided is as follows:
Input: 1 2 4 5 3 6
Output: 2
The output is 2 because of two pairs (4,3) and (5,3)
To solve this problem I thought I should use a variant of MergeSort,incrementing by 1 when the left element is greater than the right element.
My scala code is as follows:
def dysfunctionCalc(input:List[Int]):Int = {
val leftHalf = input.size/2
println("HalfSize:"+leftHalf)
val isOdd = input.size%2
println("Is odd:"+isOdd)
val leftList = input.take(leftHalf+isOdd)
println("LeftList:"+leftList)
val rightList = input.drop(leftHalf+isOdd)
println("RightList:"+rightList)
if ((leftList.size <= 1) && (rightList.size <= 1)){
println("Entering input where both lists are <= 1")
if(leftList.size == 0 || rightList.size == 0){
println("One of the lists is less than 0")
0
}
else if(leftList.head > rightList.head)1 else 0
}
else{
println("Both lists are greater than 1")
dysfunctionCalc(leftList) + dysfunctionCalc(rightList)
}
}
First off, my logic is wrong,it doesn't have a merge stage and I am not sure what would be the best way to percolate the result of the base-case up the stack and compare it with the other values. Also, using recursion to solve this problem may not be the most optimal way to go since for large lists, I maybe blowing up the stack. Also, there might be stylistic issues with my code as well.
I would be great if somebody could point out other flaws and the right way to solve this problem.
Thanks
Suppose you split your list into three pieces: the item you are considering, those on the left, and those on the right. Suppose further that those on the left are in a sorted set. Now you just need to walk through the list, moving items from "right" to "considered" and from "considered" to "left"; at each point, you look at the size of the subset of the sorted set that is greater than your item. In general, the size lookup can be done in O(log(N)) as can the add-element (with a Red-Black or AVL tree, for instance). So you have O(N log N) performance.
Now the question is how to implement this in Scala efficiently. It turns out that Scala has a Red-Black tree used for its TreeSet sorted set, and the implementation is actually quite simple (here in tail-recursive form):
import collection.immutable.TreeSet
final def calcDisorder(xs: List[Int], left: TreeSet[Int] = TreeSet.empty, n: Int = 0): Int = xs match {
case Nil => n
case x :: rest => calcDisorder(rest, left + x, n + left.from(x).size)
}
Unfortunately, left.from(x).size takes O(N) time (I believe), which yields a quadratic execution time. That's no good--what you need is an IndexedTreeSet which can do indexOf(x) in O(log(n)) (and then iterate with n + left.size - left.indexOf(x) - 1). You can build your own implementation or find one on the web. For instance, I found one here (API here) for Java that does exactly the right thing.
Incidentally, the problem with doing a mergesort is that you cannot easily work cumulatively. With merging a pair, you can keep track of how out-of-order it is. But when you merge in a third list, you must see how out of order it is with respect to both other lists, which spoils your divide-and-conquer strategy. (I am not sure whether there is some invariant one could find that would allow you to calculate directly if you kept track of it.)
Here is my try, I don't use MergeSort but it seems to solve the problem:
def calcDisorderness(myList:List[Int]):Int = myList match{
case Nil => 0
case t::q => q.count(_<t) + calcDisorderness(q)
}
scala> val input = List(1,2,4,5,3,6)
input: List[Int] = List(1, 2, 4, 5, 3, 6)
scala> calcDisorderness(input)
res1: Int = 2
The question is, is there a way to have a lower complexity?
Edit: tail recursive version of the same function and cool usage of default values in function arguments.
def calcDisorderness(myList:List[Int], disorder:Int=0):Int = myList match{
case Nil => disorder
case t::q => calcDisorderness(q, disorder + q.count(_<t))
}
A solution based on Merge Sort. Not super fast, potential slowdown could be in "xs.length".
def countSwaps(a: Array[Int]): Long = {
var disorder: Long = 0
def msort(xs: List[Int]): List[Int] = {
import Stream._
def merge(left: List[Int], right: List[Int], inc: Int): Stream[Int] = {
(left, right) match {
case (x :: xs, y :: ys) if x > y =>
cons(y, merge(left, ys, inc + 1))
case (x :: xs, _) => {
disorder += inc
cons(x, merge(xs, right, inc))
}
case _ => right.toStream
}
}
val n = xs.length / 2
if (n == 0)
xs
else {
val (ys, zs) = xs splitAt n
merge(msort(ys), msort(zs), 0).toList
}
}
msort(a.toList)
disorder
}
Another solution based on Merge Sort. Very fast: no FP or for-loop.
def countSwaps(a: Array[Int]): Count = {
var swaps: Count = 0
def mergeRun(begin: Int, run_len: Int, src: Array[Int], dst: Array[Int]) = {
var li = begin
val lend = math.min(begin + run_len, src.length)
var ri = begin + run_len
val rend = math.min(begin + run_len * 2, src.length)
var ti = begin
while (ti < rend) {
if (ri >= rend) {
dst(ti) = src(li); li += 1
swaps += ri - begin - run_len
} else if (li >= lend) {
dst(ti) = src(ri); ri += 1
} else if (a(li) <= a(ri)) {
dst(ti) = src(li); li += 1
swaps += ri - begin - run_len
} else {
dst(ti) = src(ri); ri += 1
}
ti += 1
}
}
val b = new Array[Int](a.length)
var run = 0
var run_len = 1
while (run_len < a.length) {
var begin = 0
while (begin < a.length) {
val (src, dst) = if (run % 2 == 0) (a, b) else (b, a)
mergeRun(begin, run_len, src, dst)
begin += run_len * 2
}
run += 1
run_len *= 2
}
swaps
}
Convert the above code to Functional style: no mutable variable, no loop.
All recursions are tail calls, thus the performance is good.
def countSwaps(a: Array[Int]): Count = {
def mergeRun(li: Int, lend: Int, rb: Int, ri: Int, rend: Int, di: Int, src: Array[Int], dst: Array[Int], swaps: Count): Count = {
if (ri >= rend && li >= lend) {
swaps
} else if (ri >= rend) {
dst(di) = src(li)
mergeRun(li + 1, lend, rb, ri, rend, di + 1, src, dst, ri - rb + swaps)
} else if (li >= lend) {
dst(di) = src(ri)
mergeRun(li, lend, rb, ri + 1, rend, di + 1, src, dst, swaps)
} else if (src(li) <= src(ri)) {
dst(di) = src(li)
mergeRun(li + 1, lend, rb, ri, rend, di + 1, src, dst, ri - rb + swaps)
} else {
dst(di) = src(ri)
mergeRun(li, lend, rb, ri + 1, rend, di + 1, src, dst, swaps)
}
}
val b = new Array[Int](a.length)
def merge(run: Int, run_len: Int, lb: Int, swaps: Count): Count = {
if (run_len >= a.length) {
swaps
} else if (lb >= a.length) {
merge(run + 1, run_len * 2, 0, swaps)
} else {
val lend = math.min(lb + run_len, a.length)
val rb = lb + run_len
val rend = math.min(rb + run_len, a.length)
val (src, dst) = if (run % 2 == 0) (a, b) else (b, a)
val inc_swaps = mergeRun(lb, lend, rb, rb, rend, lb, src, dst, 0)
merge(run, run_len, lb + run_len * 2, inc_swaps + swaps)
}
}
merge(0, 1, 0, 0)
}
It seems to me that the key is to break the list into a series of ascending sequences. For example, your example would be broken into (1 2 4 5)(3 6). None of the items in the first list can end a pair. Now you do a kind of merge of these two lists, working backwards:
6 > 5, so 6 can't be in any pairs
3 < 5, so its a pair
3 < 4, so its a pair
3 > 2, so we're done
I'm not clear from the definition on how to handle more than 2 such sequences.

Resources