Which solution is better in terms of space/time complexity? - algorithm

I have two lists of integers, both already sorted. I want to find the elements (one from each list) that add up to a given number.
My first idea is to iterate over the first list and binary-search the other list for the value needed to reach the given sum. I know this will take O(n log n) time.
The other idea is to store one of the lists in a hashtable/map (I don't really know the difference) and iterate over the other list, looking up the needed value. Does this take O(n) time, and O(n) memory?
Overall, which would be better?

You are comparing them correctly, but each has different aspects. Hashing is not a good choice if you have memory constraints; if you have plenty of memory, you can afford it.
You will also see the notion of a space-time tradeoff many times in computer science: you gain something by giving up something else. Hashing runs in O(n) time with O(n) space, while binary search takes O(n log n) time but only O(1) extra space.
Long story short, the scenario decides which one to select. I have shown just one aspect; there can be many. Know the constraints and tradeoffs of each and you will be able to decide.
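As a quick illustration of the hashing approach (a minimal sketch; the function name and sample data are mine, and only one matching pair is reported):

    def find_pair_with_hash(a, b, target):
        seen = set(a)                      # O(n) extra memory
        for y in b:                        # one expected O(1) lookup per element
            if target - y in seen:
                return (target - y, y)
        return None

    print(find_pair_with_hash([1, 2, 2, 6, 8, 10, 11], [1, 1, 3, 4, 8, 9], 10))  # (6, 4)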
A better solution (time complexity O(n), space complexity O(1)):
Suppose there are two arrays a and b.
Without loss of generality, suppose a is sorted ascending and b descending (even if that is not the case, we can simply traverse them in the appropriate directions).
index1 = 0; index2 = 0; // 0-based indexing
while (index1 <= N1 - 1 && index2 <= N2 - 1)
{
    if ((a[index1] + b[index2]) == x)
        // success: report the pair (then advance an index if you want to keep searching)
    else if ((a[index1] + b[index2]) > x)
        index2++;    // b is descending, so this makes the sum smaller
    else
        index1++;    // a is ascending, so this makes the sum larger
}
// failure: no such elements.
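For reference, a runnable version of the same two-pointer idea in Python; here both lists are assumed sorted ascending and b is simply walked from its far end:

    def find_pair(a, b, x):
        i, j = 0, len(b) - 1
        while i < len(a) and j >= 0:
            s = a[i] + b[j]
            if s == x:
                return (a[i], b[j])
            elif s > x:
                j -= 1      # need a smaller sum
            else:
                i += 1      # need a larger sum
        return None         # failure: no such elements

    print(find_pair([1, 2, 2, 6, 8, 10, 11], [1, 1, 3, 4, 8, 9], 10))  # (1, 9)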

Sort list A in ascending order and list B in descending order. Set a = 1 and b = 1. Then:
1. If A[a] + B[b] = T, record the pair, increment a, and repeat from step 1.
2. Otherwise, if A[a] + B[b] < T, increment a and repeat from step 1.
3. Otherwise, A[a] + B[b] > T, so increment b and repeat from step 1.
Naturally, if a or b exceeds the size of A or B, respectively, terminate.
Example:
A = 1, 2, 2, 6, 8, 10, 11
B = 9, 8, 4, 3, 1, 1
T = 10
a = 1, b = 1
A[a] + B[b] = A[1] + B[1] = 10; record; a = a + 1 = 2; repeat.
A[a] + B[b] = A[2] + B[1] = 11; b = b + 1 = 2; repeat.
A[a] + B[b] = A[2] + B[2] = 10; record; a = a + 1 = 3; repeat.
A[a] + B[b] = A[3] + B[2] = 10; record; a = a + 1 = 4; repeat.
A[a] + B[b] = A[4] + B[2] = 14; b = b + 1 = 3; repeat.
A[a] + B[b] = A[4] + B[3] = 10; record; a = a + 1 = 5; repeat.
A[a] + B[b] = A[5] + B[3] = 12; b = b + 1 = 4; repeat.
A[a] + B[b] = A[5] + B[4] = 11; b = b + 1 = 5; repeat.
A[a] + B[b] = A[5] + B[5] = 9; a = a + 1 = 6; repeat.
A[a] + B[b] = A[6] + B[5] = 11; b = b + 1 = 6; repeat.
A[a] + B[b] = A[6] + B[6] = 11; b = b + 1 = 7; repeat.
Terminate.
You can do this without additional space if instead of having B sorted in descending order, you set b = |B| and decrement it instead of incrementing it, effectively reading it backwards.
The above procedure misses out on some duplicate answers where B has a string of duplicate values, for instance:
A = 2, 2, 2
B = 8, 8, 8
The algorithm as described above will yield three pairs, but you might want nine. This can be fixed by detecting this case, keeping separate counters ca and cb for the lengths of the runs of A[a] and B[b] you have seen, and adding ca * cb - ca copies of the last pair you added to the bag. In this example:
A = 2, 2, 2
B = 8, 8, 8
a = 1, b = 1
ca = 1, cb = 1
A[a] + B[b] = 10; record pair, a = a + 1 = 2, ca = ca + 1 = 2, repeat.
A[a] + B[b] = 10; record pair, a = a + 1 = 3, ca = ca + 1 = 3, repeat.
A[a] + B[b] = 10; record pair, a = a + 1 = 4;
a exceeds bounds, value of A[a] changed;
increment b to count run of B's;
b = b + 1 = 2, cb = cb + 1 = 2
b = b + 1 = 3, cb = cb + 1 = 3
b = b + 1 = 4;
b exceeds bounds, value of B[b] changed;
add ca * cb - ca = 3 * 3 - 3 = 6 copies of pair (2, 8).
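For completeness, here is a sketch (my own variant, not the procedure above verbatim) that counts all pairs including duplicates by measuring the run lengths on both sides whenever a match is found; both inputs are assumed sorted ascending:

    def count_pairs(a, b, target):
        i, j = 0, len(b) - 1
        count = 0
        while i < len(a) and j >= 0:
            s = a[i] + b[j]
            if s < target:
                i += 1
            elif s > target:
                j -= 1
            else:
                # measure the runs of equal values on both sides
                ca = 1
                while i + ca < len(a) and a[i + ca] == a[i]:
                    ca += 1
                cb = 1
                while j - cb >= 0 and b[j - cb] == b[j]:
                    cb += 1
                count += ca * cb
                i += ca
                j -= cb
        return count

    print(count_pairs([2, 2, 2], [8, 8, 8], 10))  # 9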

Related

Min Abs Sum task from codility

There is already a topic about this task, but I'd like to ask about my specific approach.
The task is:
Let A be a non-empty array consisting of N integers.
The abs sum of two for a pair of indices (P, Q) is the absolute value
|A[P] + A[Q]|, for 0 ≤ P ≤ Q < N.
For example, the following array A:
A[0] = 1, A[1] = 4, A[2] = -3
has pairs of indices (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2).
The abs sum of two for the pair (0, 0) is |A[0] + A[0]| = |1 + 1| = 2.
The abs sum of two for the pair (0, 1) is |A[0] + A[1]| = |1 + 4| = 5.
The abs sum of two for the pair (0, 2) is |A[0] + A[2]| = |1 + (−3)| = 2.
The abs sum of two for the pair (1, 1) is |A[1] + A[1]| = |4 + 4| = 8.
The abs sum of two for the pair (1, 2) is |A[1] + A[2]| = |4 + (−3)| = 1.
The abs sum of two for the pair (2, 2) is |A[2] + A[2]| = |(−3) + (−3)| = 6.
Write a function:
def solution(A)
that, given a non-empty array A consisting of N integers, returns the minimal abs sum of two for any pair of indices in this array.
For example, given the following array A:
A[0] = 1, A[1] = 4, A[2] = -3
the function should return 1, as explained above.
Given array A:
A[0] = -8, A[1] = 4, A[2] = 5, A[3] = -10, A[4] = 3
the function should return |(−8) + 5| = 3.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..100,000]; each element of array A
is an integer within the range [−1,000,000,000..1,000,000,000].
The official solution is O(N*M^2), but I think it could be solved in O(N).
My approach is to first remove duplicates and sort the array. Then we check both ends and compare the abs sums obtained by moving each end inward by one: the left end, the right end, or both. If none of these improves the result, our current sum is the lowest. My code is:
def solution(A):
    A = list(set(A))
    n = len(A)
    A.sort()
    beg = 0
    end = n - 1
    min_sum = abs(A[beg] + A[end])
    while True:
        min_left = abs(A[beg+1] + A[end]) if beg+1 < n else float('inf')
        min_right = abs(A[beg] + A[end-1]) if end-1 >= 0 else float('inf')
        min_both = abs(A[beg+1] + A[end-1]) if beg+1 < n and end-1 >= 0 else float('inf')
        min_all = min([min_left, min_right, min_both])
        if min_sum <= min_all:
            return min_sum
        if min_left == min_all:
            beg += 1
            min_sum = min_left
        elif min_right == min_all:
            end -= 1
            min_sum = min_right
        else:
            beg += 1
            end -= 1
            min_sum = min_both
It passes almost all of the tests, but not all. Is there some bug in my code or the approach is wrong?
EDIT:
After the aka.nice answer I was able to fix the code. It scores 100% now.
def solution(A):
    A = list(set(A))
    n = len(A)
    A.sort()
    beg = 0
    end = n - 1
    min_sum = abs(A[beg] + A[end])
    while beg <= end:
        min_left = abs(A[beg+1] + A[end]) if beg+1 < n else float('inf')
        min_right = abs(A[beg] + A[end-1]) if end-1 >= 0 else float('inf')
        min_all = min(min_left, min_right)
        if min_all < min_sum:
            min_sum = min_all
        if min_left <= min_all:
            beg += 1
        else:
            end -= 1
    return min_sum
Just take this example for array A
-11 -5 -2 5 6 8 12
and execute your algorithm step by step, you get a premature return:
beg=0
end=6
min_sum=1
min_left=7
min_right=3
min_both=3
min_all=3
return min_sum
though there is a better solution: abs((-5) + 5) = 0.
Hint: you should check the sign of A[beg] and A[end] to decide whether to continue or exit the loop. What to do if both >= 0, if both <= 0, else ?
Note that A.sort() has a non-negligible cost, likely O(N*log(N)); it will dominate the cost of the solution you exhibit.
By the way, what is M in the official cost O(N*M^2)?
And the link you provide is another problem (sum all the elements of A or their opposite).
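For reference, here is a minimal sketch of the two-pointer approach the hint points at: the sign of A[beg] + A[end] decides which pointer to move (this is my own rendering of the hint, not aka.nice's code):

    def min_abs_sum_of_two(A):
        A.sort()
        beg, end = 0, len(A) - 1
        best = abs(A[beg] + A[end])
        while beg <= end:                 # beg == end covers the (P, P) pairs
            s = A[beg] + A[end]
            best = min(best, abs(s))
            if s > 0:                     # sum too positive: shrink from the right
                end -= 1
            elif s < 0:                   # sum too negative: shrink from the left
                beg += 1
            else:                         # an exact zero cannot be beaten
                return 0
        return best

    print(min_abs_sum_of_two([-11, -5, -2, 5, 6, 8, 12]))   # 0
    print(min_abs_sum_of_two([-8, 4, 5, -10, 3]))           # 3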

Bounded square sum algorithm

The problem goes as follows:
You are given two arrays of integers a and b, and two integers lower and upper.
Your task is to find the number of pairs (i, j) such that lower ≤ a[i] * a[i] + b[j] * b[j] ≤ upper.
Example:
For a = [3, -1, 9], b = [100, 5, -2], lower = 7, and upper = 99, the output should be boundedSquareSum(a, b, lower, upper) = 4.
There are only four pairs that satisfy the requirement:
If i = 0 and j = 1, then a[0] = 3, b[1] = 5, and 7 ≤ 3 * 3 + 5 * 5 = 9 + 25 = 36 ≤ 99.
If i = 0 and j = 2, then a[0] = 3, b[2] = -2, and 7 ≤ 3 * 3 + (-2) * (-2) = 9 + 4 = 13 ≤ 99.
If i = 1 and j = 1, then a[1] = -1, b[1] = 5, and 7 ≤ (-1) * (-1) + 5 * 5 = 1 + 25 = 26 ≤ 99.
If i = 2 and j = 2, then a[2] = 9, b[2] = -2, and 7 ≤ 9 * 9 + (-2) * (-2) = 81 + 4 = 85 ≤ 99.
For a = [1, 2, 3, -1, -2, -3], b = [10], lower = 0, and upper = 100, the output should be boundedSquareSum(a, b, lower, upper) = 0.
Since the array b contains only one element 10 and the array a does not contain 0, it is not possible to satisfy 0 ≤ a[i] * a[i] + 10 * 10 ≤ 100.
Now, I know there is a brute force way to solve this, but what would be the optimal solution for this problem?
Sort the smaller array using the absolute value of the elements, then for each element in the unsorted array, binary search the interval on the sorted one.
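A possible Python sketch of that idea (names are mine); sorting the squares is equivalent to sorting by absolute value here, and bisect finds the window of valid partners:

    from bisect import bisect_left, bisect_right

    def bounded_square_sum(a, b, lower, upper):
        if len(a) > len(b):       # sort the smaller array
            a, b = b, a
        sq = sorted(x * x for x in a)
        count = 0
        for y in b:
            y2 = y * y
            lo = bisect_left(sq, lower - y2)    # first square >= lower - y2
            hi = bisect_right(sq, upper - y2)   # one past the last square <= upper - y2
            count += hi - lo
        return count

    print(bounded_square_sum([3, -1, 9], [100, 5, -2], 7, 99))  # 4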
You can break out of the loop when the calculation goes above the upper limit.
It will reduce execution time.
function boundedSquareSum(a, b, lower, upper) {
  let result = 0;
  a = a.sort((i, j) => Math.abs(i) - Math.abs(j));
  b = b.sort((i, j) => Math.abs(i) - Math.abs(j));
  for (let i = 0; i < a.length; i++) {
    let aValue = a[i] ** 2;
    if (aValue > upper) {
      break; // Don't need to check further
    }
    for (let j = 0; j < b.length; j++) {
      let bValue = b[j] ** 2;
      let total = aValue + bValue;
      if (total > upper) {
        break; // Don't need to check further
      }
      if (total >= lower && total <= upper) {
        result++;
      }
    }
  }
  return result;
}

Given two sequences, find the maximal overlap between ending of one and beginning of the other

I need to find an efficient (pseudo)code to solve the following problem:
Given two sequences of (not necessarily distinct) integers (a[1], a[2], ..., a[n]) and (b[1], b[2], ..., b[n]), find the maximum d such that a[n-d+1] == b[1], a[n-d+2] == b[2], ..., and a[n] == b[d].
This is not homework, I actually came up with this when trying to contract two tensors along as many dimensions as possible. I suspect an efficient algorithm exists (maybe O(n)?), but I cannot come up with something that is not O(n^2). The O(n^2) approach would be the obvious loop on d and then an inner loop on the items to check the required condition until hitting the maximum d. But I suspect something better than this is possible.
You can utilize the z algorithm, a linear time (O(n)) algorithm that:
Given a string S of length n, the Z Algorithm produces an array Z
where Z[i] is the length of the longest substring starting from S[i]
which is also a prefix of S
You need to concatenate your arrays (b + a), where m is the length of b and n the length of a, run the algorithm on the resulting array, and take the first position i inside the a-part such that Z[i] + i == m + n.
For example, for a = [1, 2, 3, 6, 2, 3] & b = [2, 3, 6, 2, 1, 0], the concatenation would be [2, 3, 6, 2, 1, 0, 1, 2, 3, 6, 2, 3] which would yield Z[10] = 2 fulfilling Z[i] + i = 12 = m + n.
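A sketch of that approach in Python; the z_array helper below is my own implementation of the standard Z algorithm, and the arrays are compared element-wise rather than as strings:

    def z_array(s):
        # Z[i] = length of the longest prefix of s that also starts at position i
        n = len(s)
        z = [0] * n
        z[0] = n
        l = r = 0
        for i in range(1, n):
            if i < r:
                z[i] = min(r - i, z[i - l])
            while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                z[i] += 1
            if i + z[i] > r:
                l, r = i, i + z[i]
        return z

    def max_overlap(a, b):
        # longest suffix of a that equals a prefix of b
        s = b + a                          # concatenate as described above
        z = z_array(s)
        m, n = len(b), len(a)
        for i in range(max(m, n), m + n):  # positions inside the a-part
            if i + z[i] == m + n:          # the match runs to the very end of a
                return z[i]
        return 0

    print(max_overlap([1, 2, 3, 6, 2, 3], [2, 3, 6, 2, 1, 0]))  # 2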
For O(n) time/space complexity, the trick is to evaluate hashes for each subsequence. Consider the array b:
[b1 b2 b3 ... bn]
Using Horner's method, you can evaluate all the possible hashes for each subsequence. Pick a base value B (bigger than any value in both of your arrays):
from b1 to b1 = b1 * B^1
from b1 to b2 = b1 * B^1 + b2 * B^2
from b1 to b3 = b1 * B^1 + b2 * B^2 + b3 * B^3
...
from b1 to bn = b1 * B^1 + b2 * B^2 + b3 * B^3 + ... + bn * B^n
Note that you can evaluate each sequence in O(1) time, using the result of the previous sequence, hence all the job costs O(n).
Now you have an array Hb = [h(b1), h(b2), ... , h(bn)], where Hb[i] is the hash from b1 until bi.
Do the same thing for the array a, but with a little trick:
from an to an = (an * B^1)
from an-1 to an = (an-1 * B^1) + (an * B^2)
from an-2 to an = (an-2 * B^1) + (an-1 * B^2) + (an * B^3)
...
from a1 to an = (a1 * B^1) + (a2 * B^2) + (a3 * B^3) + ... + (an * B^n)
You must note that, when you step from one sequence to another, you multiply the whole previous sequence by B and add the new value multiplied by B. For example:
from an to an = (an * B^1)
for the next sequence, multiply the previous by B: (an * B^1) * B = (an * B^2)
now sum with the new value multiplied by B: (an-1 * B^1) + (an * B^2)
hence:
from an-1 to an = (an-1 * B^1) + (an * B^2)
Now you have an array Ha = [h(an), h(an-1), ... , h(a1)], where Ha[i] is the hash from ai until an.
Now, you can compare Ha[d] == Hb[d] for all d values from n to 1, if they match, you have your answer.
ATTENTION: this is a hash method; the values can get large, so you may have to use fast exponentiation and modular arithmetic, which may (rarely) give you collisions, making this method not totally safe. A good practice is to pick the base B as a really big prime number (at least bigger than the biggest value in your arrays). You should also be careful, as the numbers may overflow at each step, so you'll have to reduce (modulo K) in each operation (where K can be a prime bigger than B).
This means that two different sequences might have the same hash, but two equal sequences will always have the same hash.
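A sketch of the hashing approach in Python (0-based indices; the base B and the modulus K below are my own choices, picked per the advice above, and both arrays are assumed to have the same length as in the question):

    def max_overlap_hash(a, b, B=10**9 + 7, K=(1 << 61) - 1):
        n = len(a)
        hb = [0] * (n + 1)   # hb[d] = hash of b[0..d-1]
        ha = [0] * (n + 1)   # ha[d] = hash of the last d elements of a
        p = 1
        for d in range(1, n + 1):
            p = p * B % K                              # p = B^d
            hb[d] = (hb[d - 1] + b[d - 1] * p) % K
            ha[d] = (ha[d - 1] * B + a[n - d] * B) % K
        for d in range(n, 0, -1):                      # longest overlap first
            if ha[d] == hb[d]:
                return d
        return 0

    print(max_overlap_hash([1, 2, 3, 6, 2, 3], [2, 3, 6, 2, 1, 0]))  # 2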
This can indeed be done in linear time, O(n), and O(n) extra space. I will assume the input arrays are character strings, but this is not essential.
A naive method would -- after matching k characters that are equal -- find a character that does not match, and go back k-1 units in a, reset the index in b, and then start the matching process from there. This clearly represents a O(n²) worst case.
To avoid this backtracking process, we can observe that going back is not useful if we have not encountered the b[0] character while scanning the last k-1 characters. If we did find that character, then backtracking to that position would only be useful, if in that k sized substring we had a periodic repetition.
For instance, if we look at substring "abcabc" somewhere in a, and b is "abcabd", and we find that the final character of b does not match, we must consider that a successful match might start at the second "a" in the substring, and we should move our current index in b back accordingly before continuing the comparison.
The idea is then to do some preprocessing based on string b to log back-references in b that are useful to check when there is a mismatch. So for instance, if b is "acaacaacd", we could identify these 0-based backreferences (put below each character):
index: 0 1 2 3 4 5 6 7 8
b: a c a a c a a c d
ref: 0 0 0 1 0 0 1 0 5
For example, if we have a equal to "acaacaaca" the first mismatch happens on the final character. The above information then tells the algorithm to go back in b to index 5, since "acaac" is common. And then with only changing the current index in b we can continue the matching at the current index of a. In this example the match of the final character then succeeds.
With this we can optimise the search and make sure that the index in a can always progress forwards.
Here is an implementation of that idea in JavaScript, using the most basic syntax of that language only:
function overlapCount(a, b) {
  // Deal with cases where the strings differ in length
  let startA = 0;
  if (a.length > b.length) startA = a.length - b.length;
  let endB = b.length;
  if (a.length < b.length) endB = a.length;
  // Create a back-reference for each index
  // that should be followed in case of a mismatch.
  // We only need B to make these references:
  let map = Array(endB);
  let k = 0; // Index that lags behind j
  map[0] = 0;
  for (let j = 1; j < endB; j++) {
    if (b[j] == b[k]) {
      map[j] = map[k]; // skip over the same character (optional optimisation)
    } else {
      map[j] = k;
    }
    while (k > 0 && b[j] != b[k]) k = map[k];
    if (b[j] == b[k]) k++;
  }
  // Phase 2: use these references while iterating over A
  k = 0;
  for (let i = startA; i < a.length; i++) {
    while (k > 0 && a[i] != b[k]) k = map[k];
    if (a[i] == b[k]) k++;
  }
  return k;
}
console.log(overlapCount("ababaaaabaabab", "abaababaaz")); // 7
Although there are nested while loops, these do not have more iterations in total than n. This is because the value of k strictly decreases in the while body, and cannot become negative. This can only happen when k++ was executed that many times to give enough room for such decreases. So all in all, there cannot be more executions of the while body than there are k++ executions, and the latter is clearly O(n).

positional sum of 2 numbers

How to sum 2 numbers digit by digit with pseudo code?
Note: You don't know the length of the numbers, i.e. whether they have tens, hundreds, thousands...
Units should be added to units, tens to tens, hundreds to hundreds...
If adding the units gives a value >= 10, you need to carry that ten over to the tens...
I tried
Start
Do
Add digit(x) in A to Sum(x)
Add digit(x) in B to Sum(x)
If Sum(x) > 9, then (?????)
digit(x) = digit(x+1)
while digit(x) in A and digit(x) in B is > 0
How to show the result?
I am lost with that.....
Please help!
Try this,
n = minDigits(a, b) and m = maxDigits(a, b), where a and b are the numbers.
Let sum be a number; allocate m + 1 digits for sum.
carry = 0
for (i = 1 to n)
    temp = a[i] + b[i] + carry
    carry = 0                 // reset carry
    if (temp >= 10)
        carry = 1
        temp = temp - 10
    sum[i] = temp
// one last step to get the leftover carry
if (digits(a) == digits(b))
    sum[n + 1] = carry
    return
if (digits(a) > digits(b))
    toCopy = a
else
    toCopy = b
for (i = n + 1 to m)
    temp = toCopy[i] + carry
    carry = 0                 // reset carry
    if (temp >= 10)
        carry = 1
        temp = temp - 10
    sum[i] = temp
sum[m + 1] = carry            // carry left over after the longer number is exhausted
Let me know if it helps
A and B are the integers you want to sum.
Note that the while loop ends when all the three integers are equal to zero.
carry = 0
sum = 0
d = 1
while (A > 0 or B > 0 or carry > 0)
    tmp = carry + A mod 10 + B mod 10
    sum = sum + (tmp mod 10) * d
    carry = tmp / 10
    d = d * 10
    A = A / 10
    B = B / 10
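For reference, a direct Python transcription of that pseudocode, using // for the integer divisions:

    def positional_sum(A, B):
        carry, total, d = 0, 0, 1
        while A > 0 or B > 0 or carry > 0:
            tmp = carry + A % 10 + B % 10
            total += (tmp % 10) * d
            carry = tmp // 10
            d *= 10
            A //= 10
            B //= 10
        return total

    print(positional_sum(989, 27))  # 1016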

How to calculate the index (lexicographical order) when the combination is given

I know that there is an algorithm that, given a combination of numbers (no repetitions, order ignored), calculates the index of that combination in lexicographic order.
It would be very useful for speeding up my application...
For example:
combination(10, 5)
1 - 1 2 3 4 5
2 - 1 2 3 4 6
3 - 1 2 3 4 7
....
251 - 5 7 8 9 10
252 - 6 7 8 9 10
I need that the algorithm returns the index of the given combination.
e.g.: index(2, 5, 7, 8, 10) --> index
EDIT: actually I'm using a Java application that generates all combinations C(53, 5) and inserts them into a TreeMap.
My idea is to create an array that contains all combinations (and related data) that I can index with this algorithm.
Everything is meant to speed up combination searching.
However, I tried some (not all) of your solutions, and the algorithms that you proposed are slower than a get() from TreeMap.
If it helps: my needs are for a combination of 5 from 53 starting from 0 to 52.
Thank you again to all :-)
Here is a snippet that will do the work.
#include <iostream>

int main()
{
    const int n = 10;
    const int k = 5;
    int combination[k] = {2, 5, 7, 8, 10};
    int index = 0;
    int j = 0;
    for (int i = 0; i != k; ++i)
    {
        for (++j; j != combination[i]; ++j)
        {
            index += c(n - j, k - i - 1);
        }
    }
    std::cout << index + 1 << std::endl;
    return 0;
}
It assumes you have a function
int c(int n, int k);
that will return the number of combinations of choosing k elements out of n elements.
The loop calculates the number of combinations preceding the given combination.
By adding one at the end we get the actual index.
For the given combination there are
c(9, 4) = 126 combinations containing 1 and hence preceding it in lexicographic order.
Of the combinations containing 2 as the smallest number there are
c(7, 3) = 35 combinations having 3 as the second smallest number
c(6, 3) = 20 combinations having 4 as the second smallest number
All of these are preceding the given combination.
Of the combinations containing 2 and 5 as the two smallest numbers there are
c(4, 2) = 6 combinations having 6 as the third smallest number.
All of these are preceding the given combination.
Etc.
If you put a print statement in the inner loop you will get the numbers
126, 35, 20, 6, 1.
Hope that explains the code.
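For reference, a Python transcription of the same counting logic, using math.comb in place of the assumed c(n, k) (function name is mine):

    from math import comb

    def combination_rank(combination, n):
        """1-based lexicographic rank of an ascending combination drawn from 1..n."""
        k = len(combination)
        index = 0
        prev = 0
        for i, value in enumerate(combination):
            for j in range(prev + 1, value):
                index += comb(n - j, k - i - 1)
            prev = value
        return index + 1

    print(combination_rank([2, 5, 7, 8, 10], 10))  # 189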
Convert your number selections to a factorial base number. This number will be the index you want. Technically this calculates the lexicographical index of all permutations, but if you only give it combinations, the indexes will still be well ordered, just with some large gaps for all the permutations that come in between each combination.
Edit: pseudocode removed, it was incorrect, but the method above should work. Too tired to come up with correct pseudocode at the moment.
Edit 2: Here's an example. Say we were choosing a combination of 5 elements from a set of 10 elements, like in your example above. If the combination was 2 3 4 6 8, you would get the related factorial base number like so:
Take the unselected elements and count how many you have to pass by to get to the one you are selecting.
1 2 3 4 5 6 7 8 9 10
2 -> 1
1 3 4 5 6 7 8 9 10
3 -> 1
1 4 5 6 7 8 9 10
4 -> 1
1 5 6 7 8 9 10
6 -> 2
1 5 7 8 9 10
8 -> 3
So the index in factorial base is 1112300000
In decimal base, it's
1*9! + 1*8! + 1*7! + 2*6! + 3*5! = 410040
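A small Python sketch of that digit-extraction procedure (names are mine; elements are assumed to be drawn from 1..n):

    from math import factorial

    def perm_rank_of_combination(comb, n):
        # comb: ascending selection from 1..n
        remaining = list(range(1, n + 1))
        rank = 0
        for i, x in enumerate(comb):
            d = remaining.index(x)              # how many elements we pass by
            rank += d * factorial(n - 1 - i)    # factorial-base place value
            remaining.remove(x)
        return rank

    print(perm_rank_of_combination([2, 3, 4, 6, 8], 10))  # 410040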
This is Algorithm 2.7 kSubsetLexRank on page 44 of Combinatorial Algorithms by Kreher and Stinson.
r = 0
t[0] = 0
for i from 1 to k
    if t[i - 1] + 1 <= t[i] - 1
        for j from t[i - 1] + 1 to t[i] - 1
            r = r + choose(n - j, k - i)
return r
The array t holds your values, for example [5 7 8 9 10]. The function choose(n, k) calculates the number "n choose k". The result value r is the zero-based rank, 250 for the example (add 1 to get the 1-based index 251). The other inputs are n and k; for the example they would be 10 and 5.
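A Python transcription of the algorithm, using math.comb for choose (names are mine):

    from math import comb

    def k_subset_lex_rank(t, k, n):
        # t: ascending k-subset of 1..n; returns the 0-based lexicographic rank
        r = 0
        t = [0] + list(t)                      # t[0] = 0 sentinel, 1-based as in the book
        for i in range(1, k + 1):
            for j in range(t[i - 1] + 1, t[i]):
                r += comb(n - j, k - i)
        return r

    print(k_subset_lex_rank([5, 7, 8, 9, 10], 5, 10))  # 250 (1-based index 251)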
Zero-based:
# v: array of length k consisting of numbers between 0 and n-1 (ascending)
# combi is assumed to be a precomputed table of binomial coefficients, indexed as combi[n, k]
def index_of_combination(n, k, v):
    idx = 0
    for p in range(k-1):
        if p == 0:
            arrg = range(1, v[p]+1)
        else:
            arrg = range(v[p-1]+2, v[p]+1)
        for a in arrg:
            idx += combi[n-a, k-1-p]
    idx += v[k-1] - v[k-2] - 1
    return idx
Null Set has the right approach. The index corresponds to the factorial-base number of the sequence. You build a factorial-base number just like any other base number, except that the base decreases for each digit.
Now, the value of each digit in the factorial-base number is the number of elements less than it that have not yet been used. So, for combination(10, 5):
(1 2 3 4 5) == 0*9!/5! + 0*8!/5! + 0*7!/5! + 0*6!/5! + 0*5!/5!
== 0*3024 + 0*336 + 0*42 + 0*6 + 0*1
== 0
(10 9 8 7 6) == 9*3024 + 8*336 + 7*42 + 6*6 + 5*1
== 30239
It should be pretty easy to calculate the index incrementally.
If you have a set of integers 0 <= x_1 < x_2 < ... < x_k, then you could use something called the squashed order:
I = sum(j=1..k) Choose(x_j, j)
The beauty of the squashed order is that it works independently of the largest value in the parent set.
The squashed order is not the order you are looking for, but it is related.
To use the squashed order to get the lexicographic order on the set of k-subsets of {1,...,n}, take
1 <= x_1 < ... < x_k <= n
and compute
0 <= n-x_k < n-x_(k-1) < ... < n-x_1.
Then compute the squashed order index of (n-x_k, ..., n-x_1).
Then subtract that squashed order index from Choose(n,k) to get your result, which is the lexicographic index.
If you have relatively small values of n and k, you can cache all the values Choose(a,b) with a <= n.
See Anderson, Combinatorics on Finite Sets, pp 112-119
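A short Python sketch of that route (names are mine; math.comb stands in for Choose):

    from math import comb

    def lex_index_via_squashed_order(x, n):
        # x: ascending k-subset of 1..n; returns the 1-based lexicographic index
        k = len(x)
        y = [n - v for v in reversed(x)]                 # 0 <= y[0] < y[1] < ... < y[k-1]
        squashed = sum(comb(y[j], j + 1) for j in range(k))
        return comb(n, k) - squashed

    print(lex_index_via_squashed_order([1, 2, 3, 4, 5], 10))   # 1
    print(lex_index_via_squashed_order([6, 7, 8, 9, 10], 10))  # 252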
I also needed the same for a project of mine, and the fastest solution I found was (Python):
import math

def nCr(n, r):
    f = math.factorial
    return f(n) // f(r) // f(n - r)

def index(comb, n, k):
    r = nCr(n, k)
    for i in range(k):
        if n - comb[i] < k - i:
            continue
        r = r - nCr(n - comb[i], k - i)
    return r
My input "comb" contained elements in increasing order. You can test the code with, for example:
import itertools

k = 3
t = [1, 2, 3, 4, 5]
for x in itertools.combinations(t, k):
    print(x, index(x, len(t), k))
It is not hard to prove that if comb=(a1,a2,a3...,ak) (in increasing order) then:
index=[nCk-(n-a1+1)Ck] + [(n-a1)C(k-1)-(n-a2+1)C(k-1)] + ... =
nCk -(n-a1)Ck -(n-a2)C(k-1) - .... -(n-ak)C1
There's another way to do all this. You could generate all possible combinations and write them into a binary file where each combination is represented by its index starting from zero. Then, when you need to find an index and the combination is given, you apply a binary search on the file. Here's the function. It's written in VB.NET 2010 for my lotto program; it works with the Israeli lottery system, so there's a bonus (7th) number; just ignore it.
Public Function Comb2Index( _
ByVal gAr() As Byte) As UInt32
Dim mxPntr As UInt32 = WHL.AMT.WHL_SYS_00 '(16.273.488)
Dim mdPntr As UInt32 = mxPntr \ 2
Dim eqCntr As Byte
Dim rdAr() As Byte
modBinary.OpenFile(WHL.WHL_SYS_00, _
FileMode.Open, FileAccess.Read)
Do
modBinary.ReadBlock(mdPntr, rdAr)
RP: If eqCntr = 7 Then GoTo EX
If gAr(eqCntr) = rdAr(eqCntr) Then
eqCntr += 1
GoTo RP
ElseIf gAr(eqCntr) < rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mxPntr = mdPntr
mdPntr \= 2
ElseIf gAr(eqCntr) > rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mdPntr += (mxPntr - mdPntr) \ 2
End If
Loop Until eqCntr = 7
EX: modBinary.CloseFile()
Return mdPntr
End Function
P.S. It takes 5 to 10 mins to generate 16 million combs on a Core 2 Duo. To find the index using binary search on file takes 397 milliseconds on a SATA drive.
Assuming the maximum setSize is not too large, you can simply generate a lookup table, where the inputs are encoded this way:
int index(a,b,c,...)
{
    int key = 0;
    key |= 1<<a;
    key |= 1<<b;
    key |= 1<<c;
    //repeat for all arguments
    return Lookup[key];
}
To generate the lookup table, look at this "banker's order" algorithm. Generate all the combinations, and also store the base index for each nItems. (For the example on p6, this would be [0,1,5,11,15]). Note that by storing the answers in the opposite order from the example (LSBs set first) you will only need one table, sized for the largest possible set.
Populate the lookup table by walking through the combinations doing Lookup[combination[i]]=i-baseIdx[nItems]
EDIT: Never mind. This is completely wrong.
Let your combination be (a1, a2, ..., ak-1, ak) where a1 < a2 < ... < ak. Let choose(a,b) = a!/(b!*(a-b)!) if a >= b and 0 otherwise. Then, the index you are looking for is
choose(ak-1, k) + choose(ak-1-1, k-1) + choose(ak-2-1, k-2) + ... + choose (a2-1, 2) + choose (a1-1, 1) + 1
The first term counts the number of k-element combinations such that the largest element is less than ak. The second term counts the number of (k-1)-element combinations such that the largest element is less than ak-1. And, so on.
Notice that the size of the universe of elements to be chosen from (10 in your example) does not play a role in the computation of the index. Can you see why?
Sample solution:
class Program
{
static void Main(string[] args)
{
// The input
var n = 5;
var t = new[] { 2, 4, 5 };
// Helping transformations
ComputeDistances(t);
CorrectDistances(t);
// The algorithm
var r = CalculateRank(t, n);
Console.WriteLine("n = 5");
Console.WriteLine("t = {2, 4, 5}");
Console.WriteLine("r = {0}", r);
Console.ReadKey();
}
static void ComputeDistances(int[] t)
{
var k = t.Length;
while (--k >= 0)
t[k] -= (k + 1);
}
static void CorrectDistances(int[] t)
{
var k = t.Length;
while (--k > 0)
t[k] -= t[k - 1];
}
static int CalculateRank(int[] t, int n)
{
int k = t.Length - 1, r = 0;
for (var i = 0; i < t.Length; i++)
{
if (t[i] == 0)
{
n--;
k--;
continue;
}
for (var j = 0; j < t[i]; j++)
{
n--;
r += CalculateBinomialCoefficient(n, k);
}
n--;
k--;
}
return r;
}
static int CalculateBinomialCoefficient(int n, int k)
{
int i, l = 1, m, x, y;
if (n - k < k)
{
x = k;
y = n - k;
}
else
{
x = n - k;
y = k;
}
for (i = x + 1; i <= n; i++)
l *= i;
m = CalculateFactorial(y);
return l/m;
}
static int CalculateFactorial(int n)
{
int i, w = 1;
for (i = 1; i <= n; i++)
w *= i;
return w;
}
}
The idea behind the scenes is to associate a k-subset with the operation of drawing k elements from an n-element set. It is a combination, so the overall count of possible outcomes is C(n, k). That is a clue that we could seek the solution in Pascal's Triangle. After a while of comparing manually written examples with the appropriate numbers from Pascal's Triangle, we will find the pattern and hence the algorithm.
I used user515430's answer and converted it to Python 3. It also supports non-contiguous values, so you could pass in [1, 3, 5, 7, 9] as your pool instead of range(1, 11):
from itertools import combinations
from scipy.special import comb
from pandas import Index

debugcombinations = False

class IndexedCombination:

    def __init__(self, _setsize, _poolvalues):
        self.setsize = _setsize
        self.poolvals = Index(_poolvalues)
        self.poolsize = len(self.poolvals)
        self.totalcombinations = 1
        fast_k = min(self.setsize, self.poolsize - self.setsize)
        for i in range(1, fast_k + 1):
            self.totalcombinations = self.totalcombinations * (self.poolsize - fast_k + i) // i
        #fill the nCr cache
        self.choose_cache = {}
        n = self.poolsize
        k = self.setsize
        for i in range(k + 1):
            for j in range(n + 1):
                if n - j >= k - i:
                    self.choose_cache[n - j, k - i] = comb(n - j, k - i, exact=True)
        if debugcombinations:
            print('testnth = ' + str(self.testnth()))

    def get_nth_combination(self, index):
        n = self.poolsize
        r = self.setsize
        c = self.totalcombinations
        #if index < 0 or index >= c:
        #    raise IndexError
        result = []
        while r:
            c, n, r = c*r//n, n-1, r-1
            while index >= c:
                index -= c
                c, n = c*(n-r)//n, n-1
            result.append(self.poolvals[-1 - n])
        return tuple(result)

    def get_n_from_combination(self, someset):
        n = self.poolsize
        k = self.setsize
        index = 0
        j = 0
        for i in range(k):
            setidx = self.poolvals.get_loc(someset[i])
            for j in range(j + 1, setidx + 1):
                index += self.choose_cache[n - j, k - i - 1]
            j += 1
        return index

    #just used to test whether nth_combination from the internet actually works
    def testnth(self):
        n = 0
        _setsize = self.setsize
        mainset = self.poolvals
        for someset in combinations(mainset, _setsize):
            nthset = self.get_nth_combination(n)
            n2 = self.get_n_from_combination(nthset)
            if debugcombinations:
                print(str(n) + ': ' + str(someset) + ' vs ' + str(n2) + ': ' + str(nthset))
            if n != n2:
                return False
            for x in range(_setsize):
                if someset[x] != nthset[x]:
                    return False
            n += 1
        return True

setcombination = IndexedCombination(5, list(range(1, 10 + 1)))
print(str(setcombination.get_n_from_combination([2, 5, 7, 8, 10])))
returns 188
