Determine whether a symbol is part of the ith combination nCr - algorithm

UPDATE:
Combinatorics and unranking was eventually what I needed.
The links below helped alot:
http://msdn.microsoft.com/en-us/library/aa289166(v=vs.71).aspx
http://www.codeproject.com/Articles/21335/Combinations-in-C-Part-2
The Problem
Given a list of N symbols say {0,1,2,3,4...}
And NCr combinations of these
eg. NC3 will generate:
0 1 2
0 1 3
0 1 4
...
...
1 2 3
1 2 4
etc...
For the ith combination (i = [1 .. NCr]) I want to determine Whether a symbol (s) is part of it.
Func(N, r, i, s) = True/False or 0/1
eg. Continuing from above
The 1st combination contains 0 1 2 but not 3
F(N,3,1,"0") = TRUE
F(N,3,1,"1") = TRUE
F(N,3,1,"2") = TRUE
F(N,3,1,"3") = FALSE
Current approaches and tibits that might help or be related.
Relation to matrices
For r = 2 eg. 4C2 the combinations are the upper (or lower) half of a 2D matrix
1,2 1,3 1,4
----2,3 2,4
--------3,4
For r = 3 its the corner of a 3D matrix or cube
for r = 4 Its the "corner" of a 4D matrix and so on.
Another relation
Ideally the solution would be of a form something like the answer to this:
Calculate Combination based on position
The nth combination in the list of combinations of length r (with repitition allowed), the ith symbol can be calculated
Using integer division and remainder:
n/r^i % r = (0 for 0th symbol, 1 for 1st symbol....etc)
eg for the 6th comb of 3 symbols the 0th 1st and 2nd symbols are:
i = 0 => 6 / 3^0 % 3 = 0
i = 1 => 6 / 3^1 % 3 = 2
i = 2 => 6 / 3^2 % 3 = 0
The 6th comb would then be 0 2 0
I need something similar but with repition not allowed.
Thank you for following this question this far :]
Kevin.

I believe your problem is that of unranking combinations or subsets.
I will give you an implementation in Mathematica, from the package Combinatorica, but the Google link above is probably a better place to start, unless you are familiar with the semantics.
UnrankKSubset::usage = "UnrankKSubset[m, k, l] gives the mth k-subset of set l, listed in lexicographic order."
UnrankKSubset[m_Integer, 1, s_List] := {s[[m + 1]]}
UnrankKSubset[0, k_Integer, s_List] := Take[s, k]
UnrankKSubset[m_Integer, k_Integer, s_List] :=
Block[{i = 1, n = Length[s], x1, u, $RecursionLimit = Infinity},
u = Binomial[n, k];
While[Binomial[i, k] < u - m, i++];
x1 = n - (i - 1);
Prepend[UnrankKSubset[m - u + Binomial[i, k], k-1, Drop[s, x1]], s[[x1]]]
]
Usage is like:
UnrankKSubset[5, 3, {0, 1, 2, 3, 4}]
{0, 3, 4}
Yielding the 6th (indexing from 0) length-3 combination of set {0, 1, 2, 3, 4}.

There's a very efficient algorithm for this problem, which is also contained in the recently published:Knuth, The Art of Computer Programming, Volume 4A (section 7.2.1.3).
Since you don't care about the order in which the combinations are generated, let's use the lexicographic order of the combinations where each combination is listed in descending order. Thus for r=3, the first 11 combinations of 3 symbols would be: 210, 310, 320, 321, 410, 420, 421, 430, 431, 432, 510. The advantage of this ordering is that the enumeration is independent of n; indeed it is an enumeration over all combinations of 3 symbols from {0, 1, 2, …}.
There is a standard method to directly generate the ith combination given i, so to test whether a symbol s is part of the ith combination, you can simply generate it and check.
Method
How many combinations of r symbols start with a particular symbol s? Well, the remaining r-1 positions must come from the s symbols 0, 1, 2, …, s-1, so it's (s choose r-1), where (s choose r-1) or C(s,r-1) is the binomial coefficient denoting the number of ways of choosing r-1 objects from s objects. As this is true for all s, the first symbol of the ith combination is the smallest s such that
&Sum;k=0s(k choose r-1) ≥ i.
Once you know the first symbol, the problem reduces to finding the (i - &Sum;k=0s-1(k choose r-1))-th combination of r-1 symbols, where we've subtracted those combinations that start with a symbol less than s.
Code
Python code (you can write C(n,r) more efficiently, but this is fast enough for us):
#!/usr/bin/env python
tC = {}
def C(n,r):
if tC.has_key((n,r)): return tC[(n,r)]
if r>n-r: r=n-r
if r<0: return 0
if r==0: return 1
tC[(n,r)] = C(n-1,r) + C(n-1,r-1)
return tC[(n,r)]
def combination(r, k):
'''Finds the kth combination of r letters.'''
if r==0: return []
sum = 0
s = 0
while True:
if sum + C(s,r-1) < k:
sum += C(s,r-1)
s += 1
else:
return [s] + combination(r-1, k-sum)
def Func(N, r, i, s): return s in combination(r, i)
for i in range(1, 20): print combination(3, i)
print combination(500, 10000000000000000000000000000000000000000000000000000000000000000)
Note how fast this is: it finds the 10000000000000000000000000000000000000000000000000000000000000000th combination of 500 letters (it starts with 542) in less than 0.5 seconds.

I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
This class can easily be applied to your problem. If you have the rank (or index) to the binomial coefficient table, then simply call the class method that returns the K-indexes in an array. Then, loop through that returned array to see if any of the K-index values match the value you have. Pretty straight forward...

Related

Daily Coding Problem 316 : Coin Change Problem - determination of denomination?

I'm going through the Daily Coding Problems and am currently stuck in one of the problems. It goes by:
You are given an array of length N, where each element i represents
the number of ways we can produce i units of change. For example, [1,
0, 1, 1, 2] would indicate that there is only one way to make 0, 2, or
3 units, and two ways of making 4 units.
Given such an array, determine the denominations that must be in use.
In the case above, for example, there must be coins with values 2, 3,
and 4.
I'm unable to figure out how to determine the denomination from the total number of ways array. Can you work it out?
Somebody already worked out this problem here, but it's devoid of any explanation.
From what I could gather is that he collects all the elements whose value(number of ways == 1) and appends it to his answer, but I think it doesn't consider the fact that the same number can be formed from a combination of lower denominations for which still the number of ways would come out to be 1 irrespective of the denomination's presence.
For example, in the case of arr = [1, 1, a, b, c, 1]. We know that denomination 1 exists since arr[1] = 1. Now we can also see that arr[5] = 1, this should not necessarily mean that denomination 5 is available since 5 can be formed using coins of denomination 1, i.e. (1 + 1 + 1 + 1 + 1).
Thanks in advance!
If you're solving the coin change problem, the best technique is to maintain an array of ways of making change with a partial set of the available denominations, and add in a new denomination d by updating the array like this:
for i = d upto N
a[i] += a[i-d]
Your actual problem is the reverse of this: finding denominations based on the total number of ways. Note that if you know one d, you can remove it from the ways array by reversing the above procedure:
for i = N downto d
a[i] -= a[i-d]
You can find the lowest denomination available by looking for the first 1 in the array (other than the value at index 0, which is always 1). Then, once you've found the lowest denomination, you can remove its effect on the ways array, and repeat until the array is zeroed (except for the first value).
Here's a full solution in Python:
def rways(A):
dens = []
for i in range(1, len(A)):
if not A[i]: continue
dens.append(i)
for j in range(len(A)-1, i-1, -1):
A[j] -= A[j-i]
return dens
print(rways([1, 0, 1, 1, 2]))
You might want to add error-checking: if you find a non-zero value that's not 1 when searching for the next denomination, then the original array isn't valid.
For reference and comparison, here's some code for computing the ways of making change from a set of denominations:
def ways(dens, N):
A = [1] + [0] * N
for d in dens:
for i in range(d, N+1):
A[i] += A[i-d]
return A
print(ways([2, 3, 4], 4))

How to generate a pseudo-random involution?

For generating a pseudo-random permutation, the Knuth shuffles can be used. An involution is a self-inverse permutation and I guess, I could adapt the shuffles by forbidding touching an element multiple times. However, I'm not sure whether I could do it efficiently and whether it generates every involution equiprobably.
I'm afraid, an example is needed: On a set {0,1,2}, there are 6 permutation, out of which 4 are involutions. I'm looking for an algorithm generating one of them at random with the same probability.
A correct but very inefficient algorithm would be: Use Knuth shuffle, retry if it's no involution.
Let's here use a(n) as the number of involutions on a set of size n (as OEIS does). For a given set of size n and a given element in that set, the total number of involutions on that set is a(n). That element must either be unchanged by the involution or be swapped with another element. The number of involutions that leave our element fixed is a(n-1), since those are involutions on the other elements. Therefore a uniform distribution on the involutions must have a probability of a(n-1)/a(n) of keeping that element fixed. If it is to be fixed, just leave that element alone. Otherwise, choose another element that has not yet been examined by our algorithm to swap with our element. We have just decided what happens with one or two elements in the set: keep going and decide what happens with one or two elements at a time.
To do this, we need a list of the counts of involutions for each i <= n, but that is easily done with the recursion formula
a(i) = a(i-1) + (i-1) * a(i-2)
(Note that this formula from OEIS also comes from my algorithm: the first term counts the involutions keeping the first element where it is, and the second term is for the elements that are swapped with it.) If you are working with involutions, this is probably important enough to break out into another function, precompute some smaller values, and cache the function's results for greater speed, as in this code:
# Counts of involutions (self-inverse permutations) for each size
_invo_cnts = [1, 1, 2, 4, 10, 26, 76, 232, 764, 2620, 9496, 35696, 140152]
def invo_count(n):
"""Return the number of involutions of size n and cache the result."""
for i in range(len(_invo_cnts), n+1):
_invo_cnts.append(_invo_cnts[i-1] + (i-1) * _invo_cnts[i-2])
return _invo_cnts[n]
We also need a way to keep track of the elements that have not yet been decided, so we can efficiently choose one of those elements with uniform probability and/or mark an element as decided. We can keep them in a shrinking list, with a marker to the current end of the list. When we decide an element, we move the current element at the end of the list to replace the decided element then reduce the list. With that efficiency, the complexity of this algorithm is O(n), with one random number calculation for each element except perhaps the last. No better order complexity is possible.
Here is code in Python 3.5.2. The code is somewhat complicated by the indirection involved through the list of undecided elements.
from random import randrange
def randinvolution(n):
"""Return a random (uniform) involution of size n."""
# Set up main variables:
# -- the result so far as a list
involution = list(range(n))
# -- the list of indices of unseen (not yet decided) elements.
# unseen[0:cntunseen] are unseen/undecided elements, in any order.
unseen = list(range(n))
cntunseen = n
# Make an involution, progressing one or two elements at a time
while cntunseen > 1: # if only one element remains, it must be fixed
# Decide whether current element (index cntunseen-1) is fixed
if randrange(invo_count(cntunseen)) < invo_count(cntunseen - 1):
# Leave the current element as fixed and mark it as seen
cntunseen -= 1
else:
# In involution, swap current element with another not yet seen
idxother = randrange(cntunseen - 1)
other = unseen[idxother]
current = unseen[cntunseen - 1]
involution[current], involution[other] = (
involution[other], involution[current])
# Mark both elements as seen by removing from start of unseen[]
unseen[idxother] = unseen[cntunseen - 2]
cntunseen -= 2
return involution
I did several tests. Here is the code I used to check for validity and uniform distribution:
def isinvolution(p):
"""Flag if a permutation is an involution."""
return all(p[p[i]] == i for i in range(len(p)))
# test the validity and uniformness of randinvolution()
n = 4
cnt = 10 ** 6
distr = {}
for j in range(cnt):
inv = tuple(randinvolution(n))
assert isinvolution(inv)
distr[inv] = distr.get(inv, 0) + 1
print('In {} attempts, there were {} random involutions produced,'
' with the distribution...'.format(cnt, len(distr)))
for x in sorted(distr):
print(x, str(distr[x]).rjust(2 + len(str(cnt))))
And the results were
In 1000000 attempts, there were 10 random involutions produced, with the distribution...
(0, 1, 2, 3) 99874
(0, 1, 3, 2) 100239
(0, 2, 1, 3) 100118
(0, 3, 2, 1) 99192
(1, 0, 2, 3) 99919
(1, 0, 3, 2) 100304
(2, 1, 0, 3) 100098
(2, 3, 0, 1) 100211
(3, 1, 2, 0) 100091
(3, 2, 1, 0) 99954
That looks pretty uniform to me, as do other results I checked.
An involution is a one-to-one mapping that is its own inverse. Any cipher is a one-to-one mapping; it has to be in order for a cyphertext to be unambiguously decrypyed.
For an involution you need a cipher that is its own inverse. Such ciphers exist, ROT13 is an example. See Reciprocal Cipher for some others.
For your question I would suggest an XOR cipher. Pick a random key at least as long as the longest piece of data in your initial data set. If you are using 32 bit numbers, then use a 32 bit key. To permute, XOR the key with each piece of data in turn. The reverse permutation (equivalent to decrypting) is exactly the same XOR operation and will get back to the original data.
This will solve the mathematical problem, but it is most definitely not cryptographically secure. Repeatedly using the same key will allow an attacker to discover the key. I assume that there is no security requirement over and above the need for a random-seeming involution with an even distribution.
ETA: This is a demo, in Java, of what I am talking about in my second comment. Being Java, I use indexes 0..12 for your 13 element set.
public static void Demo() {
final int key = 0b1001;
System.out.println("key = " + key);
System.out.println();
for (int i = 0; i < 13; ++i) {
System.out.print(i + " -> ");
int ctext = i ^ key;
while (ctext >= 13) {
System.out.print(ctext + " -> ");
ctext = ctext ^ key;
}
System.out.println(ctext);
}
} // end Demo()
The output from the demo is:
key = 9
0 -> 9
1 -> 8
2 -> 11
3 -> 10
4 -> 13 -> 4
5 -> 12
6 -> 15 -> 6
7 -> 14 -> 7
8 -> 1
9 -> 0
10 -> 3
11 -> 2
12 -> 5
Where a transformed key would fall off the end of the array it is transformed again until it falls within the array. I am not sure if a while construction will fall within the strict mathematical definition of a function.

Fast algorithm to optimize a sequence of arithmetic expression

EDIT: clarified description of problem
Is there a fast algorithm solving following problem?
And, is also for extendend version of this problem
that is replaced natural numbers to Z/(2^n Z)?(This problem was too complex to add more quesion in one place, IMO.)
Problem:
For a given set of natural numbers like {7, 20, 17, 100}, required algorithm
returns the shortest sequence of additions, mutliplications and powers compute
all of given numbers.
Each item of sequence are (correct) equation that matches following pattern:
<number> = <number> <op> <number>
where <number> is a natual number, <op> is one of {+, *, ^}.
In the sequence, each operand of <op> should be one of
1
numbers which are already appeared in the left-hand-side of equal.
Example:
Input: {7, 20, 17, 100}
Output:
2 = 1 + 1
3 = 1 + 2
6 = 2 * 3
7 = 1 + 6
10 = 3 + 7
17 = 7 + 10
20 = 2 * 10
100 = 10 ^ 2
I wrote backtracking algorithm in Haskell.
it works for small input like above, but my real query is
randomly distributed ~30 numbers in [0,255].
for real query, following code takes 2~10 minutes in my PC.
(Actual code,
very simple test)
My current (Pseudo)code:
-- generate set of sets required to compute n.
-- operater (+) on set is set union.
requiredNumbers 0 = { {} }
requiredNumbers 1 = { {} }
requiredNumbers n =
{ {j, k} | j^k == n, j >= 2, k >= 2 }
+ { {j, k} | j*k == n, j >= 2, k >= 2 }
+ { {j, k} | j+k == n, j >= 1, k >= 1 }
-- remember the smallest set of "computed" number
bestSet := {i | 1 <= i <= largeNumber}
-- backtracking algorithm
-- from: input
-- to: accumulator of "already computed" number
closure from to =
if (from is empty)
if (|bestSet| > |to|)
bestSet := to
return
else if (|from| + |to| >= |bestSet|)
-- cut branch
return
else
m := min(from)
from' := deleteMin(from)
foreach (req in (requiredNumbers m))
closure (from' + (req - to)) (to + {m})
-- recoverEquation is a function converts set of number to set of equation.
-- it can be done easily.
output = recoverEquation (closure input {})
Additional Note:
Answers like
There isn't a fast algorithm, because...
There is a heuristic algorithm, it is...
are also welcomed. Now I'm feeling that there is no fast and exact algorithm...
Answer #1 can be used as a heuristic, I think.
What if you worked backwards from the highest number in a sorted input, checking if/how to utilize the smaller numbers (and numbers that are being introduced) in its construction?
For example, although this may not guarantee the shortest sequence...
input: {7, 20, 17, 100}
(100) = (20) * 5 =>
(7) = 5 + 2 =>
(17) = 10 + (7) =>
(20) = 10 * 2 =>
10 = 5 * 2 =>
5 = 3 + 2 =>
3 = 2 + 1 =>
2 = 1 + 1
What I recommend is to transform it into some kind of graph shortest path algorithm.
For each number, you compute (and store) the shortest path of operations. Technically one step is enough: For each number you can store the operation and the two operands (left and right, because power operation is not commutative), and also the weight ("nodes")
Initially you register 1 with the weight of zero
Every time you register a new number, you have to generate all calculations with that number (all additions, multiplications, powers) with all already-registered numbers. ("edges")
Filter for the calculations: it the result of the calculation is already registered, you shouldn't store that, because there is an easier way to get to that number
Store only 1 operation for the commutative ones (1+2=2+1)
Prefilter the power operation because that may even cause overflow
You have to order this list to the shortest sum path (weight of the edge). Weight = (weight of operand1) + (weight of operand2) + (1, which is the weight of the operation)
You can exclude all resulting numbers which are greater than the maximum number that we have to find (e.g. if we found 100 already, anything greater that 20 can be excluded) - this can be refined so that you can check the members of the operations also.
If you hit one of your target numbers, then you found the shortest way of calculating one of your target numbers, you have to restart the generations:
Recalculate the maximum of the target numbers
Go back on the paths of the currently found number, set their weight to 0 (they will be given from now on, because their cost is already paid)
Recalculate the weight for the operations in the generation list, because the source operand weight may have been changed (this results reordering at the end) - here you can exclude those where either operand is greater than the new maximum
If all the numbers are hit, then the search is over
You can build your expression using the "backlinks" (operation, left and right operands) for each of your target numbers.
The main point is that we always keep our eye on the target function, which is that the total number of operation must be the minimum possible. In order to get this, we always calculate the shortest path to a certain number, then considering that number (and all the other numbers on the way) as given numbers, then extending our search to the remaining targets.
Theoretically, this algorithm processes (registers) each numbers only once. Applying the proper filters cuts the unnecessary branches, so nothing is calculated twice (except the weights of the in-queue elements)

Algorithm to count the number of valid blocks in a permutation [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Finding sorted sub-sequences in a permutation
Given an array A which holds a permutation of 1,2,...,n. A sub-block A[i..j]
of an array A is called a valid block if all the numbers appearing in A[i..j]
are consecutive numbers (may not be in order).
Given an array A= [ 7 3 4 1 2 6 5 8] the valid blocks are [3 4], [1,2], [6,5],
[3 4 1 2], [3 4 1 2 6 5], [7 3 4 1 2 6 5], [7 3 4 1 2 6 5 8]
So the count for above permutation is 7.
Give an O( n log n) algorithm to count the number of valid blocks.
Ok, I am down to 1 rep because I put 200 bounty on a related question: Finding sorted sub-sequences in a permutation
so I cannot leave comments for a while.
I have an idea:
1) Locate all permutation groups. They are: (78), (34), (12), (65). Unlike in group theory, their order and position, and whether they are adjacent matters. So, a group (78) can be represented as a structure (7, 8, false), while (34) would be (3,4,true). I am using Python's notation for tuples, but it is actually might be better to use a whole class for the group. Here true or false means contiguous or not. Two groups are "adjacent" if (max(gp1) == min(gp2) + 1 or max(gp2) == min(gp1) + 1) and contigous(gp1) and contiguos(gp2). This is not the only condition, for union(gp1, gp2) to be contiguous, because (14) and (23) combine into (14) nicely. This is a great question for algo class homework, but a terrible one for interview. I suspect this is homework.
Just some thoughts:
At first sight, this sounds impossible: a fully sorted array would have O(n2) valid sub-blocks.
So, you would need to count more than one valid sub-block at a time. Checking the validity of a sub-block is O(n). Checking whether a sub-block is fully sorted is O(n) as well. A fully sorted sub-block contains n·(n - 1)/2 valid sub-blocks, which you can count without further breaking this sub-block up.
Now, the entire array is obviously always valid. For a divide-and-conquer approach, you would need to break this up. There are two conceivable breaking points: the location of the highest element, and that of the lowest element. If you break the array into two at one of these points, including the extremum in the part that contains the second-to-extreme element, there cannot be a valid sub-block crossing this break-point.
By always choosing the extremum that produces a more even split, this should work quite well (average O(n log n)) for "random" arrays. However, I can see problems when your input is something like (1 5 2 6 3 7 4 8), which seems to produce O(n2) behaviour. (1 4 7 2 5 8 3 6 9) would be similar (I hope you see the pattern). I currently see no trick to catch this kind of worse case, but it seems that it requires other splitting techniques.
This question does involve a bit of a "math trick" but it's fairly straight forward once you get it. However, the rest of my solution won't fit the O(n log n) criteria.
The math portion:
For any two consecutive numbers their sum is 2k+1 where k is the smallest element. For three it is 3k+3, 4 : 4k+6 and for N such numbers it is Nk + sum(1,N-1). Hence, you need two steps which can be done simultaneously:
Create the sum of all the sub-arrays.
Determine the smallest element of a sub-array.
The dynamic programming portion
Build two tables using the results of the previous row's entries to build each successive row's entries. Unfortunately, I'm totally wrong as this would still necessitate n^2 sub-array checks. Ugh!
My proposition
STEP = 2 // amount of examed number
B [0,0,0,0,0,0,0,0]
B [1,1,0,0,0,0,0,0]
VALID(A,B) - if not valid move one
B [0,1,1,0,0,0,0,0]
VALID(A,B) - if valid move one and step
B [0,0,0,1,1,0,0,0]
VALID (A,B)
B [0,0,0,0,0,1,1,0]
STEP = 3
B [1,1,1,0,0,0,0,0] not ok
B [0,1,1,1,0,0,0,0] ok
B [0,0,0,0,1,1,1,0] not ok
STEP = 4
B [1,1,1,1,0,0,0,0] not ok
B [0,1,1,1,1,0,0,0] ok
.....
CON <- 0
STEP <- 2
i <- 0
j <- 0
WHILE(STEP <= LEN(A)) DO
j <- STEP
WHILE(STEP <= LEN(A) - j) DO
IF(VALID(A,i,j)) DO
CON <- CON + 1
i <- j + 1
j <- j + STEP
ELSE
i <- i + 1
j <- j + 1
END
END
STEP <- STEP + 1
END
The valid method check that all elements are consecutive
Never tested but, might be ok
The original array doesn't contain duplicates so must itself be a consecutive block. Lets call this block (1 ~ n). We can test to see whether block (2 ~ n) is consecutive by checking if the first element is 1 or n which is O(1). Likewise we can test block (1 ~ n-1) by checking whether the last element is 1 or n.
I can't quite mould this into a solution that works but maybe it will help someone along...
Like everybody else, I'm just throwing this out ... it works for the single example below, but YMMV!
The idea is to count the number of illegal sub-blocks, and subtract this from the total possible number. We count the illegal ones by examining each array element in turn and ruling out sub-blocks that include the element but not its predecessor or successor.
Foreach i in [1,N], compute B[A[i]] = i.
Let Count = the total number of sub-blocks with length>1, which is N-choose-2 (one for each possible combination of starting and ending index).
Foreach i, consider A[i]. Ignoring edge cases, let x=A[i]-1, and let y=A[i]+1. A[i] cannot participate in any sub-block that does not include x or y. Let iX=B[x] and iY=B[y]. There are several cases to be treated independently here. The general case is that iX<i<iY<i. In this case, we can eliminate the sub-block A[iX+1 .. iY-1] and all intervening blocks containing i. There are (i - iX + 1) * (iY - i + 1) such sub-blocks, so call this number Eliminated. (Other cases left as an exercise for the reader, as are those edge cases.) Set Count = Count - Eliminated.
Return Count.
The total cost appears to be N * (cost of step 2) = O(N).
WRINKLE: In step 2, we must be careful not to eliminate each sub-interval more than once. We can accomplish this by only eliminating sub-intervals that lie fully or partly to the right of position i.
Example:
A = [1, 3, 2, 4]
B = [1, 3, 2, 4]
Initial count = (4*3)/2 = 6
i=1: A[i]=1, so need sub-blocks with 2 in them. We can eliminate [1,3] from consideration. Eliminated = 1, Count -> 5.
i=2: A[i]=3, so need sub-blocks with 2 or 4 in them. This rules out [1,3] but we already accounted for it when looking right from i=1. Eliminated = 0.
i=3: A[i] = 2, so need sub-blocks with [1] or [3] in them. We can eliminate [2,4] from consideration. Eliminated = 1, Count -> 4.
i=4: A[i] = 4, so we need sub-blocks with [3] in them. This rules out [2,4] but we already accounted for it when looking right from i=3. Eliminated = 0.
Final Count = 4, corresponding to the sub-blocks [1,3,2,4], [1,3,2], [3,2,4] and [3,2].
(This is an attempt to do this N.log(N) worst case. Unfortunately it's wrong -- it sometimes undercounts. It incorrectly assumes you can find all the blocks by looking at only adjacent pairs of smaller valid blocks. In fact you have to look at triplets, quadruples, etc, to get all the larger blocks.)
You do it with a struct that represents a subblock and a queue for subblocks.
struct
c_subblock
{
int index ; /* index into original array, head of subblock */
int width ; /* width of subblock > 0 */
int lo_value;
c_subblock * p_above ; /* null or subblock above with same index */
};
Alloc an array of subblocks the same size as the original array, and init each subblock to have exactly one item in it. Add them to the queue as you go. If you start with array [ 7 3 4 1 2 6 5 8 ] you will end up with a queue like this:
queue: ( [7,7] [3,3] [4,4] [1,1] [2,2] [6,6] [5,5] [8,8] )
The { index, width, lo_value, p_above } values for subbblock [7,7] will be { 0, 1, 7, null }.
Now it's easy. Forgive the c-ish pseudo-code.
loop {
c_subblock * const p_left = Pop subblock from queue.
int const right_index = p_left.index + p_left.width;
if ( right_index < length original array ) {
// Find adjacent subblock on the right.
// To do this you'll need the original array of length-1 subblocks.
c_subblock const * p_right = array_basic_subblocks[ right_index ];
do {
Check the left/right subblocks to see if the two merged are also a subblock.
If they are add a new merged subblock to the end of the queue.
p_right = p_right.p_above;
}
while ( p_right );
}
}
This will find them all I think. It's usually O(N log(N)), but it'll be O(N^2) for a fully sorted or anti-sorted list. I think there's an answer to this though -- when you build the original array of subblocks you look for sorted and anti-sorted sequences and add them as the base-level subblocks. If you are keeping a count increment it by (width * (width + 1))/2 for the base-level. That'll give you the count INCLUDING all the 1-length subblocks.
After that just use the loop above, popping and pushing the queue. If you're counting you'll have to have a multiplier on both the left and right subblocks and multiply these together to calculate the increment. The multiplier is the width of the leftmost (for p_left) or rightmost (for p_right) base-level subblock.
Hope this is clear and not too buggy. I'm just banging it out, so it may even be wrong.
[Later note. This doesn't work after all. See note below.]

Compute rank of a combination?

I want to pre-compute some values for each combination in a set of combinations. For example, when choosing 3 numbers from 0 to 12, I'll compute some value for each one:
>>> for n in choose(range(13), 3):
print n, foo(n)
(0, 1, 2) 78
(0, 1, 3) 4
(0, 1, 4) 64
(0, 1, 5) 33
(0, 1, 6) 20
(0, 1, 7) 64
(0, 1, 8) 13
(0, 1, 9) 24
(0, 1, 10) 85
(0, 1, 11) 13
etc...
I want to store these values in an array so that given the combination, I can compute its and get the value. For example:
>>> a = [78, 4, 64, 33]
>>> a[magic((0,1,2))]
78
What would magic be?
Initially I thought to just store it as a 3-d matrix of size 13 x 13 x 13, so I can easily index it that way. While this is fine for 13 choose 3, this would have way too much overhead for something like 13 choose 7.
I don't want to use a dict because eventually this code will be in C, and an array would be much more efficient anyway.
UPDATE: I also have a similar problem, but using combinations with repetitions, so any answers on how to get the rank of those would be much appreciated =).
UPDATE: To make it clear, I'm trying to conserve space. Each of these combinations actually indexes into something take up a lot of space, let's say 2 kilobytes. If I were to use a 13x13x13 array, that would be 4 megabytes, of which I only need 572 kilobytes using (13 choose 3) spots.
Here is a conceptual answer and a code based on how lex ordering works. (So I guess my answer is like that of "moron", except that I think that he has too few details and his links have too many.) I wrote a function unchoose(n,S) for you that works assuming that S is an ordered list subset of range(n). The idea: Either S contains 0 or it does not. If it does, remove 0 and compute the index for the remaining subset. If it does not, then it comes after the binomial(n-1,k-1) subsets that do contain 0.
def binomial(n,k):
if n < 0 or k < 0 or k > n: return 0
b = 1
for i in xrange(k): b = b*(n-i)/(i+1)
return b
def unchoose(n,S):
k = len(S)
if k == 0 or k == n: return 0
j = S[0]
if k == 1: return j
S = [x-1 for x in S]
if not j: return unchoose(n-1,S[1:])
return binomial(n-1,k-1)+unchoose(n-1,S)
def choose(X,k):
n = len(X)
if k < 0 or k > n: return []
if not k: return [[]]
if k == n: return [X]
return [X[:1] + S for S in choose(X[1:],k-1)] + choose(X[1:],k)
(n,k) = (13,3)
for S in choose(range(n),k): print unchoose(n,S),S
Now, it is also true that you can cache or hash values of both functions, binomial and unchoose. And what's nice about this is that you can compromise between precomputing everything and precomputing nothing. For instance you can precompute only for len(S) <= 3.
You can also optimize unchoose so that it adds the binomial coefficients with a loop if S[0] > 0, instead of decrementing and using tail recursion.
You can try using the lexicographic index of the combination. Maybe this page will help: http://saliu.com/bbs/messages/348.html
This MSDN page has more details: Generating the mth Lexicographical Element of a Mathematical Combination.
NOTE: The MSDN page has been retired. If you download the documentation at the above link, you will find the article on page 10201 of the pdf that is downloaded.
To be a bit more specific:
When treated as a tuple, you can order the combinations lexicographically.
So (0,1,2) < (0,1,3) < (0,1,4) etc.
Say you had the number 0 to n-1 and chose k out of those.
Now if the first element is zero, you know that it is one among the first n-1 choose k-1.
If the first element is 1, then it is one among the next n-2 choose k-1.
This way you can recursively compute the exact position of the given combination in the lexicographic ordering and use that to map it to your number.
This works in reverse too and the MSDN page explains how to do that.
Use a hash table to store the results. A decent hash function could be something like:
h(x) = (x1*p^(k - 1) + x2*p^(k - 2) + ... + xk*p^0) % pp
Where x1 ... xk are the numbers in your combination (for example (0, 1, 2) has x1 = 0, x2 = 1, x3 = 2) and p and pp are primes.
So you would store Hash[h(0, 1, 2)] = 78 and then you would retrieve it the same way.
Note: the hash table is just an array of size pp, not a dict.
I would suggest a specialised hash table. The hash for a combination should be the exclusive-or of the hashes for the values. Hashes for values are basically random bit-patterns.
You could code the table to cope with collisions, but it should be fairly easy to derive a minimal perfect hash scheme - one where no two three-item combinations give the same hash value, and where the hash-size and table-size are kept to a minimum.
This is basically Zobrist hashing - think of a "move" as adding or removing one item of the combination.
EDIT
The reason to use a hash table is that the lookup performance O(n) where n is the number of items in the combination (assuming no collisions). Calculating lexicographical indexes into the combinations is significantly slower, IIRC.
The downside is obviously the up-front work done to generate the table.
For now, I've reached a compromise: I have a 13x13x13 array which just maps to the index of the combination, taking up 13x13x13x2 bytes = 4 kilobytes (using short ints), plus the normal-sized (13 choose 3) * 2 kilobytes = 572 kilobytes, for a total of 576 kilobytes. Much better than 4 megabytes, and also faster than a rank calculation!
I did this partly cause I couldn't seem to get Moron's answer to work. Also this is more extensible - I have a case where I need combinations with repetitions, and I haven't found a way to compute the rank of those, yet.
What you want are called combinadics. Here's my implementation of this concept, in Python:
def nthresh(k, idx):
"""Finds the largest value m such that C(m, k) <= idx."""
mk = k
while ncombs(mk, k) <= idx:
mk += 1
return mk - 1
def idx_to_set(k, idx):
ret = []
for i in range(k, 0, -1):
element = nthresh(i, idx)
ret.append(element)
idx -= ncombs(element, i)
return ret
def set_to_idx(input):
ret = 0
for k, ck in enumerate(sorted(input)):
ret += ncombs(ck, k + 1)
return ret
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration and it does not use very much memory. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
It should not be hard to convert this class to C++.

Resources