Finding a triplet having a given sum - algorithm

I have been struggling with this question for some time now. The question goes like this:
We have n^2 numbers. We need to find out if there exists a triplet a, b, c such that a + b + c = 0. For the more general case, a + b + c = k (k is given).
There exists a solution with O(n^2log(n)) complexity.
Any help would be greatly appreciated.
thanks

To get this in O(n² log n), you'd have to sort the numbers. Then, for every pair of numbers, do a binary search for the third.
The upper bound is much higher for the general version of the problem.
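A minimal sketch of that idea (my own, not part of the answer above), assuming the input is a flat list nums and the target total is k:

from bisect import bisect_left

def has_triplet(nums, k):
    nums = sorted(nums)                        # O(m log m)
    n = len(nums)
    for i in range(n):                         # first element
        for j in range(i + 1, n):              # second element
            need = k - nums[i] - nums[j]
            # binary search for the third element to the right of j
            pos = bisect_left(nums, need, j + 1)
            if pos < n and nums[pos] == need:
                return True
    return False

print(has_triplet([2, 3, 4, 5, 6], 12))        # True (3 + 4 + 5)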

I wrote a rough solution.
It can definitely be done in O(n^2).
You don't have to sort anything.
It's an extension of the two-sum problem (summing two numbers to reach a target x), and the trick is to use a hash table.
def triplets(l, total):
    """Sum of 3 numbers to reach total.
    Basically an extension of the two-sum hash-table trick.
    """
    l = set(l)
    d = {}
    for i in l:
        remain = total - i
        inside = {}
        for j in l:
            if i == j:
                continue
            inside[j] = remain - j
        d[i] = inside
    good = set()
    for first, dic in d.items():
        for second, third in dic.items():
            # the required third value must exist and must not reuse
            # either of the two elements already chosen
            if third in l and third not in (first, second):
                good.add(tuple(sorted([first, second, third])))
    for each in good:
        print(each)

triplets([2, 3, 4, 5, 6], 3 + 4 + 5)
NOTE: sorting each triplet is O(1), since a triplet always has exactly three elements.

Interview Question: Remove repeating numbers at the end of an array

I got a surprising interview question today at a big Bay Area tech company that I was absolutely stumped by despite it seeming so easy. I was wondering if anyone has seen it or can offer a simpler solution, as the interviewer didn't want to show me the answer. The solution can be written in any language or pseudocode.
Question:
Given a list of numbers, remove any extraneous repeating suffix sequences of numbers that appear at the end of the list, until it has no repeating suffix sequences. The final repetition of the sequence may be cut off.
For example:
[1,2,3,4,5,6,7,5,6,7,5,6] -> [1,2,3,4,5,6,7]
explanation: [5, 6, 7] were repeating
Also consider the situation
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1] # not [1,2,3,4,5,4,5,1,4,5,4,5,1]
explanation: [4,5,4,5,1] is a repeating sequence
There are always two ways to approach this kind of problem: finding any solution, and finding an efficient one. It is usually better to start with any solution and then think about how to optimize it.
Now, as we can see in the second example, the problem is complicated by the fact that the repeating pattern is not known. So we could just try all possible patterns at the end. Then we would need to check two things:
is it actually repeating
how long is the result
Then we could just take the shortest result. Here is the Python code:
def remove_repeating_tail(a: list) -> list:
    results = []
    for i in range(len(a)):
        tail = a[i:]
        results.append(remove_repeats(a, tail))
    if len(results) == 0:
        return a
    return sorted(results, key=len)[0]
This way we made sure we cover all the cases: empty list, no repeating pattern. Next we need to write remove_repeats. We also have to handle an empty repeating pattern, so we need to be aware of that.
def remove_repeats(a: list, tail: list) -> list:
    assert len(tail) <= len(a)
    if len(tail) == 0:
        return a
    remainder = a
    count = 0
    while remainder[-len(tail):] == tail:
        remainder = remainder[:-len(tail)]
        count += 1
    if count <= 1:
        return a
    return remainder
remove_repeats strips the pattern off the end for as long as it keeps repeating, and only keeps the shortened list if the pattern occurred more than once. Now it's time to test whether the code actually works, if that is possible in the interview.
remove_repeating_tail([1,2,3,4,5,6,7,5,6,7,5,6])
-> [1, 2, 3, 4, 5, 6]
remove_repeating_tail([1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5])
-> [1, 2, 3, 4, 5, 4, 5]
Also good to check some other cases:
remove_repeating_tail([1,2,3,4])
-> [1, 2, 3, 4]
remove_repeating_tail([])
-> []
After quite a bit of fixing I got the above, which I think is correct. In particular I missed:
first I had an infinite loop in remove_repeats for an empty tail
remove_repeats always removed the tail, and sometimes everything, as I wasn't checking that there is at least one repeat. I then added the counting.
I made simple mistakes like writing results = res instead of results.append(res), leading to some exceptions.
Then a lot of simplification. First I used a sentinel None to communicate back that the tail is not repeating, but we could just return the whole list. Then I checked for the repeat with an if before the while loop, but realized it's basically doing the same thing as the first iteration, so I used counting instead.
Similarly, I don't like the if len(results) == 0: check. I would probably add a to results at the beginning and remove the check, as then there is always a result. Then we could start the counting from 1 instead of 0. Still, I kept it in.
If we want something fast, we first need to analyze the complexity.
So remove_repeats for a list of size n and tail size k is O(n / k). Then we call this function n times. And then we sort the results. Wait, why do we sort at all? We could just take the minimum: return min(results, key=len). That's better.
In each loop iteration we call remove_repeats with k = 1 up to n. So we have:
sum(k = 1 .. n) O(n / k), which is n/1 + n/2 + n/3 + ... + n/n. I had to look this up on Wikipedia, but these are the harmonic numbers: the sum is n * H_n. We can also just make our lives easy and say it's less than O(n^2) for now. Otherwise, using the approximation H_n ≈ ln(n) + 0.58, the sum is about n ln(n) + 0.58 n, so the complexity overall is O(n log n). Not too bad, I would say. Is it optimal? Maybe. Here I would compare it to some other similar algorithms (like substring search, etc.).
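Spelled out, the bound being used is
\[
\sum_{k=1}^{n} \frac{n}{k} = n H_n \approx n \ln n + \gamma n = O(n \log n),
\]
where H_n is the n-th harmonic number and \(\gamma \approx 0.577\) is the Euler-Mascheroni constant.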
Before going there, at this point, I would check with the interviewer, where he would like to go next. As there are many directions.
This seems a tricky question and there may not be a simple solution. The best solution I can think of would be O(n) time and O(n) space, and that is if I am not missing any edge case.
Let's take as example
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1]
Steps would be as follows:
Iterate over the input array from the last index to the first and build a dictionary (hashtable) where every number in the array is a key and its value is a list of the positions at which that number appears in the array.
The occurrences dictionary becomes:
{
 5: [16, 14, 11, 9, 6, 4],
 4: [15, 13, 10, 8, 5, 3],
 1: [12, 7, 0],
 3: [2],
 2: [1]
}
Find the possible suffix lengths by calculating, for every number, the delta between each of its positions and its first (earliest) position. This way we take into consideration the case in which a specific number repeats inside the suffix or in the prefix.
We then add each distinct possible suffix length to a set.
We sort the possible suffix lengths in descending order.
We get following suffix lengths:
[12, 10, 7, 5, 2]
For every possible length l, we test if arr[n-1] == arr[n-1-l]. If l is our suffix's length, the number at the last position is repeated exactly l positions before. We then check that the last l elements respect the same condition. If they do, we have found the maximum suffix length. If not, the maximum suffix length is even smaller, so we check the next possible length.
After finding the correct suffix length, we also trim any remaining numbers that repeat at position pos - l. We then return the slice of the array with the suffix removed.
def removeRepeatingSuffixes(arr):
    if not arr:
        return []
    n = len(arr)
    occurrences = {}
    for i in range(n - 1, -1, -1):
        c = arr[i]
        if c not in occurrences:
            occurrences[c] = []
        occurrences[c].append(i)
    # treat edge case: no repeating suffix
    if len(occurrences[arr[n-1]]) == 1:
        return arr
    # create a set of possible suffix lengths,
    # based on the differences between the positions of each number.
    possible_suffixes_lengths_set = set()
    for c, olist in occurrences.items():
        if len(olist) >= 2:
            for i in range(len(olist)-1):
                delta = olist[i] - olist[len(olist)-1]
                possible_suffixes_lengths_set.add(delta)
    suff_lengths = sorted(possible_suffixes_lengths_set, reverse=True)
    for l in suff_lengths:
        if arr[n - 1] == arr[n - 1 - l]:
            # possible suffix length, check if the last l elements repeat
            ok_length = True
            for j in range(n-2, n-1-l, -1):
                if arr[j] != arr[j-l]:
                    ok_length = False
                    break
            if ok_length:
                last_i = n-1-l
                while last_i > 0 and arr[last_i] == arr[last_i - l]:
                    last_i -= 1
                # return non-repeating slice, from 0 to last_i
                return arr[0:last_i + 1]
    # no candidate length worked: nothing to remove
    return arr
A quick way to remove repeats or dedupe is to convert the list to a set() instead.

Daily Coding Problem 316 : Coin Change Problem - determination of denomination?

I'm going through the Daily Coding Problems and am currently stuck on one of them. It goes:
You are given an array of length N, where each element i represents
the number of ways we can produce i units of change. For example, [1,
0, 1, 1, 2] would indicate that there is only one way to make 0, 2, or
3 units, and two ways of making 4 units.
Given such an array, determine the denominations that must be in use.
In the case above, for example, there must be coins with values 2, 3,
and 4.
I'm unable to figure out how to determine the denominations from the number-of-ways array. Can you work it out?
Somebody already worked out this problem here, but it's devoid of any explanation.
From what I could gather, he collects all the elements whose value (number of ways) == 1 and appends them to his answer, but I think it doesn't consider the fact that the same amount can be formed from a combination of lower denominations, for which the number of ways would still come out to 1 irrespective of that denomination's presence.
For example, in the case of arr = [1, 1, a, b, c, 1], we know that denomination 1 exists since arr[1] = 1. Now we can also see that arr[5] = 1, but this should not necessarily mean that denomination 5 is available, since 5 can be formed using coins of denomination 1, i.e. (1 + 1 + 1 + 1 + 1).
Thanks in advance!
If you're solving the coin change problem, the best technique is to maintain an array of ways of making change with a partial set of the available denominations, and add in a new denomination d by updating the array like this:
for i = d upto N
    a[i] += a[i-d]
Your actual problem is the reverse of this: finding denominations based on the total number of ways. Note that if you know one d, you can remove it from the ways array by reversing the above procedure:
for i = N downto d
    a[i] -= a[i-d]
You can find the lowest denomination available by looking for the first 1 in the array (other than the value at index 0, which is always 1). Then, once you've found the lowest denomination, you can remove its effect on the ways array, and repeat until the array is zeroed (except for the first value).
Here's a full solution in Python:
def rways(A):
    dens = []
    for i in range(1, len(A)):
        if not A[i]: continue
        dens.append(i)
        for j in range(len(A)-1, i-1, -1):
            A[j] -= A[j-i]
    return dens

print(rways([1, 0, 1, 1, 2]))
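To see why this works, here is a quick trace for the example input (my own trace of the code above): starting from A = [1, 0, 1, 1, 2], index 2 is the first non-zero entry, so 2 is a denomination; removing its effect gives [1, 0, 0, 1, 1]. Index 3 is now non-zero, so 3 is a denomination, leaving [1, 0, 0, 0, 1]. Index 4 is non-zero, so 4 is a denomination, leaving [1, 0, 0, 0, 0]. The array is now zeroed except for the first value, so the answer is [2, 3, 4].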
You might want to add error-checking: if you find a non-zero value that's not 1 when searching for the next denomination, then the original array isn't valid.
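A minimal sketch of how that check might look (my own variation, not part of the answer above): any non-zero entry encountered while scanning for the next denomination must be exactly 1, otherwise the input was not produced by any set of denominations.

def rways_checked(A):
    dens = []
    for i in range(1, len(A)):
        if not A[i]:
            continue
        if A[i] != 1:
            # after removing all smaller denominations, a valid ways array
            # can only hold 0 or 1 here
            raise ValueError("invalid ways array at index %d" % i)
        dens.append(i)
        for j in range(len(A) - 1, i - 1, -1):
            A[j] -= A[j - i]
    return dens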
For reference and comparison, here's some code for computing the ways of making change from a set of denominations:
def ways(dens, N):
    A = [1] + [0] * N
    for d in dens:
        for i in range(d, N+1):
            A[i] += A[i-d]
    return A

print(ways([2, 3, 4], 4))

Find minimum distance between points

I have a set of points (x, y).
I need to return the two points with the minimal distance between them.
I am using this:
http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf
but I don't really understand how the algorithm works.
Can you explain more simply how the algorithm works, or suggest another idea?
Thanks!
If the number of points is small, you can use the brute force approach, i.e.:
for each point, find the closest point among the other points and save the minimum distance with the current two indices so far.
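A minimal brute-force sketch (my own, assuming points are (x, y) tuples):

from math import dist   # Euclidean distance, available since Python 3.8

def closest_pair_bruteforce(points):
    best = None
    best_pair = None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):   # every unordered pair exactly once
            d = dist(points[i], points[j])
            if best is None or d < best:
                best, best_pair = d, (points[i], points[j])
    return best_pair

print(closest_pair_bruteforce([(0, 0), (7, 6), (5, 8), (12, 5)]))   # -> ((7, 6), (5, 8))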
If the number of points is large, I think you may find the answer in this thread:
Shortest distance between points algorithm
The solution to the Closest Pair problem with the minimal time complexity, O(n log n), is the divide-and-conquer approach mentioned in the document that you have read.
Divide-and-conquer Approach for Closest-Pair Problem
The easiest way to understand this algorithm is to read an implementation of it in a high-level language (because sometimes understanding an algorithm or pseudocode can be harder than understanding real code), like Python:
# closest pairs by divide and conquer
# David Eppstein, UC Irvine, 7 Mar 2002
from __future__ import generators

def closestpair(L):
    def square(x): return x*x
    def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1])

    # Work around ridiculous Python inability to change variables in outer scopes
    # by storing a list "best", where best[0] = smallest sqdist found so far and
    # best[1] = pair of points giving that value of sqdist. Then best itself is never
    # changed, but its elements best[0] and best[1] can be.
    #
    # We use the pair L[0],L[1] as our initial guess at a small distance.
    best = [sqdist(L[0],L[1]), (L[0],L[1])]

    # check whether pair (p,q) forms a closer pair than one seen already
    def testpair(p,q):
        d = sqdist(p,q)
        if d < best[0]:
            best[0] = d
            best[1] = p,q

    # merge two sorted lists by y-coordinate
    def merge(A,B):
        i = 0
        j = 0
        while i < len(A) or j < len(B):
            if j >= len(B) or (i < len(A) and A[i][1] <= B[j][1]):
                yield A[i]
                i += 1
            else:
                yield B[j]
                j += 1

    # Find closest pair recursively; returns all points sorted by y coordinate
    def recur(L):
        if len(L) < 2:
            return L
        split = len(L)/2
        splitx = L[split][0]
        L = list(merge(recur(L[:split]), recur(L[split:])))

        # Find possible closest pair across split line
        # Note: this is not quite the same as the algorithm described in class, because
        # we use the global minimum distance found so far (best[0]), instead of
        # the best distance found within the recursive calls made by this call to recur().
        E = [p for p in L if abs(p[0]-splitx) < best[0]]
        for i in range(len(E)):
            for j in range(1,8):
                if i+j < len(E):
                    testpair(E[i],E[i+j])
        return L

    L.sort()
    recur(L)
    return best[1]

closestpair([(0,0),(7,6),(2,20),(12,5),(16,16),(5,8),\
             (19,7),(14,22),(8,19),(7,29),(10,11),(1,13)])
# returns: (7,6),(5,8)
Taken from: https://www.ics.uci.edu/~eppstein/161/python/closestpair.py
Detailed explanation:
First we define a squared Euclidean distance function to prevent code repetition:
def square(x): return x*x                                     # square helper
def sqdist(p,q): return square(p[0]-q[0])+square(p[1]-q[1])   # squared Euclidean distance
Then we are taking the first two points as our initial best guess:
best = [sqdist(L[0],L[1]), (L[0],L[1])]
This is a function definition for comparing Euclidean distances of next pair with our current best pair:
def testpair(p,q):
    d = sqdist(p,q)
    if d < best[0]:
        best[0] = d
        best[1] = p,q
def merge(A,B): is just a helper for the algorithm that merges two sorted lists (sorted by y-coordinate) which were previously divided in half.
def recur(L): is the actual body of the algorithm, so I will explain this function definition in more detail:
if len(L) < 2:
    return L
With this part, the algorithm terminates the recursion when there is at most one element/point left in the list of points.
Split the list in half: split = len(L)/2
Recurse on each half (by the function calling itself) and merge the results: L = list(merge(recur(L[:split]), recur(L[split:])))
Then, lastly, these nested loops test the candidate points near the split line against their nearby neighbours (in y order):
for i in range(len(E)):
    for j in range(1,8):
        if i+j < len(E):
            testpair(E[i],E[i+j])
As a result of this, if a better pair is found, the best pair will be updated.
So they solve the problem in many dimensions using a divide-and-conquer approach. Binary search or divide-and-conquer is mega fast. Basically, if you can split a dataset into two halves, and keep doing that until you find some info you want, you are doing it about as fast as humanly and computerly possible most of the time.
For this question, it means that we divide the data set of points into two sets, S1 and S2.
All the points are numerical, right? So we have to pick some number where to divide the dataset.
So we pick some number m and say it is the median.
So let's take a look at an example:
(14, 2)
(11, 2)
(5, 2)
(15, 2)
(0, 2)
What's the closest pair?
Well, they all have the same Y coordinate, so we can look at the Xs only... the shortest X distance is from 14 to 15, a distance of 1.
How can we figure that out using divide-and-conquer?
We look at the greatest value of X and the smallest value of X and we choose the median as a dividing line to make our two sets.
Our median is 7.5 in this example.
We then make 2 sets
S1: (0, 2) and (5, 2)
S2: (11, 2) and (14, 2) and (15, 2)
Median: 7.5
We must keep track of the median for every split, because that is actually a vital piece of knowledge in this algorithm. They don't show it very clearly on the slides, but knowing the median value (where you split a set to make two sets) is essential to solving this question quickly.
We keep track of a value they call delta in the algorithm. Ugh, I don't know why most computer scientists absolutely suck at naming variables; you need descriptive names when you code so you don't forget what the f000 you coded 10 years ago. So instead of delta, let's call this value our-shortest-twig-from-the-median-so-far.
Since we have the median value of 7.5 let's go and see what our-shortest-twig-from-the-median-so-far is for Set1 and Set2, respectively:
Set1 : shortest-twig-from-the-median-so-far 2.5 (5 to m where m is 7.5)
Set 2: shortest-twig-from-the-median-so-far 3.5 (looking at 11 to m)
So I think the key take-away from the algorithm is that this shortest-twig-from-the-median-so-far is something that you're trying to improve upon every time you divide a set.
Since S1 in our case has 2 elements only, we are done with the left set, and we have 3 in the right set, so we continue dividing:
S2 = { (11,2) (14,2) (15,2) }
What do you do? You make a new median, call it S2-median
S2-median is halfway between 15 and 11... or 13, right? My math may be fuzzy, but I think that's right so far.
So let's look at the shortest-twig-so-far-for-our-right-side-with-median-thirteen ...
15 to 13 is... 2
11 to 13 is .... 2
14 to 13 is ... 1 (!!!)
So our m value or shortest-twig-from-the-median-so-far is improved (where we updated our median from before because we're in a new chunk or Set...)
Now that we've found it we know that (14, 2) is one of the points that satisfies the shortest pair equation. You can then check exhaustively against the points in this subset (15, 11, 14) to see which one is the closer one.
Clearly, (15,2) and (14,2) are the winning pair in this case.
Does that make sense? You must keep track of the median when you cut the set, and keep a new median every time you cut the set, until you have only 2 elements remaining on each side (or in our case 3).
The magic is in the median or shortest-twig-from-the-median-so-far
Thanks for asking this question, I went in not knowing how this algorithm worked but found the right highlighted bullet point on the slide and rolled with it. Do you get it now? I don't know how to explain the median magic other than binary search is f000ing awesome.

algorithm for series to calculate the maximum descend inside?

Given a series x(i), i from 1 to N, let's say N = 10,000.
For any i < j,
D(i,j) = x(i) - x(j), if x(i) > x(j); or
D(i,j) = 0, if x(i) <= x(j).
Define
Dmax(im, jm) := max D(i,j) over all 1 <= i < j <= N.
What's the best algorithm to calculate Dmax, im, and jm?
I tried to use dynamic programming, but this doesn't seem to divide into subproblems... so I'm a bit lost... Could you guys please suggest something? Is backtracking the way out?
Iterate over the series, keeping track of the following values:
The maximum element so far
The maximum descent so far
For each element, there are two possible values for the new maximum descent:
It remains the same
It equals maximumElementSoFar - newElement
So pick the one which gives the higher value. The maximum descent at the end of iteration will be your result. This will work in linear time and constant additional space.
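A minimal sketch of that linear scan (my own, also tracking the indices im and jm, 0-based):

def max_descend(x):
    best, im, jm = 0, None, None
    max_so_far, max_idx = x[0], 0
    for j in range(1, len(x)):
        # candidate descent: from the maximum seen so far down to x[j]
        if max_so_far - x[j] > best:
            best, im, jm = max_so_far - x[j], max_idx, j
        # keep the running maximum up to date
        if x[j] > max_so_far:
            max_so_far, max_idx = x[j], j
    return best, im, jm

print(max_descend([5, 4, 1, 2, 1]))   # -> (4, 0, 2)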
If I understand you correctly, you have an array of numbers and want to find the largest positive difference between two neighbouring elements of the array?
Since you're going to have to go through the array at least once, to compute the differences, I don't see why you can't just keep a record, as you go, of the largest difference found so far (and of its location), updating as that changes.
If your problem is as simple as I understand it, I'm not sure why you need to think about dynamic programming. I expect I've misunderstood the question.
Dmax(im, jm) := max over i < j of D(i,j) = max over i < j of max(x(i) - x(j), 0) = max(max over i < j of (x(i) - x(j)), 0)
You just need to compute x(i) - x(j) for all pairs i < j, which is O(n^2), and then take the max. No need for dynamic programming.
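As a sketch (my own), that brute force looks like:

def dmax_bruteforce(x):
    # check every ordered pair i < j; O(n^2)
    best, im, jm = 0, None, None
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] - x[j] > best:
                best, im, jm = x[i] - x[j], i, j
    return best, im, jm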
You can divide the series x(i) into sub-series, where each sub-series is a descending sub-list of x(i) (e.g. if x = 5, 4, 1, 2, 1 then x1 = 5, 4, 1 and x2 = 2, 1), and then in each sub-list compute first_in_sub_series - last_in_sub_series. Then compare all the results you get and find the maximum, and this is the answer.
If I understood the problem correctly, this should give you a basic linear algorithm to solve it.
e.g:
x = 5, 4, 1, 2, 1 then x1 = 5, 4, 1 and x2 = 2, 1
rx1 = 4
rx2 = 1
dmax = 4 and im = 1 and jm = 3 because we are talking about x1 which is the first 3 items of x.

Compute rank of a combination?

I want to pre-compute some values for each combination in a set of combinations. For example, when choosing 3 numbers from 0 to 12, I'll compute some value for each one:
>>> for n in choose(range(13), 3):
...     print n, foo(n)
(0, 1, 2) 78
(0, 1, 3) 4
(0, 1, 4) 64
(0, 1, 5) 33
(0, 1, 6) 20
(0, 1, 7) 64
(0, 1, 8) 13
(0, 1, 9) 24
(0, 1, 10) 85
(0, 1, 11) 13
etc...
I want to store these values in an array so that, given the combination, I can compute its index and get the value. For example:
>>> a = [78, 4, 64, 33]
>>> a[magic((0,1,2))]
78
What would magic be?
Initially I thought to just store it as a 3-d matrix of size 13 x 13 x 13, so I can easily index it that way. While this is fine for 13 choose 3, this would have way too much overhead for something like 13 choose 7.
I don't want to use a dict because eventually this code will be in C, and an array would be much more efficient anyway.
UPDATE: I also have a similar problem, but using combinations with repetitions, so any answers on how to get the rank of those would be much appreciated =).
UPDATE: To make it clear, I'm trying to conserve space. Each of these combinations actually indexes into something that takes up a lot of space, let's say 2 kilobytes. If I were to use a 13x13x13 array, that would be over 4 megabytes, of which I only need 572 kilobytes using (13 choose 3) spots.
Here is a conceptual answer and code based on how lex ordering works. (So I guess my answer is like that of "moron", except that I think that he has too few details and his links have too many.) I wrote a function unchoose(n,S) for you that works assuming that S is an ordered list subset of range(n). The idea: either S contains 0 or it does not. If it does, remove 0 and compute the index for the remaining subset. If it does not, then it comes after the binomial(n-1,k-1) subsets that do contain 0.
def binomial(n,k):
    if n < 0 or k < 0 or k > n: return 0
    b = 1
    for i in range(k): b = b*(n-i)//(i+1)
    return b

def unchoose(n,S):
    k = len(S)
    if k == 0 or k == n: return 0
    j = S[0]
    if k == 1: return j
    S = [x-1 for x in S]
    if not j: return unchoose(n-1,S[1:])
    return binomial(n-1,k-1)+unchoose(n-1,S)

def choose(X,k):
    n = len(X)
    if k < 0 or k > n: return []
    if not k: return [[]]
    if k == n: return [X]
    return [X[:1] + S for S in choose(X[1:],k-1)] + choose(X[1:],k)

(n,k) = (13,3)
for S in choose(list(range(n)),k): print(unchoose(n,S), S)
Now, it is also true that you can cache or hash values of both functions, binomial and unchoose. And what's nice about this is that you can compromise between precomputing everything and precomputing nothing. For instance you can precompute only for len(S) <= 3.
You can also optimize unchoose so that it adds the binomial coefficients with a loop if S[0] > 0, instead of decrementing and using tail recursion.
You can try using the lexicographic index of the combination. Maybe this page will help: http://saliu.com/bbs/messages/348.html
This MSDN page has more details: Generating the mth Lexicographical Element of a Mathematical Combination.
NOTE: The MSDN page has been retired. If you download the documentation at the above link, you will find the article on page 10201 of the pdf that is downloaded.
To be a bit more specific:
When treated as a tuple, you can order the combinations lexicographically.
So (0,1,2) < (0,1,3) < (0,1,4) etc.
Say you had the numbers 0 to n-1 and chose k out of those.
Now if the first element is zero, you know that it is one among the first n-1 choose k-1.
If the first element is 1, then it is one among the next n-2 choose k-1.
This way you can recursively compute the exact position of the given combination in the lexicographic ordering and use that to map it to your number.
This works in reverse too and the MSDN page explains how to do that.
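A small sketch of that recursion, written as an iterative count (my own, using Python's math.comb; combo is assumed to be a sorted k-tuple drawn from range(n)):

from math import comb

def rank(combo, n):
    # lexicographic index of a sorted k-combination of range(n)
    k = len(combo)
    r = 0
    prev = -1
    for i, c in enumerate(combo):
        # count all combinations whose element at position i is smaller than c
        for smaller in range(prev + 1, c):
            r += comb(n - 1 - smaller, k - i - 1)
        prev = c
    return r

print(rank((0, 1, 2), 13))   # -> 0
print(rank((0, 1, 3), 13))   # -> 1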
Use a hash table to store the results. A decent hash function could be something like:
h(x) = (x1*p^(k - 1) + x2*p^(k - 2) + ... + xk*p^0) % pp
Where x1 ... xk are the numbers in your combination (for example (0, 1, 2) has x1 = 0, x2 = 1, x3 = 2) and p and pp are primes.
So you would store Hash[h(0, 1, 2)] = 78 and then you would retrieve it the same way.
Note: the hash table is just an array of size pp, not a dict.
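A minimal sketch of that scheme (the primes p = 31 and pp = 100003 below are arbitrary choices of mine; collisions are ignored here, which a real implementation would have to handle):

P, PP = 31, 100003             # example primes; PP is also the table size

def h(combo):
    # polynomial hash x1*P^(k-1) + ... + xk, reduced modulo PP (Horner's rule)
    v = 0
    for x in combo:
        v = (v * P + x) % PP
    return v

table = [None] * PP
table[h((0, 1, 2))] = 78       # store
print(table[h((0, 1, 2))])     # retrieve -> 78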
I would suggest a specialised hash table. The hash for a combination should be the exclusive-or of the hashes for the values. Hashes for values are basically random bit-patterns.
You could code the table to cope with collisions, but it should be fairly easy to derive a minimal perfect hash scheme - one where no two three-item combinations give the same hash value, and where the hash-size and table-size are kept to a minimum.
This is basically Zobrist hashing - think of a "move" as adding or removing one item of the combination.
EDIT
The reason to use a hash table is that the lookup performance is O(n), where n is the number of items in the combination (assuming no collisions). Calculating lexicographical indexes into the combinations is significantly slower, IIRC.
The downside is obviously the up-front work done to generate the table.
For now, I've reached a compromise: I have a 13x13x13 array which just maps to the index of the combination, taking up 13x13x13x2 bytes = roughly 4 kilobytes (using short ints), plus the normal-sized (13 choose 3) * 2 kilobytes = 572 kilobytes, for a total of about 576 kilobytes. Much better than 4 megabytes, and also faster than a rank calculation!
I did this partly because I couldn't seem to get Moron's answer to work. Also this is more extensible - I have a case where I need combinations with repetitions, and I haven't found a way to compute the rank of those yet.
What you want are called combinadics. Here's my implementation of this concept, in Python:
from math import comb as ncombs   # binomial coefficient C(n, k); ncombs is assumed to mean this

def nthresh(k, idx):
    """Finds the largest value m such that C(m, k) <= idx."""
    mk = k
    while ncombs(mk, k) <= idx:
        mk += 1
    return mk - 1

def idx_to_set(k, idx):
    ret = []
    for i in range(k, 0, -1):
        element = nthresh(i, idx)
        ret.append(element)
        idx -= ncombs(element, i)
    return ret

def set_to_idx(input):
    ret = 0
    for k, ck in enumerate(sorted(input)):
        ret += ncombs(ck, k + 1)
    return ret
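For example (with ncombs as the binomial coefficient, as in the import above), the two functions invert each other:

print(set_to_idx([0, 1, 3]))   # -> 1
print(idx_to_set(3, 1))        # -> [3, 1, 0], the same combination listed largest-first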
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration and it does not use very much memory. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coefficient.
It should not be hard to convert this class to C++.
