Forming a subsequence of K digits from a given array - algorithm

I had a contest where the question was to find all possible K-digit valid numbers formed from an array of N items.
For instance,
3 -> N (Number of items in Array)
1 2 3 ->A (The array itself constituting N items)
2 -> K (This is the given number of digits to form)
Output should be like
[0, 1] = 12
[0, 2] = 13
[1, 0] = 21
[1, 2] = 23
[2, 0] = 31
[2, 1] = 32
What could be the logic? I guess dynamic programming could solve this problem, but I would be glad to get some help.

We can formulate a recursive function that prints all the possible K-digit numbers from the given array of single-digit integers. Ideone link: https://ideone.com/RTNz2o
def gen(A, K, arr = []):
    if len(arr) == K:
        print(arr, "=", "".join([str(A[i]) for i in arr]))
        return  # a full K-digit number has been formed
    for i in range(0, len(A)):
        val = A[i]
        if val == 0 and len(arr) == 0:
            # We don't want numbers starting with 0
            continue
        if i in arr:
            # We don't want to include the same element again
            continue
        arr.append(i)
        gen(A, K, arr)
        arr.pop()

gen([1, 2, 3], 2)
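For comparison, here is a sketch of the same enumeration using itertools.permutations over the indices (an alternative cross-check, not the answer above; the leading-zero check is kept for arrays that contain 0):

from itertools import permutations

def gen_permutations(A, K):
    # every ordered choice of K distinct indices, in lexical index order
    for idx in permutations(range(len(A)), K):
        if A[idx[0]] == 0:
            continue  # skip numbers that would start with 0
        print(list(idx), "=", "".join(str(A[i]) for i in idx))

gen_permutations([1, 2, 3], 2)  # same six lines as above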

Detect outlier in repeating sequence

I have a repeating sequence of, say, 0~9 (but it may start and stop at any of these numbers), e.g.:
3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2
And it has outliers at random locations, possibly including the first and last positions, e.g.:
9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6
I need to find & correct the outliers; in the above example, I need to correct the first "9" into "3", the "8" into "5", etc.
What I came up with is to construct a sequence of the desired length with no outliers, but since I don't know which number the sequence starts with, I'd have to construct 10 sequences, each starting from "0", "1", "2" ... "9". Then I can compare these 10 sequences with the given sequence and find the one that matches the given sequence best. However, this is very inefficient when the repeating pattern gets large (say, if the repeating pattern is 0~99, I'd need to create 100 sequences to compare).
Assuming there won't be consecutive outliers, is there a way to find & correct these outliers efficiently?
edit: added some explanation and added the algorithm tag. Hopefully it is more appropriate now.
I'm going to propose a variation of @trincot's fine answer. Like that one, it doesn't care how many outliers there may be in a row, but unlike that one, it also doesn't care how many in a row aren't outliers.
The base idea is just to let each sequence element "vote" on what the first sequence element "should be". Whichever gets the most votes wins. By construction, this maximizes the number of elements left unchanged: after the voting loop ends, votes[i] is the number of elements left unchanged if i is picked as the starting point.
def correct(numbers, mod=None):
    # this part copied from @trincot's program
    if mod is None:  # if argument is not provided:
        # Make a guess what the range of the values is
        mod = max(numbers) + 1
    votes = [0] * mod
    for i, x in enumerate(numbers):
        # which initial number would make x correct?
        votes[(x - i) % mod] += 1
    winning_count = max(votes)
    winning_numbers = [i for i, v in enumerate(votes)
                       if v == winning_count]
    if len(winning_numbers) > 1:
        raise ValueError("ambiguous!", winning_numbers)
    winning_number = winning_numbers[0]
    for i in range(len(numbers)):
        numbers[i] = (winning_number + i) % mod
    return numbers
Then, e.g.,
>>> correct([9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6])
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
but
>>> correct([1, 5, 3, 7, 5, 9])
...
ValueError: ('ambiguous!', [1, 4])
That is, it's impossible to guess whether you want [1, 2, 3, 4, 5, 6] or [4, 5, 6, 7, 8, 9]. They both have 3 numbers "right", and in either reading there are never two adjacent outliers.
I would do a first scan of the list to find the longest sublist in the input that maintains the right order. We will then assume that those values are all correct, and calculate backwards what the first value would have to be to produce those values in that sublist.
Here is how that would look in Python:
def correct(numbers, mod=None):
    if mod is None:  # if argument is not provided:
        # Make a guess what the range of the values is
        mod = max(numbers) + 1
    # Find the longest slice in the list that maintains order
    start = 0
    longeststart = 0
    longest = 1
    expected = -1
    for last in range(len(numbers)):
        if numbers[last] != expected:
            start = last
        elif last - start >= longest:
            longest = last - start + 1
            longeststart = start
        expected = (numbers[last] + 1) % mod
    # Get from that longest slice what the starting value should be
    val = (numbers[longeststart] - longeststart) % mod
    # Repopulate the list starting from that value
    for i in range(len(numbers)):
        numbers[i] = val
        val = (val + 1) % mod

# demo use
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
correct(numbers, 10)  # for 0..9 provide 10 as argument, ...etc
print(numbers)
The advantage of this method is that it would even give a good result if there were errors in two consecutive values, provided of course that there are enough correct values in the list. Still, this runs in linear time.
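A quick check of that claim (a made-up input with two adjacent outliers at indices 2 and 3; the longest intact run still pins down the start):

numbers = [3, 4, 0, 0, 7, 8, 9, 0, 1, 2]  # true sequence starts at 3
correct(numbers, 10)
print(numbers)  # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]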
Here is another way using groupby and count from Python's itertools module:
from itertools import count, groupby

def correct(lst):
    # a run of consecutive values shares a constant a - next(b) key,
    # so groupby splits the list into maximal in-order runs
    groupped = [list(v) for _, v in groupby(lst, lambda a, b=count(): a - next(b))]
    # Check if all groups are singletons
    if all(len(k) == 1 for k in groupped):
        raise ValueError('All groups are singletons!')
    for k, v in zip(groupped, groupped[1:]):
        if len(k) < 2:
            out = v[0] - 1
            if out >= 0:
                yield out
            else:
                yield from k
        else:
            yield from k
    # check last element of the groupped list
    if len(v) < 2:
        yield k[-1] + 1
    else:
        yield from v

lst = "9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6"
lst = [int(k) for k in lst.split(',')]
out = list(correct(lst))
print(out)
Output:
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
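The grouping trick is easier to see in isolation (a toy demo, not part of the answer; each run of consecutive integers shares a constant a - next(b) key):

from itertools import count, groupby

lst = [3, 4, 5, 9, 10, 11]
runs = [list(v) for _, v in groupby(lst, lambda a, b=count(): a - next(b))]
print(runs)  # [[3, 4, 5], [9, 10, 11]]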
Edit:
For the case of [1, 5, 3, 7, 5, 9] this solution will return something inaccurate, because it can't tell which values you want to modify. This is why the best option is to check and raise a ValueError when all groups are singletons.
Like this?
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
i = 0
for n in numbers[:-1]:
    i += 1
    if n > numbers[i] and numbers[i] > 0:
        numbers[i-1] = numbers[i] - 1
    elif n > numbers[i] and numbers[i] == 0:
        # wrap-around: the value before a 0 should be 9
        numbers[i-1] = 9
n = numbers[-1]
if n > numbers[0] and numbers[0] > 0:
    numbers[-1] = numbers[0] - 1
elif n > numbers[0] and numbers[0] == 0:
    numbers[-1] = 9
print(numbers)

Shuffle an int array such that array elements in even indices are smaller than array elements in odd indices

I need to have all the elements at even indices (arr[0], arr[2], arr[4], etc.) be smaller than the elements at odd indices (arr[1], arr[3], arr[5], etc.).
My approach was to find the MEDIAN and then write out all elements smaller than the median at even indices and all elements larger than the median at odd indices.
Question: is there a way to do the array shuffling IN PLACE after finding the median?
import random

def quickselect(items, item_index):
    def select(lst, l, r, index):
        # base case
        if r == l:
            return lst[l]
        # choose random pivot
        pivot_index = random.randint(l, r)
        # move pivot to beginning of list
        lst[l], lst[pivot_index] = lst[pivot_index], lst[l]
        # partition
        i = l
        for j in range(l+1, r+1):
            if lst[j] < lst[l]:
                i += 1
                lst[i], lst[j] = lst[j], lst[i]
        # move pivot to correct location
        lst[i], lst[l] = lst[l], lst[i]
        # recursively partition one side only
        if index == i:
            return lst[i]
        elif index < i:
            return select(lst, l, i-1, index)
        else:
            return select(lst, i+1, r, index)

    if items is None or len(items) < 1:
        return None
    if item_index < 0 or item_index > len(items) - 1:
        raise IndexError()
    return select(items, 0, len(items) - 1, item_index)

def shuffleArray(array, median):
    # assumes quickselect has already partitioned `array` around the median,
    # so the first half holds the smaller elements
    newArray = [0] * len(array)
    i = 0
    for x in range(0, len(array), 2):
        newArray[x] = array[i]
        i += 1
    for y in range(1, len(array), 2):
        newArray[y] = array[i]
        i += 1
    return newArray
So here's my interpretation of the question.
Shuffle an array so that all data in even indices are smaller than all data in odd indices.
E.g. [1, 3, 2, 4] would be valid, but [1, 2, 3, 4] wouldn't be.
This stops us from just sorting the array.
1. Sort the array, smallest to largest.
2. Split the array at its mid point (rounding the mid point down).
3. Shuffle the two arrays together, such that given array [1, 2, 3] and array [4, 5, 6] it becomes [1, 4, 2, 5, 3, 6].
To elaborate on step 3, here's some example code (using JavaScript):
let a = [ 1, 2, 3 ];
let b = [ 4, 5, 6 ];
let c = [ ]; // this will be the shuffled array
for (let i = 0; i < a.length + b.length; i++) {
    if (i % 2 == 0) c.push(a[Math.floor(i / 2)]);
    else c.push(b[Math.floor(i / 2)]);
}
This produces the array [1, 4, 2, 5, 3, 6], which I believe fulfils the requirement.
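For the question's Python setting, here is a sketch that puts the same three steps together, reusing the quickselect from the question to find the median instead of doing a full sort (the slice assignment copies, so this isn't strictly O(1) extra space):

def shuffle_even_odd(arr):
    # after selecting the lower median, everything left of its final
    # slot is <= it, so the first half holds the smaller elements
    n = len(arr)
    quickselect(arr, (n - 1) // 2)
    small, large = arr[:(n + 1) // 2], arr[(n + 1) // 2:]
    arr[0::2] = small   # smaller half -> even indices
    arr[1::2] = large   # larger half -> odd indices
    return arr

print(shuffle_even_odd([4, 1, 3, 2, 6, 5]))
# even slots hold {1, 2, 3}, odd slots {4, 5, 6}; the order within each
# half depends on the random pivots, e.g. [1, 4, 2, 5, 3, 6]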

No. of permutations of numbers from 1 to n in which P[i] > P[i+1] and P[i] > P[i-1]

For a given N, how many permutations of [1, 2, 3, ..., N] satisfy the following property?
Let P1, P2, ..., PN denote the permutation. The property we want to satisfy is that there exists an i between 2 and N-1 (inclusive) such that
Pj > Pj+1 ∀ j with i ≤ j ≤ N - 1, and
Pj > Pj-1 ∀ j with 2 ≤ j ≤ i.
For example, for N=3, the permutations [1, 3, 2] and [2, 3, 1] satisfy the property.
Is there any direct formula or algorithm to find this set programmatically?
There are 2^(n-1) - 2 such permutations. If n is the largest element, then the permutation is uniquely determined by the nonempty, proper subset of {1, 2, ..., n-1} which lies to the left of n in the permutation. This answer is consistent with the excellent answer of @גלעדברקן, in view of the well-known fact that the elements in each row of Pascal's triangle sum to a power of two (hence the part of the row strictly between the two ones sums to two less than a power of two).
Here is a Python enumeration which generates all n! permutations and checks them for validity:
import itertools

def validPerm(p):
    n = max(p)
    i = p.index(n)
    if i == 0 or i == n-1:
        return False
    else:
        before = p[:i]
        after = p[i+1:]
        return before == sorted(before) and after == sorted(after, reverse = True)

def validPerms(n):
    nums = list(range(1,n+1))
    valids = []
    for p in itertools.permutations(nums):
        lp = list(p)
        if validPerm(lp): valids.append(lp)
    return valids
For example,
>>> validPerms(4)
[[1, 2, 4, 3], [1, 3, 4, 2], [1, 4, 3, 2], [2, 3, 4, 1], [2, 4, 3, 1], [3, 4, 2, 1]]
which gives the expected number of 6.
On further edit: The above code was to verify the formula for nondegenerate unimodal permutations (to coin a phrase, since "unimodal permutations" is used in the literature for the 2^(n-1) permutations with exactly one peak; the 2 which either begin or end with n are arguably degenerate in some sense). From an enumeration point of view you would want to do something more efficient. The following is a Python implementation of the idea behind the answer of @גלעדברקן:
def validPerms(n):
    valids = []
    nums = list(range(1,n))  # 1, 2, ..., n-1
    snums = set(nums)
    for i in range(1,n-1):
        for first in itertools.combinations(nums,i):
            # first will already be sorted
            rest = sorted(snums - set(first), reverse = True)
            valids.append(list(first) + [n] + rest)
    return valids
It is functionally equivalent to the above code, but substantially more efficient.
Let's look at an example:
{1,2,3,4,5,6}
Clearly, any positioning of 6 at i will mean that the right side of it is sorted descending and the left side ascending. For example, i = 3:
{1,2,6,5,4,3}
{1,3,6,5,4,2}
{1,4,6,5,3,2}
...
So for each positioning of N at index i between 2 and n - 1, we have (n - 1) choose (i - 1) arrangements. This leads to the answer:
sum of [(n - 1) choose (i - 1)], for i = 2 ... (n - 1)
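A quick sanity check of that sum against the enumeration above (math.comb requires Python 3.8+; n = 4 gives C(3,1) + C(3,2) = 6, matching validPerms(4)):

from math import comb

def count_peak_perms(n):
    # sum of C(n-1, i-1) over peak positions i = 2 .. n-1,
    # which collapses to 2**(n-1) - 2
    return sum(comb(n - 1, i - 1) for i in range(2, n))

print(count_peak_perms(3))   # 2
print(count_peak_perms(4))   # 6
assert count_peak_perms(10) == 2**9 - 2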
There are ans permutations, where ans = 2^(n-1) - 2. We subtract 2 because i needs to be between 2 and n-1, and we know that C(n-1, 0) = C(n-1, n-1) = 1.

How can the complexity of this function be decreased?

I got this function:
def get_sum_slices(a, sum)
  count = 0
  a.length.times do |n|
    a.length.times do |m|
      next if n > m
      count += 1 if a[n..m].inject(:+) == sum
    end
  end
  count
end
Given the array [-2, 0, 3, 2, -7, 4] and 2 as sum, it will return 2 because two slices sum to 2 - [2] and [3, 2, -7, 4]. Anyone got an idea on how to improve this to O(N*log(N))?
I am not familiar with Ruby, but it seems to me you are trying to find how many contiguous subarrays sum to sum.
Your code is doing it by brute force: finding ALL subarrays - O(N^2) of those - summing each in O(N), and checking whether it matches.
This totals in O(N^3) code.
It can be done more efficiently (1):
Define a new array sums as follows:
sums[i] = arr[0] + arr[1] + ... + arr[i]
It is easy to calculate the above in O(N) time. Note that, under the assumption of non-negative numbers, this sums array is sorted.
Now, iterate the sums array and, for each element sums[i], do a binary search for an index j such that sums[j] - sums[i] == SUM. If one is found, increment the count by one (a little more work is needed if the array can contain zeros, but it does not affect complexity).
Since each lookup is a binary search done in O(log N), and you do it for each element - you actually have an O(N log N) algorithm.
Similarly, by adding the elements of sums to a hash set instead of placing them in a sorted array, you can reach O(N) average-case performance, since looking up each element is then O(1) on average.
Pseudocode:
input:  arr, sum
output: numOccurrences - the number of contiguous subarrays that sum to sum

currSum = 0
S = new hash multiset containing {0}   # the empty prefix
numOccurrences = 0
for each element x in arr:
    currSum += x
    # every earlier prefix sum equal to currSum - sum closes a matching subarray
    numOccurrences += number of occurrences of (currSum - sum) in S
    add currSum to S
return numOccurrences
Note that the hash-set variant does not need the restriction to non-negative numbers, and handles them just as well.
(1) Assuming your array contains only non-negative numbers.
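A minimal runnable sketch of that hash variant (in Python rather than Ruby, for brevity; Counter plays the role of the multiset, seeded with the empty prefix):

from collections import Counter

def count_sum_slices(arr, target):
    # O(N) average: count contiguous subarrays summing to target
    seen = Counter({0: 1})  # prefix sums seen so far; 0 = empty prefix
    curr = 0
    total = 0
    for x in arr:
        curr += x
        total += seen[curr - target]  # earlier prefixes that close a match
        seen[curr] += 1
    return total

print(count_sum_slices([1, 2, 3], 3))  # 2 -> [1, 2] and [3]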
According to amit's algorithm:
def get_sum_slices3(a, sum)
  s = a.inject([]) { |m, e| m << e + m.last.to_i }
  s.sort!
  s.count { |x| s.bsearch { |y| x - y == sum } }
end
Ruby's sort uses quicksort, which is O(n log n) in most cases.
You should explain better what you're trying to achieve here. Anyway, computing the number of element combinations that have a specific sum could be done like this:
def get_sum_slices(a, sum)
  count = 0
  (2..a.length).each do |n|
    a.combination(n).each do |combination|
      if combination.inject(:+) == sum
        count += 1
        puts combination.inspect
      end
    end
  end
  count
end
Btw, your example should return 6:
irb> get_sum_slices [-2, 0, 3, 2, -7, 4], 0
[-2, 2]
[-2, 0, 2]
[3, -7, 4]
[0, 3, -7, 4]
[-2, 3, 2, -7, 4]
[-2, 0, 3, 2, -7, 4]
=> 6

Transform a set of large integers into a set of small ones

How do we recode a set of strictly increasing (or strictly decreasing) positive integers P, to decrease the number of positive integers that can occur between the integers in our set?
Why would we want to do this: Say we want to randomly sample P, but 1.) P is too large to enumerate, and 2.) members of P are related in a nonrandom way, but in a way that is too complicated to sample by. However, we know a member of P when we see it. Say we know P[0] and P[n], but can't entertain the idea of enumerating all of P or understanding precisely how members of P are related. Likewise, the number of all possible integers occurring between P[0] and P[n] is many times greater than the size of P, making the chance of randomly drawing a member of P very small.
Example: Let P[0] = 2101010101 & P[n] = 505050505. Now, maybe we're only interested in integers between P[0] and P[n] that have a specific quality (e.g. all integers in P[x] sum to Q or less, or each member of P has 7 or less as its largest integer). So, not all positive integers P[n] <= X <= P[0] belong to P. The P I'm interested in is discussed in the comments below.
What I've tried: If P is a strictly decreasing set and we know P[0] and P[n], then we can treat each member as if it were subtracted from P[0]. Doing so decreases each number, perhaps greatly, and maintains each member as a unique integer. For the P I'm interested in (below), one can treat each decreased value of P as being divided by a common denominator (9, 11, 99), which decreases the number of possible integers between members of P. I've found that, used in conjunction, these approaches decrease the set of all P[n] <= X <= P[0] by a few orders of magnitude, but the chance of randomly drawing a member of P from all positive integers P[n] <= X <= P[0] is still very small.
Note: As should be clear, we have to know something about P. If we don't, that basically means we have no clue of what we're looking for. When we randomly sample integers between P[0] and P[n] (recoded or not) we need to be able to say "Yup, that belongs to P.", if indeed it does.
A good answer could greatly increase the practical application of a computing algorithm I have developed. An example of the kind of P I'm interested in is given in comment 2. I am adamant about giving due credit.
While the original question is asking about a very generic scenario concerning integer encodings, I would suggest that it is unlikely that there exists an approach that works in complete generality. For example, if the P[i] are more or less random (from an information-theoretic standpoint), I would be surprised if anything should work.
So, instead, let us turn our attention to the OP's actual problem: generating partitions of an integer N containing exactly K parts. When encoding combinatorial objects as integers, it behooves us to preserve as much of the combinatorial structure as possible.
For this, we turn to the classic text Combinatorial Algorithms by Nijenhuis and Wilf, specifically Chapter 13. In fact, in this chapter, they demonstrate a framework to enumerate and sample from a number of combinatorial families -- including partitions of N where the largest part is equal to K. Using the well-known duality between partitions with K parts and partitions where the largest part is K (take the transpose of the Ferrers diagram), we find that we only need to make a change to the decoding process.
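The transpose step is simple to sketch (a toy helper of my own, not from the book; the conjugate of a partition whose largest part is K is a partition with exactly K parts):

def conjugate(partition):
    # transpose the Ferrers diagram: column i has one cell for
    # every part of size at least i+1
    return [sum(1 for p in partition if p > i) for i in range(max(partition))]

print(conjugate([4, 2, 1]))  # [3, 2, 1, 1]: largest part 4 <-> 4 parts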
Anyways, here's some source code:
import sys
import random
import time

if len(sys.argv) < 4 :
    sys.stderr.write("Usage: {0} N K iter\n".format(sys.argv[0]))
    sys.stderr.write("\tN = number to be partitioned\n")
    sys.stderr.write("\tK = number of parts\n")
    sys.stderr.write("\titer = number of iterations (if iter=0, enumerate all partitions)\n")
    quit()

N = int(sys.argv[1])
K = int(sys.argv[2])
iters = int(sys.argv[3])

if (N < K) :
    sys.stderr.write("Error: N<K ({0}<{1})\n".format(N,K))
    quit()

# B[n][k] = number of partitions of n with largest part equal to k
B = [[0 for j in range(K+1)] for i in range(N+1)]

def calc_B(n,k) :
    for j in xrange(1,k+1) :
        for m in xrange(j, n+1) :
            if j == 1 :
                B[m][j] = 1
            elif m - j > 0 :
                B[m][j] = B[m-1][j-1] + B[m-j][j]
            else :
                B[m][j] = B[m-1][j-1]

def generate(n,k,r=None) :
    path = []
    append = path.append
    # Invalid input
    if n < k or n == 0 or k == 0:
        return []
    # Pick random number between 1 and B[n][k] if r is not specified
    if r == None :
        r = random.randrange(1,B[n][k]+1)
    # Construct path from r
    while r > 0 :
        if n==1 and k==1:
            append('N')
            r = 0  ### Finish loop
        elif r <= B[n-k][k] and B[n-k][k] > 0 :  # East/West move
            append('E')
            n = n-k
        else :  # Northeast/Southwest move
            append('N')
            r -= B[n-k][k]
            n = n-1
            k = k-1
    # Decode path into partition
    partition = []
    l = 0
    d = 0
    append = partition.append
    for i in reversed(path) :
        if i == 'N' :
            if d > 0 :  # apply East moves all at once
                for j in xrange(l) :
                    partition[j] += d
                d = 0  # reset East moves
            append(1)  # apply North move
            l += 1
        else :
            d += 1  # accumulate East moves
    if d > 0 :  # apply any remaining East moves
        for j in xrange(l) :
            partition[j] += d
    return partition

t = time.clock()
sys.stderr.write("Generating B table... ")
calc_B(N, K)
sys.stderr.write("Done ({0} seconds)\n".format(time.clock()-t))

bmax = B[N][K]
Bits = 0
sys.stderr.write("B[{0}][{1}]: {2}\t".format(N,K,bmax))
while bmax > 1 :
    bmax //= 2
    Bits += 1
sys.stderr.write("Bits: {0}\n".format(Bits))

if iters == 0 :  # enumerate all partitions
    for i in xrange(1,B[N][K]+1) :
        print i,"\t",generate(N,K,i)
else :  # generate random partitions
    t=time.clock()
    for i in xrange(1,iters+1) :
        Q = generate(N,K)
        print Q
        if i%1000==0 :
            sys.stderr.write("{0} written ({1:.3f} seconds)\r".format(i,time.clock()-t))
    sys.stderr.write("{0} written ({1:.3f} seconds total) ({2:.3f} iterations per second)\n".format(i, time.clock()-t, float(i)/(time.clock()-t) if time.clock()-t else 0))
And here are some examples of the performance (on a MacBook Pro 8.3, 2GHz i7, 4 GB, Mac OSX 10.6.3, Python 2.6.1):
mhum$ python part.py 20 5 10
Generating B table... Done (6.7e-05 seconds)
B[20][5]: 84 Bits: 6
[7, 6, 5, 1, 1]
[6, 6, 5, 2, 1]
[5, 5, 4, 3, 3]
[7, 4, 3, 3, 3]
[7, 5, 5, 2, 1]
[8, 6, 4, 1, 1]
[5, 4, 4, 4, 3]
[6, 5, 4, 3, 2]
[8, 6, 4, 1, 1]
[10, 4, 2, 2, 2]
10 written (0.000 seconds total) (37174.721 iterations per second)
mhum$ python part.py 20 5 1000000 > /dev/null
Generating B table... Done (5.9e-05 seconds)
B[20][5]: 84 Bits: 6
100000 written (2.013 seconds total) (49665.478 iterations per second)
mhum$ python part.py 200 25 100000 > /dev/null
Generating B table... Done (0.002296 seconds)
B[200][25]: 147151784574 Bits: 37
100000 written (8.342 seconds total) (11987.843 iterations per second)
mhum$ python part.py 3000 200 100000 > /dev/null
Generating B table... Done (0.313318 seconds)
B[3000][200]: 3297770929953648704695235165404132029244952980206369173 Bits: 181
100000 written (59.448 seconds total) (1682.135 iterations per second)
mhum$ python part.py 5000 2000 100000 > /dev/null
Generating B table... Done (4.829086 seconds)
B[5000][2000]: 496025142797537184410324290349759736884515893324969819660 Bits: 188
100000 written (255.328 seconds total) (391.653 iterations per second)
mhum$ python part-final2.py 20 3 0
Generating B table... Done (0.0 seconds)
B[20][3]: 33 Bits: 5
1 [7, 7, 6]
2 [8, 6, 6]
3 [8, 7, 5]
4 [9, 6, 5]
5 [10, 5, 5]
6 [8, 8, 4]
7 [9, 7, 4]
8 [10, 6, 4]
9 [11, 5, 4]
10 [12, 4, 4]
11 [9, 8, 3]
12 [10, 7, 3]
13 [11, 6, 3]
14 [12, 5, 3]
15 [13, 4, 3]
16 [14, 3, 3]
17 [9, 9, 2]
18 [10, 8, 2]
19 [11, 7, 2]
20 [12, 6, 2]
21 [13, 5, 2]
22 [14, 4, 2]
23 [15, 3, 2]
24 [16, 2, 2]
25 [10, 9, 1]
26 [11, 8, 1]
27 [12, 7, 1]
28 [13, 6, 1]
29 [14, 5, 1]
30 [15, 4, 1]
31 [16, 3, 1]
32 [17, 2, 1]
33 [18, 1, 1]
I'll leave it to the OP to verify that this code indeed generates partitions according to the desired (uniform) distribution.
EDIT: Added an example of the enumeration functionality.
Below is a script that accomplishes what I've asked, as far as recoding integers that represent integer partitions of N with K parts. A better recoding method is needed to make this approach practical for K > 4. This is definitely not the best or preferred approach. However, it's conceptually simple and easily argued to be fundamentally unbiased. It's also very fast for small K. The script runs fine in a Sage notebook and does not call Sage functions. It is NOT a script for random sampling. Random sampling per se is not the problem.
The method:
1.) Treat integer partitions as if their summands are concatenated together and padded with zeros according to size of largest summand in first lexical partition, e.g. [17,1,1,1] -> 17010101 & [5,5,5,5] -> 05050505
2.) Treat the resulting integers as if they are subtracted from the largest integer (i.e. the int representing the first lexical partition). e.g. 17010101 - 5050505 = 11959596
3.) Treat each resulting decreased integer as divided by a common denominator, e.g. 11959596/99 = 120804
So, if we wanted to choose a random partition we would (sketched in code after this list):
1.) Choose a number between 0 and 120,804 (instead of a number between 5,050,505 and 17,010,101)
2.) Multiply the number by 99 and subtract it from 17010101
3.) Split the resulting integer according to how we treated each integer as being padded with 0's
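As a quick sketch of those three steps in code (values taken from the worked example above; most draws will not decode to a valid partition and must be rejected, which is exactly the inefficiency discussed below):

import random

first_num = 17010101  # [17, 1, 1, 1] concatenated with 2-digit padding
last_num = 5050505    # [5, 5, 5, 5] -> 05050505

r = random.randint(0, (first_num - last_num) // 99)   # step 1: small range
candidate = first_num - r * 99                        # step 2: map back up
digits = str(candidate).zfill(8)                      # step 3: split into
parts = [int(digits[i:i+2]) for i in range(0, 8, 2)]  #   2-digit summands
# keep only if parts is nonincreasing and sums to N (= 20 here)
print(parts)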
Pros and cons: As stated in the body of the question, this particular recoding method doesn't do enough to greatly improve the chance of randomly selecting an integer representing a member of P. For small numbers of parts, e.g. K < 5, and substantially larger totals, e.g. N > 100, a function that implements this concept can be very fast because the approach avoids time-consuming recursion (snake eating its tail) that slows other random partition functions or makes them impractical for dealing with large N.
At small K, the probability of drawing a member of P can be reasonable when considering how fast the rest of the process is. Coupled with quick random draws, decoding, and evaluation, this function can find uniform random partitions for combinations of N&K (e.g. N = 20000, K = 4) that are untenable with other algorithms. A better way to recode integers is greatly needed to make this a generally powerful approach.
import random
import sys
from math import floor  # used by last_partition below

First, some generally useful and straightforward functions:
def first_partition(N,K):
    part = [N-K+1]
    ones = [1]*(K-1)
    part.extend(ones)
    return part

def last_partition(N,K):
    most_even = [int(floor(float(N)/float(K)))]*K
    _remainder = int(N%K)
    j = 0
    while _remainder > 0:
        most_even[j] += 1
        _remainder -= 1
        j += 1
    return most_even

def first_part_nmax(N,K,Nmax):
    part = [Nmax]
    N -= Nmax
    K -= 1
    while N > 0:
        Nmax = min(Nmax,N-K+1)
        part.append(Nmax)
        N -= Nmax
        K -= 1
    return part

#print first_partition(20,4)
#print last_partition(20,4)
#print first_part_nmax(20,4,12)
#sys.exit()

def portion(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

def next_restricted_part(part,N,K): # *find next partition matching N&K w/out recursion
    if part == last_partition(N,K): return first_partition(N,K)
    for i in enumerate(reversed(part)):
        if i[1] - part[-1] > 1:
            if i[0] == (K-1):
                return first_part_nmax(N,K,(i[1]-1))
            else:
                parts = portion(part,[K-i[0]-1]) # split p
                h1 = parts[0]
                h2 = parts[1]
                next = first_part_nmax(sum(h2),len(h2),(h2[0]-1))
                return h1+next

""" *I don't know a math software that has this function and Nijenhuis and Wilf (1978)
don't give it (i.e. NEXPAR is not restricted by K). Apparently, folks often get the
next restricted part using recursion, which is unnecessary """

def int_to_list(i): # convert an int to a list w/out padding with 0's
    return [int(x) for x in str(i)]

def int_to_list_fill(i,fill): # convert an int to a list and pad with 0's
    return [x for x in str(i).zfill(fill)]

def list_to_int(l): # convert a list to an integer
    return "".join(str(x) for x in l)

def part_to_int(part,fill): # convert a partition of K parts to an int,
    # padding each summand with the respective number of 0's
    p_list = []
    for p in part:
        if len(int_to_list(p)) != fill:
            l = int_to_list_fill(p,fill)
            p = list_to_int(l)
        p_list.append(p)
    _int = list_to_int(p_list)
    return _int

def int_to_part(num,fill,K): # convert an int to a partition of K parts,
    # padded with the respective number of 0's.
    # This function isn't called by the script, but I thought I'd include
    # it anyway because it would be used to recover the respective partition
    _list = int_to_list(num)
    if len(_list) != fill*K:
        ct = fill*K - len(_list)
        while ct > 0:
            _list.insert(0,0)
            ct -= 1
    new_list1 = []
    new_list2 = []
    for i in _list:
        new_list1.append(i)
        if len(new_list1) == fill:
            new_list2.append(new_list1)
            new_list1 = []
    part = []
    for i in new_list2:
        j = int(list_to_int(i))
        part.append(j)
    return part
Finally, we get to the total N and number of parts K. The following will print partitions satisfying N&K in lexical order, with the associated recoded integers:
N = 20
K = 4

print '#, partition, coded, _diff, smaller_diff'

first_part = first_partition(N,K) # first lexical partition for N&K
fill = len(int_to_list(max(first_part)))
# pad with zeros to 1.) ensure a strictly decreasing relationship w/in P,
# 2.) keep track of (encode/decode) partition summand values
first_num = part_to_int(first_part,fill)

last_part = last_partition(N,K)
last_num = part_to_int(last_part,fill)
print '1',first_part,first_num,'',0,' ',0

part = list(first_part)
ct = 1
while ct < 10:
    part = next_restricted_part(part,N,K)
    _num = part_to_int(part,fill)
    _diff = int(first_num) - int(_num)
    smaller_diff = (_diff/99)
    ct += 1
    print ct, part, _num,'',_diff,' ',smaller_diff
OUTPUT:
#, partition, coded, _diff, smaller_diff
1 [17, 1, 1, 1] 17010101 0 0
2 [16, 2, 1, 1] 16020101 990000 10000
3 [15, 3, 1, 1] 15030101 1980000 20000
4 [15, 2, 2, 1] 15020201 1989900 20100
5 [14, 4, 1, 1] 14040101 2970000 30000
6 [14, 3, 2, 1] 14030201 2979900 30100
7 [14, 2, 2, 2] 14020202 2989899 30201
8 [13, 5, 1, 1] 13050101 3960000 40000
9 [13, 4, 2, 1] 13040201 3969900 40100
10 [13, 3, 3, 1] 13030301 3979800 40200
In short, integers in the last column could be a lot smaller.
Why a random sampling strategy based on this idea is fundamentally unbiased:
Each integer partition of N having K parts corresponds to one and only one recoded integer. That is, we don't pick a number at random, decode it, and then try to rearrange the elements to form a proper partition of N&K. Consequently, each integer (whether corresponding to partitions of N&K or not) has the same chance of being drawn. The goal is to inherently reduce the number of integers not corresponding to partitions of N with K parts, and so, to make the process of random sampling faster.
