Detect outlier in repeating sequence - algorithm

I have a repeating sequence of say 0~9 (but may start and stop at any of these numbers). e.g.:
3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2
And it has outliers at random location, including 1st and last one, e.g.:
9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6
I need to find & correct the outliers, in the above example, I need correct the first "9" into "3", and "8" into "5", etc..
What I came up with is to construct a sequence with no outlier of desired length, but since I don't know which number the sequence starts with, I'd have to construct 10 sequences each starting from "0", "1", "2" ... "9". And then I can compare these 10 sequences with the given sequence and find the one sequence that match the given sequence the most. However this is very inefficient when the repeating pattern gets large (say if the repeating pattern is 0~99, I'd need to create 100 sequences to compare).
Assuming there won't be consecutive outliers, is there a way to find & correct these outliers efficiently?
edit: added some explanation and added the algorithm tag. Hopefully it is more appropriate now.

I'm going to propose a variation of #trincot's fine answer. Like that one, it doesn't care how many outliers there may be in a row, but unlike that one doesn't care either about how many in a row aren't outliers.
The base idea is just to let each sequence element "vote" on what the first sequence element "should be". Whichever gets the most votes wins. By construction, this maximizes the number of elements left unchanged: after the 1-liner loop ends, votes[i] is the number of elements left unchanged if i is picked as the starting point.
def correct(numbers, mod=None):
# this part copied from #trincot's program
if mod is None: # if argument is not provided:
# Make a guess what the range is of the values
mod = max(numbers) + 1
votes = [0] * mod
for i, x in enumerate(numbers):
# which initial number would make x correct?
votes[(x - i) % mod] += 1
winning_count = max(votes)
winning_numbers = [i for i, v in enumerate(votes)
if v == winning_count]
if len(winning_numbers) > 1:
raise ValueError("ambiguous!", winning_numbers)
winning_number = winning_numbers[0]
for i in range(len(numbers)):
numbers[i] = (winning_number + i) % mod
return numbers
Then, e.g.,
>>> correct([9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6])
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
but
>>> correct([1, 5, 3, 7, 5, 9])
...
ValueError: ('ambiguous!', [1, 4])
That is, it's impossible to guess whether you want [1, 2, 3, 4, 5, 6] or [4, 5, 6, 7, 8, 9]. They both have 3 numbers "right", and despite that there are never two adjacent outliers in either case.

I would do a first scan of the list to find the longest sublist in the input that maintains the right order. We will then assume that those values are all correct, and calculate backwards what the first value would have to be to produce those values in that sublist.
Here is how that would look in Python:
def correct(numbers, mod=None):
if mod is None: # if argument is not provided:
# Make a guess what the range is of the values
mod = max(numbers) + 1
# Find the longest slice in the list that maintains order
start = 0
longeststart = 0
longest = 1
expected = -1
for last in range(len(numbers)):
if numbers[last] != expected:
start = last
elif last - start >= longest:
longest = last - start + 1
longeststart = start
expected = (numbers[last] + 1) % mod
# Get from that longest slice what the starting value should be
val = (numbers[longeststart] - longeststart) % mod
# Repopulate the list starting from that value
for i in range(len(numbers)):
numbers[i] = val
val = (val + 1) % mod
# demo use
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
correct(numbers, 10) # for 0..9 provide 10 as argument, ...etc
print(numbers)
The advantage of this method is that it would even give a good result if there were errors with two consecutive values, provided that there are enough correct values in the list of course.
Still this runs in linear time.

Here is another way using groupby and count from Python's itertools module:
from itertools import count, groupby
def correct(lst):
groupped = [list(v) for _, v in groupby(lst, lambda a, b=count(): a - next(b))]
# Check if all groups are singletons
if all(len(k) == 1 for k in groupped):
raise ValueError('All groups are singletons!')
for k, v in zip(groupped, groupped[1:]):
if len(k) < 2:
out = v[0] - 1
if out >= 0:
yield out
else:
yield from k
else:
yield from k
# check last element of the groupped list
if len(v) < 2:
yield k[-1] + 1
else:
yield from v
lst = "9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6"
lst = [int(k) for k in lst.split(',')]
out = list(correct(lst))
print(out)
Output:
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
Edit:
For the case of [1, 5, 3, 7, 5, 9] this solution will return something not accurate, because i can't see which value you want to modify. This is why the best solution is to check & raise a ValueError if all groups are singletons.

Like this?
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
i = 0
for n in numbers[:-1]:
i += 1
if n > numbers[i] and n > 0:
numbers[i-1] = numbers[i]-1
elif n > numbers[i] and n == 0:
numbers[i - 1] = 9
n = numbers[-1]
if n > numbers[0] and n > 0:
numbers[-1] = numbers[0] - 1
elif n > numbers[0] and n == 0:
numbers[-1] = 9
print(numbers)

Related

Why doesn't the algorithm for partitioning even and odd numbers in an array work for partitioning around a pivot?

We use the following algorithm to bring all even to the left and all odd to the right side of the array:
def evenOddPartition(self,nums):
# partition an array such that all even are on the left
# and all odd are on the right
i = 0
j = len(nums) - 1
while i < j:
## if i is even skip this index
if nums[i]%2 == 0:
i+=1
elif nums[j] %2 == 0:
## if nums[i] is odd and nums[j] is even
nums[i],nums[j] = nums[j],nums[i]
j-= 1
else:
## both are odd
## decrement j (i.e try to see if there is any other even before it)
j-=1
return nums
Even/Odd is a binary classification like true false.
My question is now why we can't apply this same binary classification to a problems like this:
Consider this array:
y = [2,3,5,-100,100,5,5,6,3,5]
And you are asked to move all elements <= 5 to the left side and all > 5 to the right side:
Using the same logic as the Even/Odd problem, I present this code
def tryTwo(self,nums):
pivot = 5
i = 0
j = len(nums) - 1
while i < j:
if nums[i] < pivot:
i+=1
elif nums[i] == pivot:
i+=1
elif nums[i] > pivot:
if nums[j] <= pivot:
nums[i], nums[j] = nums[j], nums[i]
j-=1
else:
j-=1
else:
j-=1
return nums
However, this code outputs [2, 3, 5, -100, 5, 5, 5, 3, 6, 100], which is the wrong answer. The correct answer is something like this
[2, 3, -100, 3, 5, 5, 5, 5, 100, 6]
What am I missing here? Is there a bug in my second code?
Hi :) You stated that the task you're given is
You are asked to move all elements <= 5 to the left side and all >
5 to the right side
Which means that the output [2, 3, 5, -100, 5, 5, 5, 3, 6, 100] is perfectly valid.
All the elements that are less than OR equals to 5 are moved to the left side of the list
[2, 3, 5, -100, 5, 5, 5, 3,.
And the others are moved to the right side
6, 100].
As Blastfurnace mentioned, your code treats every number below 6 as the same thing, and doesn't have any sorting mechanism for those numbers. So it makes sense for 3 to be after 5 (since these numbers aren't sorted).
If you want all the elements that are less than 5 to the left, every 5s to the middle and everything larger than 5 to the right, you will need a different algorithm (or just use a sorting algorithm).

Find local min based on the length of occurences of successive means without falling in wrong min

1. Problem description
I have the following list of values [10, 10, 10, 10, 5, 5, 5, 5, 7, 7, 7, 2, 4, 3, 3, 3, 10] It is shown in the following picture.
What I want to do is find the minimum based on the value of the element and
its duration. From the previous list we can construct the following dictionary (key:val) :[10:4, 5:4, 7:2, 2:1, 4:1, 3:3, 10:1]. Meaning we have 4 sucessive 10s followed by 4 successive 5s, 2 successive 7s and 3 successive 3s.
Based on what I said the local min is 5. But I don't want that The local min should be 3. We didn't select 2 because it happened only once.
Do you have an idea on how we can solve that problem. Is there an existing method that can be used to solve it?
Of course we can sort the dictionary by values [10:4, 5:4, 7:2, 3:3, 10:1] and select the lowest key that has a value different than 1. Is that a good solution?
2. Selection criteria
must be a local min (find_local_min(prices))
must have the highest numbers of succession
the min succession must be > 1
AND I AM STUCK! because now I have 3 as local minimum but it is repeated only 3 times. I was testing if My idea is correct and I tried to find a counter example and I shot my foot
3. source code
the following code extracts the minimums with the dictionary:
#!/usr/bin/env python
import csv
import sys
import os
from collections import defaultdict
def find_local_min(prices):
i = 1
minPrices = []
while i < len(prices):
if prices[i] < prices[i-1]:
minPrices.append(prices[i])
j = i + 1
while j < len(prices) and prices[j] == prices[j-1]:
minPrices.append(prices[j])
j += 1
i = j
else:
i += 1
return minPrices
if __name__ == "__main__":
l = [10, 10, 10, 10, 5, 5, 5, 5, 7, 7, 7, 2,4, 3, 3, 3, 10]
minPrices = find_local_min(l)
minPriceDict = defaultdict(int)
for future in minPrices :
minPriceDict[future] += 1
print minPriceDict
As output if gives the following: defaultdict(<type 'int'>, {2: 1, 3:
3, 5: 4}) Based on this output the algorithm will select 5 as the min
because it is repeated 5 successive times. But that's wrong! it
should be 3. I really want to know how to solve that problem

Python Coin Change: Incrementing list in return statement?

Edit: Still working on this, making progress though.
def recursion_change(available_coins, tender):
"""
Returns a tuple containing:
:an array counting which coins are used to make change, mirroring the input array
:the number of coins to make tender.
:coins: List[int]
:money: int
:rtype: (List[int], int)
"""
change_list = [0] * len(available_coins)
def _helper_recursion_change(change_index, remaining_balance, change_list):
if remaining_balance == 0:
return (change_list, sum(change_list))
elif change_index == -1 or remaining_balance < 0:
return float('inf')
else:
test_a = _helper_recursion_change(change_index-1, remaining_balance, change_list)
test_b = _helper_recursion_change(_helper_recursion_change(len(available_coins)-1, tender, change_list))
test_min = min(test_a or test_b)
if :
_helper_recursion_change()
else:
_helper_recursion_change()
return 1 + _helper_recursion_change(change_index, remaining_balance-available_coins[change_index], change_list))
print str(recursion_change([1, 5, 10, 25, 50, 100], 72)) # Current Output: 5
# Desired Output: ([2, 0, 2, 0, 1, 0], 5)
Quick overview: this coin-change algorithm is supposed to receive a list of possible change options and tender. It's supposed to recursively output a mirror array and the number of coins needed to make tender, and I think the best way to do that is with a tuple.
For example:
> recursion_change([1, 2, 5, 10, 25], 49)
>> ([0, 2, 0, 2, 1], 5)
Working code sample:
http://ideone.com/mmtuMr
def recursion_change(coins, money):
"""
Returns a tuple containing:
:an array counting which coins are used to make change, mirroring the input array
:the number of coins to make tender.
:coins: List[int]
:money: int
:rtype: (List[int], int)
"""
change_list = [0] * len(coins)
def _helper_recursion_change(i, k, change_list):
if k == 0: # Base case: money in this (sub)problem matches change precisely
return 0
elif i == -1 or k < 0: # Base case: change cannot be made for this subproblem
return float('inf')
else: # Otherwise, simplify by recursing:
# Take the minimum of:
# the number of coins to make i cents
# the number of coins to make k-i cents
return min(_helper_recursion_change(i-1, k, change_list), 1 + _helper_recursion_change(i, k-coins[i], change_list))
return (_helper_recursion_change(len(coins)-1, money, change_list))
print str(recursion_change([1, 5, 10, 25, 50, 100], 6)) # Current Output: 2
# Desired Output: ([1, 1, 0, 0, 0, 0], 2)
Particularly, this line:
1 + _helper_recursion_change(i, k-coins[i], change_list))
It's easy enough to catch the number of coins we need, as the program does now. Do I have to change the return value to include change_list, so I can increment it? What's the best way to do that without messing with the recursion, as it currently returns just a simple integer.
Replacing change_list in the list above with change_list[i] + 1 gives me a
TypeError: 'int' object is unsubscriptable or change_list[i] += 1 fails to run because it's 'invalid syntax'.

Finding minimum element to the right of an index in an array for all indices

Given an array, I wish to find the minimum element to the right of the current element at i where 0=<i<n and store the index of the corresponding minimum element in another array.
For example, I have an array A ={1,3,6,7,8}
The result array would contain R={1,2,3,4} .(R array stores indices to min element).
I could only think of an O(N^2) approach.. where for each element in A, I would traverse the remaining elements to right of A and find the minimum.
Is it possible to do this in O(N)? I want to use the solution to solve another problem.
You should be able to do this in O(n) by filling the array from the right hand side and maintaining the index of the current minimum, as per the following pseudo-code:
def genNewArray (oldArray):
newArray = new array[oldArray.size]
saveIndex = -1
for i = newArray.size - 1 down to 0:
newArray[i] = saveIndex
if saveIndex == -1 or oldArray[i] < oldArray[saveIndex]:
saveIndex = i
return newArray
This passes through the array once, giving you the O(n) time complexity. It can do this because, once you've found a minimum beyond element N, it will only change for element N-1 if element N is less than the current minimum.
The following Python code shows this in action:
def genNewArray (oldArray):
newArray = []
saveIndex = -1
for i in range (len (oldArray) - 1, -1, -1):
newArray.insert (0, saveIndex)
if saveIndex == -1 or oldArray[i] < oldArray[saveIndex]:
saveIndex = i
return newArray
oldList = [1,3,6,7,8,2,7,4]
x = genNewArray (oldList)
print "idx", [0,1,2,3,4,5,6,7]
print "old", oldList
print "new", x
The output of this is:
idx [0, 1, 2, 3, 4, 5, 6, 7]
old [1, 3, 6, 7, 8, 2, 7, 4]
new [5, 5, 5, 5, 5, 7, 7, -1]
and you can see that the indexes at each element of the new array (the second one) correctly point to the minimum value to the right of each element in the original (first one).
Note that I've taken one specific definition of "to the right of", meaning it doesn't include the current element. If your definition of "to the right of" includes the current element, just change the order of the insert and if statement within the loop so that the index is updated first:
idx [0, 1, 2, 3, 4, 5, 6, 7]
old [1, 3, 6, 7, 8, 2, 7, 4]
new [0, 5, 5, 5, 5, 5, 7, 7]
The code for that removes the check on saveIndex since you know that the minimum index for the last element can be found at the last element:
def genNewArray (oldArray):
newArray = []
saveIndex = len (oldArray) - 1
for i in range (len (oldArray) - 1, -1, -1):
if oldArray[i] < oldArray[saveIndex]:
saveIndex = i
newArray.insert (0, saveIndex)
return newArray
Looks like HW. Let f(i) denote the index of the minimum element to the right of the element at i. Now consider walking backwards (filling in f(n-1), then f(n-2), f(n-3), ..., f(3), f(2), f(1)) and think about how information of f(i) can give you information of f(i-1).

Transform a set of large integers into a set of small ones

How do we recode a set of strictly increasing (or strictly decreasing) positive integers P, to decrease the number of positive integers that can occur between the integers in our set?
Why would we want to do this: Say we want to randomly sample P but 1.) P is too large to enumerate, and 2.) members of P are related in a nonrandom way, but in a way that is too complicated to sample by. However, we know a member of P when we see it. Say we know P[0] and P[n] but can't entertain the idea of enumerating all of P or understanding precisely how members of P are related. Likewise, the number of all possible integers occurring between P[0] and P[n] are many times greater than the size of P, making the chance of randomly drawing a member of P very small.
Example: Let P[0] = 2101010101 & P[n] = 505050505. Now, maybe we're only interested in integers between P[0] and P[n] that have a specific quality (e.g. all integers in P[x] sum to Q or less, each member of P has 7 or less as the largest integer). So, not all positive integers P[n] <= X <= P[0] belong to P. The P I'm interested in is discussed in the comments below.
What I've tried: If P is a strictly decreasing set and we know P[0] and P[n], then we can treat each member as if it were subtracted from P[0]. Doing so decreases each number, perhaps greatly and maintains each member as a unique integer. For the P I'm interested in (below), one can treat each decreased value of P as being divided by a common denominator (9,11,99), which decreases the number of possible integers between members of P. I've found that used in conjunction, these approaches decrease the set of all P[0] <= X <= P[n] by a few orders of magnitude, making the chance of randomly drawing a member of P from all positive integers P[n] <= X <= P[0] still very small.
Note: As should be clear, we have to know something about P. If we don't, that basically means we have no clue of what we're looking for. When we randomly sample integers between P[0] and P[n] (recoded or not) we need to be able to say "Yup, that belongs to P.", if indeed it does.
A good answer could greatly increase the practical application of a computing algorithm I have developed. An example of the kind of P I'm interested in is given in comment 2. I am adamant about giving due credit.
While the original question is asking about a very generic scenario concerning integer encodings, I would suggest that it is unlikely that there exists an approach that works in complete generality. For example, if the P[i] are more or less random (from an information-theoretic standpoint), I would be surprised if anything should work.
So, instead, let us turn our attention to the OP's actual problem of generating partitions of an integer N containing exactly K parts. When encoding with combinatorial objects as integers, it behooves us to preserve as much of the combinatorial structure as possible.
For this, we turn to the classic text Combinatorial Algorithms by Nijenhuis and Wilf, specifically Chapter 13. In fact, in this chapter, they demonstrate a framework to enumerate and sample from a number of combinatorial families -- including partitions of N where the largest part is equal to K. Using the well-known duality between partitions with K parts and partitions where the largest part is K (take the transpose of the Ferrers diagram), we find that we only need to make a change to the decoding process.
Anyways, here's some source code:
import sys
import random
import time
if len(sys.argv) < 4 :
sys.stderr.write("Usage: {0} N K iter\n".format(sys.argv[0]))
sys.stderr.write("\tN = number to be partitioned\n")
sys.stderr.write("\tK = number of parts\n")
sys.stderr.write("\titer = number of iterations (if iter=0, enumerate all partitions)\n")
quit()
N = int(sys.argv[1])
K = int(sys.argv[2])
iters = int(sys.argv[3])
if (N < K) :
sys.stderr.write("Error: N<K ({0}<{1})\n".format(N,K))
quit()
# B[n][k] = number of partitions of n with largest part equal to k
B = [[0 for j in range(K+1)] for i in range(N+1)]
def calc_B(n,k) :
for j in xrange(1,k+1) :
for m in xrange(j, n+1) :
if j == 1 :
B[m][j] = 1
elif m - j > 0 :
B[m][j] = B[m-1][j-1] + B[m-j][j]
else :
B[m][j] = B[m-1][j-1]
def generate(n,k,r=None) :
path = []
append = path.append
# Invalid input
if n < k or n == 0 or k == 0:
return []
# Pick random number between 1 and B[n][k] if r is not specified
if r == None :
r = random.randrange(1,B[n][k]+1)
# Construct path from r
while r > 0 :
if n==1 and k== 1:
append('N')
r = 0 ### Finish loop
elif r <= B[n-k][k] and B[n-k][k] > 0 : # East/West Move
append('E')
n = n-k
else : # Northeast/Southwest move
append('N')
r -= B[n-k][k]
n = n-1
k = k-1
# Decode path into partition
partition = []
l = 0
d = 0
append = partition.append
for i in reversed(path) :
if i == 'N' :
if d > 0 : # apply East moves all at once
for j in xrange(l) :
partition[j] += d
d = 0 # reset East moves
append(1) # apply North move
l += 1
else :
d += 1 # accumulate East moves
if d > 0 : # apply any remaining East moves
for j in xrange(l) :
partition[j] += d
return partition
t = time.clock()
sys.stderr.write("Generating B table... ")
calc_B(N, K)
sys.stderr.write("Done ({0} seconds)\n".format(time.clock()-t))
bmax = B[N][K]
Bits = 0
sys.stderr.write("B[{0}][{1}]: {2}\t".format(N,K,bmax))
while bmax > 1 :
bmax //= 2
Bits += 1
sys.stderr.write("Bits: {0}\n".format(Bits))
if iters == 0 : # enumerate all partitions
for i in xrange(1,B[N][K]+1) :
print i,"\t",generate(N,K,i)
else : # generate random partitions
t=time.clock()
for i in xrange(1,iters+1) :
Q = generate(N,K)
print Q
if i%1000==0 :
sys.stderr.write("{0} written ({1:.3f} seconds)\r".format(i,time.clock()-t))
sys.stderr.write("{0} written ({1:.3f} seconds total) ({2:.3f} iterations per second)\n".format(i, time.clock()-t, float(i)/(time.clock()-t) if time.clock()-t else 0))
And here's some examples of the performance (on a MacBook Pro 8.3, 2GHz i7, 4 GB, Mac OSX 10.6.3, Python 2.6.1):
mhum$ python part.py 20 5 10
Generating B table... Done (6.7e-05 seconds)
B[20][5]: 84 Bits: 6
[7, 6, 5, 1, 1]
[6, 6, 5, 2, 1]
[5, 5, 4, 3, 3]
[7, 4, 3, 3, 3]
[7, 5, 5, 2, 1]
[8, 6, 4, 1, 1]
[5, 4, 4, 4, 3]
[6, 5, 4, 3, 2]
[8, 6, 4, 1, 1]
[10, 4, 2, 2, 2]
10 written (0.000 seconds total) (37174.721 iterations per second)
mhum$ python part.py 20 5 1000000 > /dev/null
Generating B table... Done (5.9e-05 seconds)
B[20][5]: 84 Bits: 6
100000 written (2.013 seconds total) (49665.478 iterations per second)
mhum$ python part.py 200 25 100000 > /dev/null
Generating B table... Done (0.002296 seconds)
B[200][25]: 147151784574 Bits: 37
100000 written (8.342 seconds total) (11987.843 iterations per second)
mhum$ python part.py 3000 200 100000 > /dev/null
Generating B table... Done (0.313318 seconds)
B[3000][200]: 3297770929953648704695235165404132029244952980206369173 Bits: 181
100000 written (59.448 seconds total) (1682.135 iterations per second)
mhum$ python part.py 5000 2000 100000 > /dev/null
Generating B table... Done (4.829086 seconds)
B[5000][2000]: 496025142797537184410324290349759736884515893324969819660 Bits: 188
100000 written (255.328 seconds total) (391.653 iterations per second)
mhum$ python part-final2.py 20 3 0
Generating B table... Done (0.0 seconds)
B[20][3]: 33 Bits: 5
1 [7, 7, 6]
2 [8, 6, 6]
3 [8, 7, 5]
4 [9, 6, 5]
5 [10, 5, 5]
6 [8, 8, 4]
7 [9, 7, 4]
8 [10, 6, 4]
9 [11, 5, 4]
10 [12, 4, 4]
11 [9, 8, 3]
12 [10, 7, 3]
13 [11, 6, 3]
14 [12, 5, 3]
15 [13, 4, 3]
16 [14, 3, 3]
17 [9, 9, 2]
18 [10, 8, 2]
19 [11, 7, 2]
20 [12, 6, 2]
21 [13, 5, 2]
22 [14, 4, 2]
23 [15, 3, 2]
24 [16, 2, 2]
25 [10, 9, 1]
26 [11, 8, 1]
27 [12, 7, 1]
28 [13, 6, 1]
29 [14, 5, 1]
30 [15, 4, 1]
31 [16, 3, 1]
32 [17, 2, 1]
33 [18, 1, 1]
I'll leave it to the OP to verify that this code indeed generates partitions according to the desired (uniform) distribution.
EDIT: Added an example of the enumeration functionality.
Below is a script that accomplishes what I've asked, as far as recoding integers that represent integer partitions of N with K parts. A better recoding method is needed for this approach to be practical for K > 4. This is definitely not a best or preferred approach. However, it's conceptually simple and easily argued as fundamentally unbiased. It's also very fast for small K. The script runs fine in Sage notebook and does not call Sage functions. It is NOT a script for random sampling. Random sampling per se is not the problem.
The method:
1.) Treat integer partitions as if their summands are concatenated together and padded with zeros according to size of largest summand in first lexical partition, e.g. [17,1,1,1] -> 17010101 & [5,5,5,5] -> 05050505
2.) Treat the resulting integers as if they are subtracted from the largest integer (i.e. the int representing the first lexical partition). e.g. 17010101 - 5050505 = 11959596
3.) Treat each resulting decreased integer as divided by a common denominator, e.g. 11959596/99 = 120804
So, if we wanted to choose a random partition we would:
1.) Choose a number between 0 and 120,804 (instead of a number between 5,050,505 and 17,010,101)
2.) Multiply the number by 99 and substract from 17010101
3.) Split the resulting integer according to how we treated each integer as being padded with 0's
Pro's and Con's: As stated in the body of the question, this particular recoding method doesn't do enough to greatly improve the chance of randomly selecting an integer representing a member of P. For small numbers of parts, e.g. K < 5 and substantially larger totals, e.g. N > 100, a function that implements this concept can be very fast because the approach avoids timely recursion (snake eating its tail) that slows other random partition functions or makes other functions impractical for dealing with large N.
At small K, the probability of drawing a member of P can be reasonable when considering how fast the rest of the process is. Coupled with quick random draws, decoding, and evaluation, this function can find uniform random partitions for combinations of N&K (e.g. N = 20000, K = 4) that are untennable with other algorithms. A better way to recode integers is greatly needed to make this a generally powerful approach.
import random
import sys
First, some generally useful and straightforward functions
def first_partition(N,K):
part = [N-K+1]
ones = [1]*(K-1)
part.extend(ones)
return part
def last_partition(N,K):
most_even = [int(floor(float(N)/float(K)))]*K
_remainder = int(N%K)
j = 0
while _remainder > 0:
most_even[j] += 1
_remainder -= 1
j += 1
return most_even
def first_part_nmax(N,K,Nmax):
part = [Nmax]
N -= Nmax
K -= 1
while N > 0:
Nmax = min(Nmax,N-K+1)
part.append(Nmax)
N -= Nmax
K -= 1
return part
#print first_partition(20,4)
#print last_partition(20,4)
#print first_part_nmax(20,4,12)
#sys.exit()
def portion(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
def next_restricted_part(part,N,K): # *find next partition matching N&K w/out recursion
if part == last_partition(N,K):return first_partition(N,K)
for i in enumerate(reversed(part)):
if i[1] - part[-1] > 1:
if i[0] == (K-1):
return first_part_nmax(N,K,(i[1]-1))
else:
parts = portion(part,[K-i[0]-1]) # split p
h1 = parts[0]
h2 = parts[1]
next = first_part_nmax(sum(h2),len(h2),(h2[0]-1))
return h1+next
""" *I don't know a math software that has this function and Nijenhuis and Wilf (1978)
don't give it (i.e. NEXPAR is not restricted by K). Apparently, folks often get the
next restricted part using recursion, which is unnecessary """
def int_to_list(i): # convert an int to a list w/out padding with 0'
return [int(x) for x in str(i)]
def int_to_list_fill(i,fill):# convert an int to a list and pad with 0's
return [x for x in str(i).zfill(fill)]
def list_to_int(l):# convert a list to an integer
return "".join(str(x) for x in l)
def part_to_int(part,fill):# convert an int to a partition of K parts
# and pad with the respective number of 0's
p_list = []
for p in part:
if len(int_to_list(p)) != fill:
l = int_to_list_fill(p,fill)
p = list_to_int(l)
p_list.append(p)
_int = list_to_int(p_list)
return _int
def int_to_part(num,fill,K): # convert an int to a partition of K parts
# and pad with the respective number of 0's
# This function isn't called by the script, but I thought I'd include
# it anyway because it would be used to recover the respective partition
_list = int_to_list(num)
if len(_list) != fill*K:
ct = fill*K - len(_list)
while ct > 0:
_list.insert(0,0)
ct -= 1
new_list1 = []
new_list2 = []
for i in _list:
new_list1.append(i)
if len(new_list1) == fill:
new_list2.append(new_list1)
new_list1 = []
part = []
for i in new_list2:
j = int(list_to_int(i))
part.append(j)
return part
Finally, we get to the total N and number of parts K. The following will print partitions satisfying N&K in lexical order, with associated recoded integers
N = 20
K = 4
print '#, partition, coded, _diff, smaller_diff'
first_part = first_partition(N,K) # first lexical partition for N&K
fill = len(int_to_list(max(first_part)))
# pad with zeros to 1.) ensure a strictly decreasing relationship w/in P,
# 2.) keep track of (encode/decode) partition summand values
first_num = part_to_int(first_part,fill)
last_part = last_partition(N,K)
last_num = part_to_int(last_part,fill)
print '1',first_part,first_num,'',0,' ',0
part = list(first_part)
ct = 1
while ct < 10:
part = next_restricted_part(part,N,K)
_num = part_to_int(part,fill)
_diff = int(first_num) - int(_num)
smaller_diff = (_diff/99)
ct+=1
print ct, part, _num,'',_diff,' ',smaller_diff
OUTPUT:
ct, partition, coded, _diff, smaller_diff
1 [17, 1, 1, 1] 17010101 0 0
2 [16, 2, 1, 1] 16020101 990000 10000
3 [15, 3, 1, 1] 15030101 1980000 20000
4 [15, 2, 2, 1] 15020201 1989900 20100
5 [14, 4, 1, 1] 14040101 2970000 30000
6 [14, 3, 2, 1] 14030201 2979900 30100
7 [14, 2, 2, 2] 14020202 2989899 30201
8 [13, 5, 1, 1] 13050101 3960000 40000
9 [13, 4, 2, 1] 13040201 3969900 40100
10 [13, 3, 3, 1] 13030301 3979800 40200
In short, integers in the last column could be a lot smaller.
Why a random sampling strategy based on this idea is fundamentally unbiased:
Each integer partition of N having K parts corresponds to one and only one recoded integer. That is, we don't pick a number at random, decode it, and then try to rearrange the elements to form a proper partition of N&K. Consequently, each integer (whether corresponding to partitions of N&K or not) has the same chance of being drawn. The goal is to inherently reduce the number of integers not corresponding to partitions of N with K parts, and so, to make the process of random sampling faster.

Resources