I am trying to find all the possible factorizations of a number provided in Python.
For example: 1)given n=12,
the output will be, f(n)=[[2,2,3],[4,3],[6,2],[12]]
2) given n=24,
the output will be,f(n)=[2,2,2,3],[2,2,6],[2,12],[4,6],[8,3],[24]]
Here is my code:
def p(a):
k=1
m=1
n=[]
for i in range (len(a)):
for j in range(0,i+1):
k*=a[j]
for l in range(i+1,len(a)):
m*=a[l]
n+=[[k,m],]
k=1
m=1
return n
def f(n):
primfac = []
d = 2
while d*d <= n:
while (n % d) == 0:
primfac.append(d)
n //= d
d += 1
if n > 1:
primfac.append(n)
return p(primfac)
But my code returns following values:
1) For n=12,The output is ,
[[2, 10], [4, 5], [20, 1]]
2)1) For n=24,The output is ,
[[2, 12], [4, 6], [8, 3], [24, 1]]
What can I do for getting relevant results?
I don't know python, so can't help you with the code, but here in an explanation I provided for a related question (a bit of Java code as well, if you can read Java).
get your number factored with multiplicity - this is with high probability the most expensive step O(sqrt(N)) - you can stop here if this is al that you want
build you sets of {1, pi1, pi1, ..., pimi} - pi being a prime factor with multiplicity of mi
perform a Cartesian product between these sets and you'll get all the divisors of your number - you'll spend longer time here only for numbers with many distinct factors (and multiplicities) - e.g 210 x 3 8 x 54 x 73 will have 1980 divisors.
Now, each divisor d resulted from the above will come with it's pair (N/d) so if you want distinct factorisation irrespective of the order, you''l need to sort them and eliminate the duplicates.
How do we recode a set of strictly increasing (or strictly decreasing) positive integers P, to decrease the number of positive integers that can occur between the integers in our set?
Why would we want to do this: Say we want to randomly sample P but 1.) P is too large to enumerate, and 2.) members of P are related in a nonrandom way, but in a way that is too complicated to sample by. However, we know a member of P when we see it. Say we know P[0] and P[n] but can't entertain the idea of enumerating all of P or understanding precisely how members of P are related. Likewise, the number of all possible integers occurring between P[0] and P[n] are many times greater than the size of P, making the chance of randomly drawing a member of P very small.
Example: Let P[0] = 2101010101 & P[n] = 505050505. Now, maybe we're only interested in integers between P[0] and P[n] that have a specific quality (e.g. all integers in P[x] sum to Q or less, each member of P has 7 or less as the largest integer). So, not all positive integers P[n] <= X <= P[0] belong to P. The P I'm interested in is discussed in the comments below.
What I've tried: If P is a strictly decreasing set and we know P[0] and P[n], then we can treat each member as if it were subtracted from P[0]. Doing so decreases each number, perhaps greatly and maintains each member as a unique integer. For the P I'm interested in (below), one can treat each decreased value of P as being divided by a common denominator (9,11,99), which decreases the number of possible integers between members of P. I've found that used in conjunction, these approaches decrease the set of all P[0] <= X <= P[n] by a few orders of magnitude, making the chance of randomly drawing a member of P from all positive integers P[n] <= X <= P[0] still very small.
Note: As should be clear, we have to know something about P. If we don't, that basically means we have no clue of what we're looking for. When we randomly sample integers between P[0] and P[n] (recoded or not) we need to be able to say "Yup, that belongs to P.", if indeed it does.
A good answer could greatly increase the practical application of a computing algorithm I have developed. An example of the kind of P I'm interested in is given in comment 2. I am adamant about giving due credit.
While the original question is asking about a very generic scenario concerning integer encodings, I would suggest that it is unlikely that there exists an approach that works in complete generality. For example, if the P[i] are more or less random (from an information-theoretic standpoint), I would be surprised if anything should work.
So, instead, let us turn our attention to the OP's actual problem of generating partitions of an integer N containing exactly K parts. When encoding with combinatorial objects as integers, it behooves us to preserve as much of the combinatorial structure as possible.
For this, we turn to the classic text Combinatorial Algorithms by Nijenhuis and Wilf, specifically Chapter 13. In fact, in this chapter, they demonstrate a framework to enumerate and sample from a number of combinatorial families -- including partitions of N where the largest part is equal to K. Using the well-known duality between partitions with K parts and partitions where the largest part is K (take the transpose of the Ferrers diagram), we find that we only need to make a change to the decoding process.
Anyways, here's some source code:
import sys
import random
import time
if len(sys.argv) < 4 :
sys.stderr.write("Usage: {0} N K iter\n".format(sys.argv[0]))
sys.stderr.write("\tN = number to be partitioned\n")
sys.stderr.write("\tK = number of parts\n")
sys.stderr.write("\titer = number of iterations (if iter=0, enumerate all partitions)\n")
quit()
N = int(sys.argv[1])
K = int(sys.argv[2])
iters = int(sys.argv[3])
if (N < K) :
sys.stderr.write("Error: N<K ({0}<{1})\n".format(N,K))
quit()
# B[n][k] = number of partitions of n with largest part equal to k
B = [[0 for j in range(K+1)] for i in range(N+1)]
def calc_B(n,k) :
for j in xrange(1,k+1) :
for m in xrange(j, n+1) :
if j == 1 :
B[m][j] = 1
elif m - j > 0 :
B[m][j] = B[m-1][j-1] + B[m-j][j]
else :
B[m][j] = B[m-1][j-1]
def generate(n,k,r=None) :
path = []
append = path.append
# Invalid input
if n < k or n == 0 or k == 0:
return []
# Pick random number between 1 and B[n][k] if r is not specified
if r == None :
r = random.randrange(1,B[n][k]+1)
# Construct path from r
while r > 0 :
if n==1 and k== 1:
append('N')
r = 0 ### Finish loop
elif r <= B[n-k][k] and B[n-k][k] > 0 : # East/West Move
append('E')
n = n-k
else : # Northeast/Southwest move
append('N')
r -= B[n-k][k]
n = n-1
k = k-1
# Decode path into partition
partition = []
l = 0
d = 0
append = partition.append
for i in reversed(path) :
if i == 'N' :
if d > 0 : # apply East moves all at once
for j in xrange(l) :
partition[j] += d
d = 0 # reset East moves
append(1) # apply North move
l += 1
else :
d += 1 # accumulate East moves
if d > 0 : # apply any remaining East moves
for j in xrange(l) :
partition[j] += d
return partition
t = time.clock()
sys.stderr.write("Generating B table... ")
calc_B(N, K)
sys.stderr.write("Done ({0} seconds)\n".format(time.clock()-t))
bmax = B[N][K]
Bits = 0
sys.stderr.write("B[{0}][{1}]: {2}\t".format(N,K,bmax))
while bmax > 1 :
bmax //= 2
Bits += 1
sys.stderr.write("Bits: {0}\n".format(Bits))
if iters == 0 : # enumerate all partitions
for i in xrange(1,B[N][K]+1) :
print i,"\t",generate(N,K,i)
else : # generate random partitions
t=time.clock()
for i in xrange(1,iters+1) :
Q = generate(N,K)
print Q
if i%1000==0 :
sys.stderr.write("{0} written ({1:.3f} seconds)\r".format(i,time.clock()-t))
sys.stderr.write("{0} written ({1:.3f} seconds total) ({2:.3f} iterations per second)\n".format(i, time.clock()-t, float(i)/(time.clock()-t) if time.clock()-t else 0))
And here's some examples of the performance (on a MacBook Pro 8.3, 2GHz i7, 4 GB, Mac OSX 10.6.3, Python 2.6.1):
mhum$ python part.py 20 5 10
Generating B table... Done (6.7e-05 seconds)
B[20][5]: 84 Bits: 6
[7, 6, 5, 1, 1]
[6, 6, 5, 2, 1]
[5, 5, 4, 3, 3]
[7, 4, 3, 3, 3]
[7, 5, 5, 2, 1]
[8, 6, 4, 1, 1]
[5, 4, 4, 4, 3]
[6, 5, 4, 3, 2]
[8, 6, 4, 1, 1]
[10, 4, 2, 2, 2]
10 written (0.000 seconds total) (37174.721 iterations per second)
mhum$ python part.py 20 5 1000000 > /dev/null
Generating B table... Done (5.9e-05 seconds)
B[20][5]: 84 Bits: 6
100000 written (2.013 seconds total) (49665.478 iterations per second)
mhum$ python part.py 200 25 100000 > /dev/null
Generating B table... Done (0.002296 seconds)
B[200][25]: 147151784574 Bits: 37
100000 written (8.342 seconds total) (11987.843 iterations per second)
mhum$ python part.py 3000 200 100000 > /dev/null
Generating B table... Done (0.313318 seconds)
B[3000][200]: 3297770929953648704695235165404132029244952980206369173 Bits: 181
100000 written (59.448 seconds total) (1682.135 iterations per second)
mhum$ python part.py 5000 2000 100000 > /dev/null
Generating B table... Done (4.829086 seconds)
B[5000][2000]: 496025142797537184410324290349759736884515893324969819660 Bits: 188
100000 written (255.328 seconds total) (391.653 iterations per second)
mhum$ python part-final2.py 20 3 0
Generating B table... Done (0.0 seconds)
B[20][3]: 33 Bits: 5
1 [7, 7, 6]
2 [8, 6, 6]
3 [8, 7, 5]
4 [9, 6, 5]
5 [10, 5, 5]
6 [8, 8, 4]
7 [9, 7, 4]
8 [10, 6, 4]
9 [11, 5, 4]
10 [12, 4, 4]
11 [9, 8, 3]
12 [10, 7, 3]
13 [11, 6, 3]
14 [12, 5, 3]
15 [13, 4, 3]
16 [14, 3, 3]
17 [9, 9, 2]
18 [10, 8, 2]
19 [11, 7, 2]
20 [12, 6, 2]
21 [13, 5, 2]
22 [14, 4, 2]
23 [15, 3, 2]
24 [16, 2, 2]
25 [10, 9, 1]
26 [11, 8, 1]
27 [12, 7, 1]
28 [13, 6, 1]
29 [14, 5, 1]
30 [15, 4, 1]
31 [16, 3, 1]
32 [17, 2, 1]
33 [18, 1, 1]
I'll leave it to the OP to verify that this code indeed generates partitions according to the desired (uniform) distribution.
EDIT: Added an example of the enumeration functionality.
Below is a script that accomplishes what I've asked, as far as recoding integers that represent integer partitions of N with K parts. A better recoding method is needed for this approach to be practical for K > 4. This is definitely not a best or preferred approach. However, it's conceptually simple and easily argued as fundamentally unbiased. It's also very fast for small K. The script runs fine in Sage notebook and does not call Sage functions. It is NOT a script for random sampling. Random sampling per se is not the problem.
The method:
1.) Treat integer partitions as if their summands are concatenated together and padded with zeros according to size of largest summand in first lexical partition, e.g. [17,1,1,1] -> 17010101 & [5,5,5,5] -> 05050505
2.) Treat the resulting integers as if they are subtracted from the largest integer (i.e. the int representing the first lexical partition). e.g. 17010101 - 5050505 = 11959596
3.) Treat each resulting decreased integer as divided by a common denominator, e.g. 11959596/99 = 120804
So, if we wanted to choose a random partition we would:
1.) Choose a number between 0 and 120,804 (instead of a number between 5,050,505 and 17,010,101)
2.) Multiply the number by 99 and substract from 17010101
3.) Split the resulting integer according to how we treated each integer as being padded with 0's
Pro's and Con's: As stated in the body of the question, this particular recoding method doesn't do enough to greatly improve the chance of randomly selecting an integer representing a member of P. For small numbers of parts, e.g. K < 5 and substantially larger totals, e.g. N > 100, a function that implements this concept can be very fast because the approach avoids timely recursion (snake eating its tail) that slows other random partition functions or makes other functions impractical for dealing with large N.
At small K, the probability of drawing a member of P can be reasonable when considering how fast the rest of the process is. Coupled with quick random draws, decoding, and evaluation, this function can find uniform random partitions for combinations of N&K (e.g. N = 20000, K = 4) that are untennable with other algorithms. A better way to recode integers is greatly needed to make this a generally powerful approach.
import random
import sys
First, some generally useful and straightforward functions
def first_partition(N,K):
part = [N-K+1]
ones = [1]*(K-1)
part.extend(ones)
return part
def last_partition(N,K):
most_even = [int(floor(float(N)/float(K)))]*K
_remainder = int(N%K)
j = 0
while _remainder > 0:
most_even[j] += 1
_remainder -= 1
j += 1
return most_even
def first_part_nmax(N,K,Nmax):
part = [Nmax]
N -= Nmax
K -= 1
while N > 0:
Nmax = min(Nmax,N-K+1)
part.append(Nmax)
N -= Nmax
K -= 1
return part
#print first_partition(20,4)
#print last_partition(20,4)
#print first_part_nmax(20,4,12)
#sys.exit()
def portion(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
def next_restricted_part(part,N,K): # *find next partition matching N&K w/out recursion
if part == last_partition(N,K):return first_partition(N,K)
for i in enumerate(reversed(part)):
if i[1] - part[-1] > 1:
if i[0] == (K-1):
return first_part_nmax(N,K,(i[1]-1))
else:
parts = portion(part,[K-i[0]-1]) # split p
h1 = parts[0]
h2 = parts[1]
next = first_part_nmax(sum(h2),len(h2),(h2[0]-1))
return h1+next
""" *I don't know a math software that has this function and Nijenhuis and Wilf (1978)
don't give it (i.e. NEXPAR is not restricted by K). Apparently, folks often get the
next restricted part using recursion, which is unnecessary """
def int_to_list(i): # convert an int to a list w/out padding with 0'
return [int(x) for x in str(i)]
def int_to_list_fill(i,fill):# convert an int to a list and pad with 0's
return [x for x in str(i).zfill(fill)]
def list_to_int(l):# convert a list to an integer
return "".join(str(x) for x in l)
def part_to_int(part,fill):# convert an int to a partition of K parts
# and pad with the respective number of 0's
p_list = []
for p in part:
if len(int_to_list(p)) != fill:
l = int_to_list_fill(p,fill)
p = list_to_int(l)
p_list.append(p)
_int = list_to_int(p_list)
return _int
def int_to_part(num,fill,K): # convert an int to a partition of K parts
# and pad with the respective number of 0's
# This function isn't called by the script, but I thought I'd include
# it anyway because it would be used to recover the respective partition
_list = int_to_list(num)
if len(_list) != fill*K:
ct = fill*K - len(_list)
while ct > 0:
_list.insert(0,0)
ct -= 1
new_list1 = []
new_list2 = []
for i in _list:
new_list1.append(i)
if len(new_list1) == fill:
new_list2.append(new_list1)
new_list1 = []
part = []
for i in new_list2:
j = int(list_to_int(i))
part.append(j)
return part
Finally, we get to the total N and number of parts K. The following will print partitions satisfying N&K in lexical order, with associated recoded integers
N = 20
K = 4
print '#, partition, coded, _diff, smaller_diff'
first_part = first_partition(N,K) # first lexical partition for N&K
fill = len(int_to_list(max(first_part)))
# pad with zeros to 1.) ensure a strictly decreasing relationship w/in P,
# 2.) keep track of (encode/decode) partition summand values
first_num = part_to_int(first_part,fill)
last_part = last_partition(N,K)
last_num = part_to_int(last_part,fill)
print '1',first_part,first_num,'',0,' ',0
part = list(first_part)
ct = 1
while ct < 10:
part = next_restricted_part(part,N,K)
_num = part_to_int(part,fill)
_diff = int(first_num) - int(_num)
smaller_diff = (_diff/99)
ct+=1
print ct, part, _num,'',_diff,' ',smaller_diff
OUTPUT:
ct, partition, coded, _diff, smaller_diff
1 [17, 1, 1, 1] 17010101 0 0
2 [16, 2, 1, 1] 16020101 990000 10000
3 [15, 3, 1, 1] 15030101 1980000 20000
4 [15, 2, 2, 1] 15020201 1989900 20100
5 [14, 4, 1, 1] 14040101 2970000 30000
6 [14, 3, 2, 1] 14030201 2979900 30100
7 [14, 2, 2, 2] 14020202 2989899 30201
8 [13, 5, 1, 1] 13050101 3960000 40000
9 [13, 4, 2, 1] 13040201 3969900 40100
10 [13, 3, 3, 1] 13030301 3979800 40200
In short, integers in the last column could be a lot smaller.
Why a random sampling strategy based on this idea is fundamentally unbiased:
Each integer partition of N having K parts corresponds to one and only one recoded integer. That is, we don't pick a number at random, decode it, and then try to rearrange the elements to form a proper partition of N&K. Consequently, each integer (whether corresponding to partitions of N&K or not) has the same chance of being drawn. The goal is to inherently reduce the number of integers not corresponding to partitions of N with K parts, and so, to make the process of random sampling faster.