Can an AWK array be used to get the largest cluster in a correlation matrix?

I have a matrix that describes correlation between items A-K, where 1=correlated and 0=uncorrelated.
Is there an easy way to extract the largest cluster from the data? In other words, the cluster with the most correlated elements. Below is some sample data:
# A B C D E F G H I J K
A 1 1 1 1 1 1 1 1 1 1 1
B 1 1 1 1 1 1 1 1 1 1 1
C 1 1 1 1 1 1 1 1 1 1 1
D 1 1 1 1 0 1 0 0 0 1 1
E 1 1 1 0 1 0 0 1 1 0 0
F 1 1 1 1 0 1 0 0 1 0 1
G 1 1 1 0 0 0 1 0 0 0 0
H 1 1 1 0 1 0 0 1 1 1 1
I 1 1 1 0 1 1 0 1 1 0 0
J 1 1 1 1 0 0 0 1 0 1 0
K 1 1 1 1 0 1 0 1 0 0 1
Swapping a few columns/rows by eye, the expected result would be the top left of the matrix, which is a cluster of size 6 that contains: {A, B, C, D, F, K}
I know awk isn't the most user-friendly for this application, but I'm keen on using awk since this will integrate into a larger awk script. That being said, I'm not completely immovable on the language.
Not sure where to start but here's a more complex version of what I'm thinking in python:
https://stats.stackexchange.com/questions/138325/clustering-a-correlation-matrix

Assumptions:
all matrices are symmetric (ie, square, with matrix[x,y] = matrix[y,x])
matrix[x,x]=1 for all x
all matrix entries are 0 or 1
not interested in 1-element clusters
not interested in permutations of the same cluster (ie, A,B is the same as B,A)
since we don't have to worry about permutations, we can process elements in the order in which they show up in the matrix (eg, we process A,B,C and ignore the equivalents A,C,B / B,A,C / B,C,A / C,A,B / C,B,A); this lets us focus on just the top/right half of the matrix (above the identity/diagonal), working left to right, and greatly reduces the number of permutations we need to evaluate
as demonstrated in the question, elements that make up a cluster can be shifted up/left in the matrix so as to fill the top/left of the matrix with 1's (this comes into play during processing where for each new element we merely need to test the equivalent of the new column/row added to this top/left portion of the matrix)
Regarding the last assumption ... assume we have cluster A,D and we now want to test A,D,F; we just need to test the new column/row entries (?):
Current Cluster         New Cluster
    A D                     A D F
A   1 1                 A   1 1 ?    # if matrix is symmetric then only need to test
D   1 1                 D   1 1 ?    # the new column *OR* the new row, not both;
                        F   ? ? 1    # bottom/right == 1 == matrix[F,F] per earlier assumption
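As a quick illustration of that incremental test (a Python sketch with hypothetical names; the actual solution uses awk), extending a clique only requires checking the new column against the indexes already in the cluster:

```python
def extends_clique(m, cluster, j):
    """True if element j extends the clique `cluster` (a list of row/column
    indexes): only the new column of the symmetric matrix needs checking."""
    return all(m[k][j] == 1 for k in cluster)

# toy symmetric matrix; indexes: 0=A, 1=D, 2=F, 3=E
m = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 0, 0, 1]]

extends_clique(m, [0, 1], 2)   # True:  {A,D} extends to {A,D,F}
extends_clique(m, [0, 1], 3)   # False: E is uncorrelated with D
```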
One idea using a recursive function and two GNU awk features: a) arrays of arrays (aka multi-dimensional arrays) and b) PROCINFO["sorted_in"] for custom sorting of clusters to stdout:
awk '
######
# load matrix into memory

FNR==1 { n = NF-1                       # number of elements to be processed
         for (i=2; i<=NF; i++)
             label[i-1] = $i            # save labels
         next
       }

       { for (i=2; i<=NF; i++)
             m[FNR-1][i-1] = $i         # populate matrix array m[row#][column#]
       }

######
# define our recursive function

function find_cluster(cluster, i, clstrcount, stackseq,    j, k, corrcount) {

# cluster       : current working cluster (eg, "A,B,C")
# i             : index of latest element (eg, for "A,B,C" the latest element is "C" so i = 3)
# clstrcount    : number of elements in current cluster
# stackseq      : sequence number of stack[] array
#               : stack[] contains the list of indexes for the current cluster (for "A,B,C" stack = "1,2,3")
# j,k,corrcount : additional variables declared "local" to this invocation of the function

    clstrcount++                        # number of elements to be processed at this call/level

    for (j=i+1; j<=n; j++) {            # process all elements/indexes greater than i
        corrcount=1                     # reset correlation count; always start with 1 since m[j][j]=1

        # check the new column/row added to the top/left of the matrix
        # to see if it extends the current cluster (ie, all entries are "1")

        for (k in stack) {              # loop through element/indexes in stack
            if (m[stack[k]][j])         # check column entries
                corrcount++
            if (m[j][stack[k]])         # check row entries; not necessary if the matrix is symmetric
                corrcount++             # but included here to show the m[][] references
        }

        if (corrcount == (stackseq*2 + 1)) {           # if we have all "1"s we have a new cluster of size clstrcount
            stack[++stackseq]=j                        # "push" current element/index on stack; increment stack seq/index
            cluster=cluster "," label[j]               # add current element/label to cluster
            max = (clstrcount > max) ? clstrcount : max          # new max(cluster count)?
            clusters[clstrcount][++clstrseq]=cluster   # add new cluster to our master list: clusters[cluster_count][seq]

            find_cluster(cluster, j, clstrcount, stackseq)       # recursive call to check for next element(s)

            delete stack[stackseq--]                   # back from recursive call so "pop" current element (j) from stack
            gsub(/[,][^,]+$/,"",cluster)               # remove current element/label from cluster to make way for the next element/label to be tested
        }
    }
}

######
# start looking for clusters of size 2+

END {   max=2                           # not interested in clusters of size 1
        clstrseq=0                      # init clusters[...][seq] sequence seed (global across all starting elements)

        for (i=1; i<n; i++) {           # loop through list of elements
            clstrcount=1                # init cluster count = 1
            cluster=label[i]            # reset cluster to current element/label
            stackseq=1                  # reset stack[seq] sequence seed
            stack[stackseq]=i           # "push" current element on stack

            find_cluster(cluster, i, clstrcount, stackseq)       # start recursive calls looking for the next element in the cluster
        }

######
# for now just display clusters with size > 2; adjust the next lines to add/remove cluster sizes from stdout

        if (max>2)                      # print list of clusters with length > 2
            for (i=max; i>2; i--) {     # print from largest to smallest and ...
                PROCINFO["sorted_in"]="@val_str_asc"             # in alphabetical order
                printf "####### clusters of size %s:\n", i
                for (j in clusters[i])  # loop through all entries for clusters of size "i"
                    print clusters[i][j]
            }
}
' matrix.dat
NOTE: The current version is (admittedly) a bit verbose, the result of jotting down a first-pass solution while working through the details; with some further analysis it may be possible to reduce the code. That said, the time it takes to find all 2+ sized clusters in this 11-element matrix isn't too bad:

real    0m0.084s
user    0m0.031s
sys     0m0.046s
This generates:
####### clusters of size 6:
A,B,C,D,F,K
A,B,C,E,H,I
####### clusters of size 5:
A,B,C,D,F
A,B,C,D,J
A,B,C,D,K
A,B,C,E,H
A,B,C,E,I
A,B,C,F,I
A,B,C,F,K
A,B,C,H,I
A,B,C,H,J
A,B,C,H,K
A,B,D,F,K
A,B,E,H,I
A,C,D,F,K
A,C,E,H,I
B,C,D,F,K
B,C,E,H,I
####### clusters of size 4:
A,B,C,D
A,B,C,E
A,B,C,F
A,B,C,G
A,B,C,H
A,B,C,I
A,B,C,J
A,B,C,K
A,B,D,F
A,B,D,J
A,B,D,K
A,B,E,H
A,B,E,I
A,B,F,I
A,B,F,K
A,B,H,I
A,B,H,J
A,B,H,K
A,C,D,F
A,C,D,J
A,C,D,K
A,C,E,H
A,C,E,I
A,C,F,I
A,C,F,K
A,C,H,I
A,C,H,J
A,C,H,K
A,D,F,K
A,E,H,I
B,C,D,F
B,C,D,J
B,C,D,K
B,C,E,H
B,C,E,I
B,C,F,I
B,C,F,K
B,C,H,I
B,C,H,J
B,C,H,K
B,D,F,K
B,E,H,I
C,D,F,K
C,E,H,I
####### clusters of size 3:
A,B,C
A,B,D
A,B,E
A,B,F
A,B,G
A,B,H
A,B,I
A,B,J
A,B,K
A,C,D
A,C,E
A,C,F
A,C,G
A,C,H
A,C,I
A,C,J
A,C,K
A,D,F
A,D,J
A,D,K
A,E,H
A,E,I
A,F,I
A,F,K
A,H,I
A,H,J
A,H,K
B,C,D
B,C,E
B,C,F
B,C,G
B,C,H
B,C,I
B,C,J
B,C,K
B,D,F
B,D,J
B,D,K
B,E,H
B,E,I
B,F,I
B,F,K
B,H,I
B,H,J
B,H,K
C,D,F
C,D,J
C,D,K
C,E,H
C,E,I
C,F,I
C,F,K
C,H,I
C,H,J
C,H,K
D,F,K
E,H,I

Related

How to find all sub rectangles using fastest algorithm?

An example: suppose we have a 2D array such as:
A= [
[1,0,0],
[1,0,0],
[0,1,1]
]
The task is to find all sub rectangles containing only zeros. So the output of this algorithm should be:
[[0,1,0,2] , [0,1,1,1] , [0,2,1,2] , [0,1,1,2] ,[1,1,1,2], [2,0,2,0] ,
[0,1,0,1] , [0,2,0,2] , [1,1,1,1] , [1,2,1,2]]
Where i,j in [ i , j , a , b ] are coordinates of rectangle's starting point and a,b are coordinates of rectangle's ending point.
I found some algorithms, for example Link1 and Link2, but I think the first one is the simplest algorithm and we want the fastest. For the second one, we see that the algorithm only calculates rectangles, not all sub rectangles.
Question:
Does anyone know a better or faster algorithm for this problem? My idea is to use dynamic programming, but how to apply it isn't easy for me.
Assume an initial array of size c columns x r rows.
Every 0 is a rectangle of size 1x1.
Now perform a "horizontal dilation", i.e. replace every element by the maximum of itself and the one to its right, and drop the last element in the row. E.g.

1 0 0    1 0
1 0 0 -> 1 0
0 1 1    1 1

Every zero now corresponds to a 1x2 rectangle in the original array. You can repeat this c-1 times, until there is a single column left.

1 0 0    1 0    1
1 0 0 -> 1 0 -> 1
0 1 1    1 1    1

The zeroes correspond to 1xc rectangles in the original array (initially c columns).
For every dilated array, perform a similar "vertical dilation".
1 0 0    1 0    1
1 0 0 -> 1 0 -> 1
0 1 1    1 1    1

  |       |     |
  V       V     V

1 0 0    1 0    1
1 1 1 -> 1 1 -> 1

  |       |     |
  V       V     V

1 1 1 -> 1 1 -> 1
In these rxc arrays, the zeroes correspond to the subrectangles of all possible sizes. (Here, 5 of size 1x1, 2 of size 2x1, 2 of size 1x2 and one of size 2x2.)
The total workload to detect the zeroes and compute the dilations is of order O(c²r²). I guess that this is worst-case optimal. (In case an array contains no zeroes, there is no need to continue any dilation.)
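The dilation scheme above can be transcribed directly (this is my own Python sketch, not the answerer's code; the nested scans make the O(c²r²) cost explicit):

```python
def all_zero_subrectangles(mat):
    """List every all-zero subrectangle as [i, j, a, b] (top-left row/col,
    bottom-right row/col), via repeated horizontal and vertical dilations."""
    rows, cols = len(mat), len(mat[0])
    rects = []
    grid = [row[:] for row in mat]            # after w-1 horizontal dilations
    for w in range(1, cols + 1):              # current rectangle width
        g = [row[:] for row in grid]          # after h-1 vertical dilations
        for h in range(1, rows + 1):          # current rectangle height
            for i in range(rows - h + 1):
                for j in range(cols - w + 1):
                    if g[i][j] == 0:          # a zero here marks an h-by-w zero block
                        rects.append([i, j, i + h - 1, j + w - 1])
            if h < rows:                      # vertical dilation: max with the row below
                g = [[max(g[i][j], g[i + 1][j]) for j in range(cols - w + 1)]
                     for i in range(rows - h)]
        if w < cols:                          # horizontal dilation: max with the right neighbour
            grid = [[max(grid[i][j], grid[i][j + 1]) for j in range(cols - w)]
                    for i in range(rows)]
    return rects

all_zero_subrectangles([[1,0,0],[1,0,0],[0,1,1]])   # 10 rectangles, matching the question's list
```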

Distinct values of bitwise and of subarrays

How do you find the number of distinct values of the bitwise AND over all subarrays of an array? (Array size <= 1e5 and array elements <= 1e6.)
For example:
A[] = {1, 2, 3}
There are 4 distinct values: 1, 2, 3 and 0.
Let's fix the right boundary r of the subarray, and imagine the left boundary l moving to the left starting from r. How many times can the value of the AND change? At most O(log(MAX_VALUE)) times. Why? When we add one more element on the left, we have two options:
The and value of the subarray doesn't change.
It changes. In that case, the number of bits in it gets strictly less (as it's a submask of the previous and value).
Thus, we can consider only those values of l where something changes. Now we just need to find them quickly.
Let's iterate over the array from left to right and store the position of the last element that doesn't have the i-th bit for all valid i (we can update it by iterating over all bits of the current element). This way, we'll be able to find the next position where the value changes quickly (namely, it's the largest value in this array over all bits that are set). If we sort the positions, we can find the next largest one in O(1).
The total time complexity of this solution is O(N * log(MAX_VALUE) * log(log(MAX_VALUE))) (we iterate over all bits of each element of the array, and we sort the array of positions for each of them and iterate over it). The space complexity is O(N + MAX_VALUE). It should be good enough for the given constraints.
Imagine the numbers as columns representing their bits. We will have sequences of 1's extending horizontally. For example:
Array index:   0 1 2 3 4 5 6 7

Bit columns:   0 1 1 1 1 1 0 0
               0 0 0 1 1 1 1 1
               0 0 1 1 1 1 1 0
               1 0 0 0 1 1 0 1
               0 1 1 1 1 1 1 0
Looking to the left, the bit-row for any subarray anded after a zero will continue being zero, which means no change after that in that row.
Let's take index 5 for example. Now sorting the horizontal sequences of 1's from index 5 to the left will provide us a simple way to detect a change in the bit configuration (the sorting would have to be done on each iteration):
Index 5 ->

Sorted bit rows:   1 0 0 0 1 1
                   0 0 0 1 1 1
                   0 0 1 1 1 1
                   0 1 1 1 1 1
                   0 1 1 1 1 1
Index 5 to 4, no change
Index 4 to 3, change
Index 2 to 1, change
Index 1 to 0, change
To easily examine these changes, kraskevich proposes recording only the last unset bit for each row as we go along, which would indicate the length of the horizontal sequence of 1's, and a boolean array (of 1e6 numbers max) to store the unique bit configurations encountered.
Numbers:  1, 2, 3

Bits:     1 0 1
          0 1 1

As we move from left to right, keep a record of the index of the last unset bit in each row, and also keep a record of any new bit configuration (at most 1e6 of them):

Indexes of last unset bit for each row on each iteration:

Numbers:  1, 2, 3

A[0]:    -1           arrayHash = [false,true,false,false], count = 1
          0

A[1]:    -1  1        Now sort the column descending, representing (current - index),
          0  0        the lengths of sequences of 1's extending to the left.

As we move from top to bottom on this column, each value change represents a bit
configuration and a possibly distinct count:

    Record present bit configuration b10
    => arrayHash = [false,true,true,false]

    1 => 1 - 1 => sequence length 0, ignore sequence length 0
    0 => 1 - 0 => sequence length 1,
         unset second bit: b10 => b00
         => new bit configuration b00
         => arrayHash = [true,true,true,false]

Third iteration:

Numbers:  1, 2, 3

A[2]:    -1  1  1
          0  0  0

    Record present bit configuration b11
    => arrayHash = [true,true,true,true]

    (We continue since we don't necessarily know the arrayHash has filled.)

    1 => 2 - 1 => sequence length 1
         unset first bit: b11 => b10
         => seen bit configuration b10
    0 => 2 - 0 => sequence length 2,
         unset second bit: b10 => b00
         => seen bit configuration b00
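The O(log(MAX_VALUE)) observation above also yields a very short alternative implementation (my sketch, not from either answer): keep the set of AND values of all subarrays ending at the current index. That set can only hold about log(MAX_VALUE) distinct values, so the whole scan stays within the same asymptotic bound:

```python
def distinct_subarray_ands(a):
    """Set of all values A[l] & ... & A[r] over subarrays of a.
    `ending_here` holds the AND of every subarray ending at the current
    index; its size is bounded by the bit width of the values."""
    result = set()
    ending_here = set()
    for x in a:
        ending_here = {x & y for y in ending_here}  # extend each subarray by x
        ending_here.add(x)                          # the 1-element subarray [x]
        result |= ending_here
    return result

distinct_subarray_ands([1, 2, 3])   # {0, 1, 2, 3} -> 4 distinct values
```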

Gnuplot: how to draw polygon/contour from its vertices

My data.txt file contains the 2D coordinates of points forming a segment of a polygon. These coordinates are evolving over time. The file is structured like this:
itr nbr_pts p1.x p1.y ...... pk.x pk.y
(itr+1) ..........
.....
where pk is the k-th point/vertex of the polygon and nbr_pts is the number of vertices.
My question is how to draw the 2D polygon from its vertices (p1, p2, ...pk) at a certain iteration (row)?
In addition, note that there is not only one data file/polygon but N ones: data1.txt .... dataN.txt
I tried something like this, but it did not work (number of files = 6):
N = 6
set multiplot
plot for [i=0:N-1] polygon_i = sprintf("%s/data%d.dat",filename, i) polygon_i val=$2 for [j=1:$2] u (j+1):(j+1+1) w lines
I know how many polygons/files there are (6 in this case), but I have no prior knowledge of the number of columns in each file; the number of vertices can vary from one polygon to another.
Any idea please?
The idea I have would need a modification to the structure of your files. For each iteration, there is a block containing the x and y coordinates of the polygon's vertices:
# file: data1.txt
# itr 0
0 0
1 1
1 2
0 0


# itr 1
1 3
2 1
0 1
1 2
1 3


# itr 2
3 1
2 1
0 0
3 1
Notice that each block is separated by two empty lines. For iteration 0 (block 0, or itr 0) there is a polygon with three vertices, itr 1 has four vertices, and itr 2 has three vertices. To obtain a closed curve, you need to repeat the starting point at the end; for example, for itr 1 I put the point 1 3 twice.
For this file, we can plot the polygon at iteration iter as
iter = 1 # select block 1, or itr 1
plot "data1.txt" index iter w lp ps 2 pt 7
If you have several files, then try
# option 1
nbr = 6 # number of files
iter = 1 # select block 1, or itr 1
plot for [i=1:nbr] "data".i.".txt" index iter w lp ps 2 pt 7 title "".i
#option 2
files = system("ls data*.txt") # get all datafiles in folder
iter = 1 # select block 1, or itr 1
plot for [data in files] data index iter w lp ps 2 pt 7 title data

How to Shuffle an Array with Fixed Row/Column Sum?

I need to assign random papers to students of a class, but I have the constraints that:
Each student should have two papers assigned.
Each paper should be assigned to (approximately) the same number of students.
Is there an elegant way to generate a matrix that has this property? i.e. it is shuffled but the row and column sums are constant? As an illustration:
Student A   1 0 0 1 1 0  |  3
Student B   1 0 1 0 0 1  |  3
Student C   0 1 1 0 1 0  |  3
Student D   0 1 0 1 0 1  |  3
            -----------
            2 2 2 2 2 2
I thought of first building an "initial matrix" with the right row/column sums, then randomly permuting first the rows, then the columns, but how do I generate this initial matrix? The problem here is that I'd be choosing between (e.g.) the following alternatives, and the fact that there are two students with the same pair of papers assigned (in the left setup) won't change through row/column shuffling:
INITIAL (MA):         OR (MB):
A   1 1 1 0 0 0   ||   1 1 1 0 0 0
B   1 1 1 0 0 0   ||   0 1 1 1 0 0
C   0 0 0 1 1 1   ||   0 0 0 1 1 1
D   0 0 0 1 1 1   ||   1 0 0 0 1 1
I know I could come up with something quick/dirty and just tweak where necessary but it seemed like a fun exercise.
If you want to make permutations, what about:
Choose a student at random, say student 1.
For this student, choose one of their papers at random, say paper A.
Choose another student at random, say student 2.
For this student, choose one of their papers at random, say paper B (different from A).
Give paper B to student 1 and paper A to student 2.
That way, you preserve both the number of different papers and the number of papers per student. Indeed, both students give one paper away and receive one back; moreover, no paper is created or deleted.
In terms of the table, it means finding two pairs of indices (i1,i2) and (j1,j2) such that A(i1,j1) = 1, A(i2,j2) = 1, A(i1,j2) = 0 and A(i2,j1) = 0, and changing the 0s to 1s and the 1s to 0s => the sums of the rows and columns do not change.
Remark 1: If you do not want to proceed by permutations, you can simply put all the papers in a vector (paper A twice, paper B twice, ...). Then randomly shuffle the vector and give the first k to student 1, the next k to student 2, and so on. However, you can end up with a student holding the same paper several times. In that case, make some permutations starting with the duplicate papers.
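The 2x2 swap move can be sketched in a few lines of Python (my illustration; the function name, the `steps` count, and the rejection of non-matching index pairs are implementation choices, not part of the answer):

```python
import random

def shuffle_assignment(m, steps=2000):
    """Shuffle a 0/1 matrix in place while preserving every row and column
    sum, using the 2x2 swap move: find m[i1][j1]=1, m[i2][j2]=1,
    m[i1][j2]=0, m[i2][j1]=0 and flip all four entries."""
    rows, cols = len(m), len(m[0])
    for _ in range(steps):
        i1, i2 = random.sample(range(rows), 2)   # two distinct rows
        j1, j2 = random.sample(range(cols), 2)   # two distinct columns
        if m[i1][j1] and m[i2][j2] and not m[i1][j2] and not m[i2][j1]:
            m[i1][j1] = m[i2][j2] = 0
            m[i1][j2] = m[i2][j1] = 1
    return m

m = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
shuffle_assignment(m)   # row sums stay [3,3,3,3]; column sums stay [2]*6
```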
You can generate the initial matrix as follows (pseudo-Python syntax):
column_sum = [0] * n_students
for i in range(n_students):
if column_sum[i] < max_allowed:
for j in range(i + 1, n_students):
if column_sum[j] < max_allowed:
generate_row_with_ones_at(i, j)
column_sum[i] += 1
column_sum[j] += 1
if n_rows == n_wanted:
return
This is a straightforward iteration over all n choose 2 distinct rows, but with the constraint on column sums enforced as early as possible.

Print all ways to sum n integers so that they total a given sum.

I'm trying to come up with an algorithm that will print out all possible ways to sum N integers so that they total a given value.
Example. Print all ways to sum 4 integers so that they sum up to be 5.
Result should be something like:
5 0 0 0
4 1 0 0
3 2 0 0
3 1 1 0
2 3 0 0
2 2 1 0
2 1 2 0
2 1 1 1
1 4 0 0
1 3 1 0
1 2 2 0
1 2 1 1
1 1 3 0
1 1 2 1
1 1 1 2
This is based on Alinium's code.
I modified it so it prints out all the possible combinations, since his already does all the permutations.
Also, I don't think you need the for loop when n=1, because in that case, only one number should cause the sum to equal value.
Various other modifications to get boundary cases to work.
def sum(n, value):
    arr = [0]*n                          # create an array of size n, filled with zeroes
    sumRecursive(n, value, 0, n, arr)

def sumRecursive(n, value, sumSoFar, topLevel, arr):
    if n == 1:
        if sumSoFar <= value:
            # Make sure it's in ascending order (or only level)
            if topLevel == 1 or (value - sumSoFar >= arr[-2]):
                arr[-1] = value - sumSoFar   # put it in the last index of arr
                print(arr)
    elif n > 0:
        # Make sure it's in ascending order
        start = 0
        if n != topLevel:
            start = arr[(-1*n)-1]        # the value before this element
        for i in range(start, value+1):  # i = start...value
            arr[-1*n] = i                # put i in the n_th last index of arr
            sumRecursive(n-1, value, sumSoFar + i, topLevel, arr)
Running sum(4, 5) prints:
[0, 0, 0, 5]
[0, 0, 1, 4]
[0, 0, 2, 3]
[0, 1, 1, 3]
[1, 1, 1, 2]
In pure math, a way of summing integers to get a given total is called a partition. There is a lot of information around if you google for "integer partition". You are looking for integer partitions where there are a specific number of elements. I'm sure you could take one of the known generating mechanisms and adapt for this extra condition. Wikipedia has a good overview of the topic Partition_(number_theory). Mathematica even has a function to do what you want: IntegerPartitions[5, 4].
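For comparison, here is a small self-contained Python sketch (mine, not Mathematica's algorithm) that mimics IntegerPartitions[5, 4] while padding with zeros to match the question's fixed-width output:

```python
def partitions(n, k, max_part=None):
    """Yield the partitions of n into exactly k non-negative parts,
    each listed in non-increasing order (zeros allowed as padding)."""
    if max_part is None:
        max_part = n
    if k == 1:
        if n <= max_part:
            yield [n]
        return
    for first in range(min(n, max_part), -1, -1):
        # every later part is capped at `first`, keeping the order non-increasing
        for rest in partitions(n - first, k - 1, first):
            yield [first] + rest

list(partitions(5, 4))
# [[5,0,0,0], [4,1,0,0], [3,2,0,0], [3,1,1,0], [2,2,1,0], [2,1,1,1]]
```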
The key to solving the problem is recursion. Here's a working implementation in python. It prints out all possible permutations that sum up to the total. You'll probably want to get rid of the duplicate combinations, possibly by using some Set or hashing mechanism to filter them out.
def sum(n, value):
    arr = [0]*n                          # create an array of size n, filled with zeroes
    sumRecursive(n, value, 0, n, arr)

def sumRecursive(n, value, sumSoFar, topLevel, arr):
    if n == 1:
        if sumSoFar > value:
            return False
        else:
            for i in range(value+1):     # i = 0...value
                if (sumSoFar + i) == value:
                    arr[-1*n] = i        # put i in the n_th last index of arr
                    print(arr)
                    return True
    else:
        for i in range(value+1):         # i = 0...value
            arr[-1*n] = i                # put i in the n_th last index of arr
            if sumRecursive(n-1, value, sumSoFar + i, topLevel, arr):
                if n == topLevel:
                    print("\n")
With some extra effort, this can probably be simplified to get rid of some of the parameters I am passing to the recursive function. As suggested by redcayuga's pseudo code, using a stack, instead of manually managing an array, would be a better idea too.
I haven't tested this:

procedure allSum (int tot, int n, int desiredTotal)
    if n > 0
        for (int i = tot; i >= 0; i--) {
            push i onto stack
            allSum(tot-i, n-1, desiredTotal)
            pop top of stack
        }
    else if n == 0
        if stack sums to desiredTotal then print the stack end if
    end if
I'm sure there's a better way to do this.
I've found a Ruby way with domain specification, based on Alinium's code:
class Domain_partition
  attr_reader :results, :domain, :sum, :size

  def initialize(_dom, _size, _sum)
    @domain = _dom.is_a?(Array) ? _dom.sort : _dom.to_a
    @results, @sum, @size = [], _sum, _size
    arr = [0]*size                       # create an array of size n, filled with zeroes
    sumRecursive(size, 0, arr)
  end

  def sumRecursive(n, sumSoFar, arr)
    if n == 1
      # Make sure it's in ascending order (or only level)
      if sum - sumSoFar >= arr[-2] and @domain.include?(sum - sumSoFar)
        final_arr = Array.new(arr)
        final_arr[-1] = sum - sumSoFar   # put it in the last index of arr
        @results << final_arr
      end
    elsif n > 1
      # ********* dom_selector ********
      start = (n != size) ? arr[(-1*n)-1] : domain[0]
      dom_bounds = (start*(n-1)..domain.last*(n-1))
      restricted_dom = domain.select do |x|
        if x < start
          false; next
        end
        if size-n > 0
          if dom_bounds.cover? sum-(arr.first(size-n).inject(:+)+x) then true
          else false end
        else
          dom_bounds.cover?(sum+x) ? true : false
        end
      end
      # *******************************
      for i in restricted_dom
        _arr = Array.new(arr)
        _arr[(-1*n)] = i
        sumRecursive(n-1, sumSoFar + i, _arr)
      end
    end
  end
end

a = Domain_partition.new((-6..6), 10, 0)
p a

b = Domain_partition.new([-4, -2, -1, 1, 2, 3], 10, 0)
p b
If you're interested in generating (lexically) ordered integer partitions, i.e. unique unordered sets of S positive integers (no 0's) that sum to N, then try the following. (unordered simply means that [1,2,1] and [1,1,2] are the same partition)
The problem doesn't need recursion and is quickly handled because the concept of finding the next lexical restricted partition is actually very simple...
In concept: starting from the last addend (integer), find the first instance where the difference between two addends is greater than 1. Split the partition in two at that point. Remove 1 from the higher integer (which will be the last integer in one part) and add 1 to the lower integer (the first integer of the latter part). Then find the first lexically ordered partition for the latter part, having the new largest integer as the maximum addend value. I use Sage to find the first lexical partition because it's lightning fast, but it's easily done without it. Finally, join the two portions and voila! You have the next lexical partition of N having S parts.
e.g. [6,5,3,2,2] -> [6,5],[3,2,2] -> [6,4],[4,2,2] -> [6,4],[4,3,1] -> [6,4,4,3,1]
So, in Python and calling Sage for the minor task of finding the first lexical partition given n and s parts...
from sage.all import *

def most_even_partition(n, s):
    # The main function needs to recognize the most even partition possible
    # (i.e. the last lexical partition) so it can loop back to the first
    # lexical partition if need be
    most_even = [int(floor(float(n)/float(s)))]*s
    _remainder = int(n%s)
    j = 0
    while _remainder > 0:
        most_even[j] += 1
        _remainder -= 1
        j += 1
    return most_even

def portion(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

def next_restricted_part(p, n, s):
    if p == most_even_partition(n, s):
        return Partitions(n, length=s).first()
    for i in enumerate(reversed(p)):
        if i[1] - p[-1] > 1:
            if i[0] == (s-1):
                return Partitions(n, length=s, max_part=(i[1]-1)).first()
            else:
                parts = portion(p, [s-i[0]-1])   # split p (soup?)
                h1 = parts[0]
                h2 = parts[1]
                next = list(Partitions(sum(h2), length=len(h2), max_part=(h2[0]-1)).first())
                return h1+next
If you want zeros (not actual integer partitions), then the functions only need small modifications.
Try this code. I hope it is easier to understand. I tested it; it generates the correct sequence.
#include <stdio.h>

int list[1024];    // holds the current partition (declaration missing from the original post)

void partition(int n, int m = 0)    // note: the default argument makes this C++
{
    int i;
    // if the partition is done
    if (n == 0) {
        // output the result
        for (i = 0; i < m; ++i)
            printf("%d ", list[i]);
        printf("\n");
        return;
    }
    // do the split from large to small int
    for (i = n; i > 0; --i) {
        // if the number is not yet partitioned, or
        // will be partitioned no larger than the
        // previous partition number
        if (m == 0 || i <= list[m - 1]) {
            // store the partition int
            list[m] = i;
            // partition the rest
            partition(n - i, m + 1);
        }
    }
}
Ask for clarification, if required.
Here is the output for partition(6) and partition(10):
6
5 1
4 2
4 1 1
3 3
3 2 1
3 1 1 1
2 2 2
2 2 1 1
2 1 1 1 1
1 1 1 1 1 1
10
9 1
8 2
8 1 1
7 3
7 2 1
7 1 1 1
6 4
6 3 1
6 2 2
6 2 1 1
6 1 1 1 1
5 5
5 4 1
5 3 2
5 3 1 1
5 2 2 1
5 2 1 1 1
5 1 1 1 1 1
4 4 2
4 4 1 1
4 3 3
4 3 2 1
4 3 1 1 1
4 2 2 2
4 2 2 1 1
4 2 1 1 1 1
4 1 1 1 1 1 1
3 3 3 1
3 3 2 2
3 3 2 1 1
3 3 1 1 1 1
3 2 2 2 1
3 2 2 1 1 1
3 2 1 1 1 1 1
3 1 1 1 1 1 1 1
2 2 2 2 2
2 2 2 2 1 1
2 2 2 1 1 1 1
2 2 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
