Randomly sampling integer partitions (without restriction on number of parts)

I have an integer N and I wish to generate one of its possible partitions uniformly at random. For example, N=5 has 7 partitions:
(5) - K=1 part
(4, 1) - K=2 parts
(3, 2) - K=2 parts
(3, 1, 1) - K=3 parts
(2, 2, 1) - K=3 parts
(2, 1, 1, 1) - K=4 parts
(1, 1, 1, 1, 1) - K=5 parts
I want an algorithm that can output each one of these with probability 1/7.
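For what it's worth, here is a minimal sketch of the only approach I have so far (the function name partitions and the recursive enumeration are just my own illustration, not an existing library routine): enumerate everything and pick uniformly. It reproduces the 7 partitions of 5 above, but it obviously does not scale to the N I need.

    import random

    def partitions(n, max_part=None):
        """Yield every partition of n as a non-increasing tuple."""
        if max_part is None:
            max_part = n
        if n == 0:
            yield ()
            return
        for first in range(min(n, max_part), 0, -1):
            for rest in partitions(n - first, first):
                yield (first,) + rest

    all_parts = list(partitions(5))
    print(len(all_parts))            # -> 7
    print(random.choice(all_parts))  # each of the 7 partitions with probability 1/7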
Algorithms for generating all such partitions, or all partitions restricted to K parts, are easy to find.
However, what I'm looking for does not restrict K a priori. I cannot pick K uniformly at random, as the values of K are not uniformly distributed across partitions and the distribution is non-trivial. If I knew the precise distribution of the number of parts K beforehand, I could sample K from it and then use one of the existing algorithms, but I could not find a way to do this. A numerical survey shows that the vast majority of partitions have small K.
I cannot generate the list of partitions beforehand, as for N=100 there are already hundreds of millions of partitions. But even for N=1000, which is in the range I need, each individual partition is still mostly a short list of small numbers.
Does such an algorithm exist? I could not find one, and I've been searching for days.

Related

Effective clustering algorithm

I need help (preferably a full algorithm, but any hint or reference will be appreciated) with the following algorithmic problem:
We have a set of N elements. I can define a distance between any two elements, which satisfies the metric conditions. I need to group these elements into disjoint subsets (each element belonging to exactly one subset) according to the following rules:
1. The maximum distance between any two elements in each subset does not exceed a specified threshold.
2. The number of subsets is as small as possible.
3. If there is more than one possible grouping satisfying conditions (1) and (2), the maximum distance between any two elements in each subset should be as small as possible.
Example:
Assume we have the following points on a number axis: 1, 11, 12, 13, 23. The distance is simply the difference between the points. Our distance threshold is 10. The two possible groupings satisfying conditions (1) and (2) are: (1, 11), (12), (13, 23) or (1), (11, 12, 13), (23). However, condition (3) says that the latter grouping is the correct one.
For 1-dimensional data, sort it, divide it into the desired number of bins, then move the bin boundaries to optimize.
It gets more interesting in higher dimensionality. There, the problem will be NP-hard, so finding the optimum will be expensive. You can indeed use clustering here: use complete-linkage clustering. For an O(n²) time and O(n) memory approach, try CLINK. But in my experience, you will need to run this algorithm several times, on shuffled data, to get a good solution.
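As a concrete illustration (a sketch using SciPy's stock hierarchical-clustering routines, not the CLINK algorithm itself, so it is O(n²) memory rather than O(n)): complete linkage cut at the distance threshold reproduces the grouping from the example above.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    # 1-D example points from the question; threshold 10.
    points = np.array([[1.0], [11.0], [12.0], [13.0], [23.0]])
    # With complete linkage, a cluster's merge height is its maximum pairwise
    # distance, so cutting the dendrogram at t=10 enforces the threshold condition.
    labels = fcluster(linkage(pdist(points), method='complete'),
                      t=10, criterion='distance')
    print(labels)  # three clusters: (1), (11, 12, 13), (23)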

Maximum number divisible by another one created from sum of previous numbers

We are given the numbers 1, 2, ..., b-1. Each of these numbers can be used a[1], a[2], ..., a[b-1] times, respectively.
From them, the biggest possible number has to be formed by concatenation, while the sum of its digits (the partial numbers used) has to be divisible by b. The "digits" of this number can be in any base bigger than 2.
So basically, the biggest base-b number has to be created by concatenating the numbers 1...b-1, each used up to a[1]...a[b-1] times, such that the sum of all used partial numbers/digits is divisible by b.
For example:
There are 5 ones, 10 twos, 4 threes and 2 fours. As stated above, they have to be concatenated into the biggest number whose digit sum is divisible by b (here 5).
They would give:
443333222222222211111.
Concatenating from the biggest to the lowest gives the needed number, since the sum of all available digits (45) is already divisible by 5.
For a single 1 (with b = 2) it is:
0
because 1 is not divisible by 2, so no digits should be used in that case.
What algorithms or similar problems relate to this? How can it be approached?
At first, we can simply arrange the numbers from the biggest to the lowest, so the concatenated number will naturally be the biggest. Then we have to remove the smallest possible amount of these numbers, so that the sum of the remaining ones is divisible by b. When different combinations of the same size could be removed, the one whose biggest number is smallest should be chosen (or, if those tie, the second biggest, and so on).
For example:
If either the combination (3, 3, 2) or (4, 2, 2) could be removed, then the first one should be cut out of the number.
This really looks like the change-making problem, but with a finite amount of coins of each denomination, and at the end we need the actual combination, not only the minimal amount of coins. In addition, with a dynamic-programming approach, two different combinations of the same length (like 332 and 422 above) cannot easily be chosen in the middle of the DP array, as in later steps they can lead to quite different values.
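To make the intended goal concrete, here is an exponential brute-force sketch (the function name and structure are mine, meant only as a reference for small inputs, not the dynamic approach discussed above): it keeps as many digits as possible and, among arrangements of that size, picks the numerically largest one.

    from itertools import combinations

    def largest_divisible_number(counts, b):
        """counts[d] = how many copies of digit d (1 <= d < b) are available.
        Return the digits, largest first, of the biggest base-b number whose
        digit sum is divisible by b. Exponential; for illustration only."""
        digits = [d for d in range(1, b) for _ in range(counts.get(d, 0))]
        for size in range(len(digits), 0, -1):
            candidates = [tuple(sorted(c, reverse=True))
                          for c in combinations(digits, size) if sum(c) % b == 0]
            if candidates:
                # Same length, so lexicographic order of the digit tuples
                # equals numeric order of the concatenated base-b numbers.
                return max(candidates)
        return ()

    best = largest_divisible_number({1: 5, 2: 10, 3: 4, 4: 2}, 5)
    print("".join(map(str, best)))  # -> 443333222222222211111 (digit sum 45)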

Speeding up LCS algorithm for graph construction

Referencing the 2nd question from INOI 2011:
N people live in Sequence Land. Instead of a name, each person is identified by a sequence of integers, called his or her id. Each id is a sequence with no duplicate elements. Two people are said to be each other’s relatives if their ids have at least K elements in common. The extended family of a resident of Sequence Land includes herself or himself, all relatives, relatives of relatives, relatives of relatives of relatives, and so on without any limit.
Given the ids of all residents of Sequence Land, including its President, and the number K, find the number of people in the extended family of the President of Sequence Land.
For example, suppose N = 4 and K = 2. Suppose the President has id (4, 6, 7, 8) and the other three residents have ids (8, 3, 0, 4), (0, 10), and (1, 2, 3, 0, 5, 8). Here, the President is directly related to (8, 3, 0, 4), who in turn is directly related to (1, 2, 3, 0, 5, 8). Thus, the President’s extended family consists of everyone other than (0, 10) and so has size 3.
Limits: 1 <= n <= 300 & 1 <= K <= 300. Number of elements per id: 1-300
Currently, my solution is as follows:
For every person, compare their id to all other ids using an LCS-like algorithm; it can be modified to stop searching early if K common elements can no longer be reached, etc., to improve its average-case performance. Time complexity: O(n^2 * k^2).
Construct an adjacency list from the results of the previous step.
Run BFS from the President and output the result.
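A baseline sketch of the pipeline above (my own illustration; note it treats "elements in common" as set-intersection size, which matches the worked example, whereas the original approach compares ids with an LCS-style routine):

    from collections import deque

    def extended_family_size(ids, k):
        """Pairwise comparison -> adjacency list -> BFS from the President (ids[0])."""
        n = len(ids)
        sets = [set(seq) for seq in ids]          # ids have no duplicate elements
        adj = [[] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if len(sets[i] & sets[j]) >= k:   # relatives: at least k common elements
                    adj[i].append(j)
                    adj[j].append(i)
        seen, queue = {0}, deque([0])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return len(seen)

    ids = [(4, 6, 7, 8), (8, 3, 0, 4), (0, 10), (1, 2, 3, 0, 5, 8)]
    print(extended_family_size(ids, 2))  # -> 3, as in the example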
But the overall time complexity of this algorithm is not good enough for the second subtask. I googled around a little bit and found most solutions to be similar to mine, and not working for the larger subtask. The only thing close to a good solution was this one -> Yes, this question has been asked previously. The reason I'm asking essentially the same question again is that that solution is really tough to work with and implement. Recently, a friend of mine told me about a much better solution he read somewhere.
Can someone help me create a better solution?
Even pointers to a better solution would be great.

Put k stones on an n * m grid to maximize the number of rectangles for which there is one stone in each corner

Put k stones on an n * m grid; each stone should be on an intersection point of the grid lines. Find a way to place the stones that maximizes the number of rectangles for which there is one stone in each corner. Output that number.
For example, if k <= 3 the answer is 0, meaning no such rectangle exists; if k is 4 and n, m >= 2, the answer is 1. More examples:
(n, m, k, answer): (3, 3, 8, 5), (4, 5, 13, 18), (7, 14, 86, 1398)
k is between 0 and n * m.
n, m are positive integers less than 30000
PS: This is actually a problem from the Microsoft Beauty of Programming qualification round (but you may not be able to find it, since the contest is held in China and I translated it into English myself).
PPS: I have made some progress. It can be proved that searching through all possible Ferrers diagrams is enough to get the answer, but the complexity is exponential in k.
EDIT: (by Dukeling)
A visualization of (3, 3, 8, 5), with the rectangles indicated in different colours.
As you noticed, it's actually an (n-1) * (m-1) grid; there's another interpretation using an n * m grid where the stones are placed inside the cells, but then you'll need to add an additional constraint that rectangles can't have width / height 1.
From a programming perspective this suggests a search that takes advantage of the symmetries of the rectangle to reduce the search space. What follows is something of an extended hint.
As the OP points out, a naive implementation would check all possible k-subsets of nodes, a search space of size C(nm, k).
The amount of search space reduction depends on how the symmetries are exploited. The square has symmetries of reflection and rotation, so for n=m there's an 8-fold symmetry group. If, say, n < m, then the reduced symmetry gives a 4-fold group.
A typical approach is to organize the possible k-subsets by a lexicographic ordering, so that a potential configuration is skipped when it's equivalent to one that appears earlier in that ordering.
But there are additional "wrap-around" symmetries to be exploited. Suppose the last row of the grid is moved to the top (along with any assignment of stones to its nodes). This transformation preserves the count of the 4-stone rectangles (though the exact sizes of those rectangles will differ).
Indeed transposing two rows or two columns preserves the counts of 4-stone rectangles. Once you have that insight, can you see how to parameterize the search space more efficiently?
Added: Even though it's more of a math insight than a programming one, consider the number of 4-stone rectangles provided by a "full subrectangle", say r x c with rc < k. Then consider the incremental number of extra rectangles provided by one more stone, and by two more stones.
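A small helper for experimenting with this hint (my own sketch, not part of the original problem): it counts the 4-stone rectangles of a given placement by looking at pairs of occupied columns per row, and checks the "full subrectangle" observation that an r x c block of stones contributes C(r,2) * C(c,2) rectangles.

    from itertools import combinations
    from math import comb

    def count_rectangles(stones):
        """Count axis-aligned rectangles whose four corners all carry stones."""
        cols_by_row = {}
        for r, c in stones:
            cols_by_row.setdefault(r, set()).add(c)
        pair_count = {}
        for cols in cols_by_row.values():
            for pair in combinations(sorted(cols), 2):
                pair_count[pair] = pair_count.get(pair, 0) + 1
        # A column pair occurring in m rows yields C(m, 2) rectangles.
        return sum(comb(m, 2) for m in pair_count.values())

    full = [(r, c) for r in range(3) for c in range(3)]
    print(count_rectangles(full))                              # -> 9 = C(3,2) * C(3,2)
    # Dropping one corner stone removes the 4 rectangles that used it: 9 - 4 = 5,
    # matching the (3, 3, 8, 5) example above.
    print(count_rectangles([p for p in full if p != (0, 0)]))  # -> 5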

Divide list into two equal parts algorithm

Related questions:
Algorithm to Divide a list of numbers into 2 equal sum lists
divide list in two parts that their sum closest to each other
Let's assume I have a list which contains exactly 2k elements. Now, I want to split it into two parts, where each part has a length of k, while trying to make the sums of the parts as equal as possible.
Quick example:
[3, 4, 4, 1, 2, 1] might be split into [1, 4, 3] and [1, 2, 4], and the sum difference will be 1
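Just to make the problem concrete, here is a brute-force sketch (exponential, for illustration only; the function name is mine): try every size-k index subset and keep the split with the smallest sum difference.

    from itertools import combinations

    def balanced_halves(nums):
        """Best equal-size split of a list of 2k numbers, by exhaustive search."""
        k = len(nums) // 2
        total = sum(nums)
        best = None
        for chosen in combinations(range(len(nums)), k):
            left = [nums[i] for i in chosen]
            diff = abs(total - 2 * sum(left))
            if best is None or diff < best[0]:
                right = [nums[i] for i in range(len(nums)) if i not in chosen]
                best = (diff, left, right)
        return best

    print(balanced_halves([3, 4, 4, 1, 2, 1]))
    # -> (1, [3, 4, 1], [4, 2, 1]): sums 8 and 7, difference 1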
Now, if the parts can have arbitrary lengths, this is a variation of the Partition problem, and we know it's weakly NP-complete.
But does the restriction of splitting the list into equal-length parts (let's say it's always k out of 2k) make this problem solvable in polynomial time? Any proof of that (or a proof sketch of the fact that it's still NP-hard)?
It is still NP-complete. Proof by reduction of PP (your unrestricted variation of the Partition problem) to QPP (the equal-parts partition problem):
Take an arbitrary PP instance of length k and append k additional elements, all valued zero.
We need to find the best-performing partition in terms of PP. Let us find one using an algorithm for QPP on the padded list and then discard the additional k zero elements. Shifting zeroes between the two parts changes neither sum, so it cannot affect this or any competing partition; hence the result is still one of the best-performing unrestricted partitions of the arbitrary list of length k.
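A toy demonstration of the reduction (my own sketch; best_equal_split_diff is just a brute-force stand-in for a QPP solver): padding the unrestricted instance with as many zeros as it has elements lets every unrestricted split be written as an equal-size split without changing either side's sum.

    from itertools import combinations

    def best_equal_split_diff(nums):
        """Brute-force QPP: minimal |sum(A) - sum(B)| over equal-size splits."""
        k = len(nums) // 2
        total = sum(nums)
        return min(abs(total - 2 * sum(c)) for c in combinations(nums, k))

    instance = [8, 7, 6, 5, 4]               # unrestricted Partition instance
    padded = instance + [0] * len(instance)  # pad with k = 5 zeros
    print(best_equal_split_diff(padded))     # -> 0, e.g. {8, 7} vs {6, 5, 4}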
