Dynamic programming - Tree recursion with Memoization - go

For the problem:
Consider an insect in an M by N grid. The insect starts at the bottom left corner, (0, 0), and wants to end up at the top right corner, (M-1, N-1). The insect is only capable of moving right or up. Write a function paths that takes a grid length and width and returns the number of different paths the insect can take from the start to the goal.
For example, the 2 by 2 grid has a total of two ways for the insect to move from the start to the goal. For the 3 by 3 grid, the insect has 6 different paths (only 3 are shown above).
Below is the recursive solution:
package main

import "fmt"

func paths(m, n int) int {
	var traverse func(int, int) int
	traverse = func(x, y int) int {
		if x >= m || y >= n {
			return 0
		}
		if x == m-1 && y == n-1 {
			return 1
		}
		return traverse(x+1, y) + traverse(x, y+1)
	}
	return traverse(0, 0)
}

func main() {
	fmt.Println(paths(1, 157))
}
As N increases, below is the effect:
fmt.Println(paths(1, 1)) // 1
fmt.Println(paths(2, 2)) // 2
fmt.Println(paths(3, 3)) // 6
fmt.Println(paths(4, 4)) // 20
fmt.Println(paths(5, 5)) // 70
fmt.Println(paths(6, 6)) // 252
fmt.Println(paths(7, 7)) // 924
Memoization can be applied to the Fibonacci problem, which also uses tree recursion, to reuse previous computations.
Does it make sense to memoize this path problem as well, to reuse previous computations?
(Note: this problem is meant to apply the idea of tree recursion, as mentioned here.)

The problem is well known, and has a solution of (M+N-2 choose M-1) or equivalently (M+N-2 choose N-1). Does it make sense to use memoization, which would take O(NM) time? Not really, since binomial coefficients can be computed in O(min(M,N)) time (arithmetic operations) with relatively simple code.
For example (playground link):
package main

import (
	"fmt"
	"math/big"
)

func paths(n, m int) *big.Int {
	return big.NewInt(0).Binomial(int64(n+m-2), int64(m-1))
}

func main() {
	for i := 1; i < 10; i++ {
		fmt.Println(i, paths(i, i))
	}
}

Yes, it's possible to use memoization. The key observation comes from considering cell (1, 1) in a 3x3 grid:
(2, 0) (2, 1) (2, 2)
(1, 0) (1, 1) (1, 2)
(0, 0) (0, 1) (0, 2)
The number of paths to cell (1, 1) is equal to the number of paths to (1, 0) plus the number of paths to (0, 1), since there are only two possible ways to arrive at cell (1, 1).
Generalizing:
npath(x, y) = npath(x-1, y) + npath(x, y-1)
where npath(x, y) = the number of possible paths that reach cell (x, y)
Thus, if you build your recursion backwards you can apply memoization. By backwards I mean you have to start from the smaller cases, where npath(_, 0) = 1 and npath(0, _) = 1. The underscore means any value.
A simple run of this algorithm gives these path counts in a 3x3 grid:
1 3 6
1 2 3
1 1 1
In fact, you can just do a double nested loop instead of a recursion as an optimization.
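A minimal Python sketch of the recurrence above, shown only as an illustration (the answers in this thread use Go). The function names `npath_memo` and `npath_table` are made up for this example, and both assume m >= 1 and n >= 1. The first caches the recursion; the second is the double nested loop.

```python
from functools import lru_cache


def npath_memo(m, n):
    """Count right/up paths in an m-by-n grid with a memoized recursion."""

    @lru_cache(maxsize=None)
    def npath(x, y):
        # Base cases: npath(_, 0) = 1 and npath(0, _) = 1.
        if x == 0 or y == 0:
            return 1
        # Each (x, y) is computed once; later calls hit the cache.
        return npath(x - 1, y) + npath(x, y - 1)

    return npath(m - 1, n - 1)


def npath_table(m, n):
    """Same recurrence, filled bottom-up with a double nested loop."""
    table = [[1] * n for _ in range(m)]
    for x in range(1, m):
        for y in range(1, n):
            table[x][y] = table[x - 1][y] + table[x][y - 1]
    return table[m - 1][n - 1]


print(npath_memo(3, 3), npath_table(7, 7))
```

Both versions do O(MN) work, compared with the exponential number of calls made by the plain tree recursion.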

Related

Given an array of ints and a number n, calculate the number of ways to sum to n using the ints

I saw this problem in my interview preparation.
Given an array of ints and a number n, calculate the number of ways to
sum to n using the ints
Following code is my solution. I tried to solve this by recursion. Subproblem is for each int in the array, we can either pick it or not.
public static int count(List<Integer> list, int n) {
System.out.print(list.size() + ", " + n);
System.out.println();
if (n < 0 || list.size() == 0)
return 0;
if (list.get(0) == n)
return 1;
int e = list.remove(0);
return count(list, n) + count(list, n - e);
}
I tried to use [10, 1, 2, 7, 6, 1, 5] for ints, and set n to 8. The result should be 4. However, I got 0. I tried to print what I have on each layer of the stack to debug, as shown in the code. Following is what I have:
7, 8
6, 8
5, 8
4, 8
3, 8
2, 8
1, 8
0, 8
0, 3
0, 7
0, 2
0, 1
0, 6
0, 7
0, -2
This result confuses me. I think it looks right from beginning to (0, 3). Starting from (0, 7), it looks wrong to me. I expect (1, 7) there. Because if I understand correctly, this is for count(list, n - e) call on second to the bottom layer on the stack. The list operation on the lower layer shouldn't impact the list on the current layer.
So my questions are:
Why is it (0, 7) instead of (1, 7) based on my current code?
What adjustment should I make to my current code to get the correct result?
Thanks!
The reason your algorithm is not working is that you are using one list that is modified before the recursive calls.
Since every call receives a reference to the same list object, what ends up happening is that you recursively call remove until there is nothing left in the list, and then all of your recursive calls return 0.
What you could do is create two copies of the list on every recursive step. However, this would be way too inefficient.
A better way would be to use an index i that marks the element in the list that is being looked at during the call:
public static int count(List<Integer> list, int n, int i) {
    //System.out.print(list.size() + ", " + n);
    //System.out.println();
    // Stop once i drops below 0; index 0 is still a valid element to consider.
    if (n < 0 || i < 0)
        return 0;
    int e = list.get(i); // e is the i-th element in the list
    if (e == n)
        return 1 + count(list, n, i - 1); // Return 1 + check for more possibilities without picking e
    return count(list, n, i - 1) + count(list, n - e, i - 1); // Result if e is not picked + result if e is picked
}
You would then pass yourList.size() - 1 for i on the initial function call.
One more point is that when you return 1, you still have to add the number of possibilities for when your element e is not picked to be part of a sum. Otherwise, if - for example - your last element in the list was n, the recursion would end on the first step only returning 1 and not checking for more possible number combinations.
Finally, you might want to rewrite the algorithm using dynamic programming, since that would give you a much better running time.
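One way the dynamic programming idea could look, sketched in Python rather than Java. This is an illustration under stated assumptions, not the answerer's code: the name `count_subset_sums` is hypothetical, the values are assumed to be non-negative integers, and elements are distinguished by position. Under that last assumption the example input gives 6, because the two 1s are counted separately; the expected 4 in the question presumably treats equal values as the same.

```python
def count_subset_sums(values, target):
    """Count subsets of `values` (chosen by position) whose sum is `target`.

    Assumes non-negative integers and target >= 0.
    """
    # ways[s] = number of subsets of the elements seen so far that sum to s.
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty subset sums to 0
    for value in values:
        # Iterate downwards so each element is used at most once.
        for total in range(target, value - 1, -1):
            ways[total] += ways[total - value]
    return ways[target]


print(count_subset_sums([10, 1, 2, 7, 6, 1, 5], 8))
```

The running time is O(len(values) * target), instead of the exponential number of calls made by the pick-or-skip recursion.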

Order (a,b) pairs by result of a*b

I would like to find the highest value m = a*b that satisfies some condition C(m), where
1 <= a <= b <= 1,000,000.
In order to do that, I'd like to iterate all pairs of a,b in decreasing order of a*b.
For example, for values up to 5, the order would be:
5 x 5 = 25
4 x 5 = 20
4 x 4 = 16
3 x 5 = 15
3 x 4 = 12
2 x 5 = 10
3 x 3 = 9
2 x 4 = 8
2 x 3 = 6
1 x 5 = 5
1 x 4 = 4
2 x 2 = 4
1 x 3 = 3
1 x 2 = 2
1 x 1 = 1
So far I've come up with a BFS-like tree search, where I generate candidates from the current "visited" set and pick the highest value candidate, but it's a tangled mess, and I'm not sure about correctness. I wonder if there's some sort of trick I'm missing.
I'm also interested in the more general case of ordering by any monotonic function f(a,b), if such a thing exists.
For illustration, C(m) could be "return true if m^2 + m + 41 is prime, otherwise return false", but I'm really looking for a general approach.
Provided that C(m) is so magical that you cannot use any better technique to find your solution directly and thus you really need to traverse all a*b in decreasing order, this is what I would do:
Initialize a max-heap with all pairs (a, b) such that a = b. This means that the heap contains (0, 0), (1, 1), ... , (1.000.000, 1.000.000). The heap should be based on the a * b value.
Now continuously:
Get the max pair (a, b) from the heap.
Verify if (a, b) satisfies C(a * b). If so, you are done.
Otherwise, add (a, b-1) to the heap (provided b > 0, otherwise do nothing).
This is a very simple O(n log n) time and O(n) space algorithm, provided that you find the answer quickly (in a few iterations). This of course depends on C.
If you run into space problems you can of course easily decrease the space complexity by splitting up the problem in a number of subproblems, for instance 2:
Add only (500.000, 500.000), (500.001, 500.001), ... , (1.000.000, 1.000.000) to the heap and find your best pair (a, b).
Do the same for (0, 0), (1, 1), ... (499.999, 499.999).
Take the best of the two solutions.
Here's a not particularly efficient way to do this with a heap in Python. This is probably the same thing as the BFS you mentioned, but it's fairly clean. (If someone comes up with a direct algorithm, that would of course be better.)
import heapq  # <-this module's API is gross. why no PriorityQueue class?

def pairs_by_reverse_prod(n):
    # put n things in heap, since of course i*j > i*(j-1); only do i <= j
    # first entry is negative of product, since this is a min heap
    to_do = [(-i * n, i, n) for i in range(1, n + 1)]
    heapq.heapify(to_do)
    while to_do:
        # first elt of heap has the highest product
        _, i, j = to_do[0]
        yield i, j
        # remove it from the heap, replacing if we want to replace
        if j > i:
            heapq.heapreplace(to_do, (-i * (j - 1), i, j - 1))
        else:
            heapq.heappop(to_do)
The code below generates (and prints):
[(5, 5), (4, 5), (4, 4), (3, 5), (3, 4), (2, 5), (3, 3), (2, 4), (2, 3), (1, 5), (1, 4), (2, 2), (1, 3), (1, 2), (1, 1)]
which is basically what you want, since the code can break early if your condition is satisfied. I think the whole point of this question is NOT to generate all possible combinations of (a, b).
The key point of the algorithm is that in each iteration, we need to consider (a - 1, b) and (a, b - 1). If a == b, however, since a <= b, we only need to consider (a - 1, b). The rest is about maintaining order in the queue of tuples, Q, based on their product, m.
In terms of efficiency, when inserting into Q, the code performs linear search from index 0. Performing binary search instead of this linear search may or may not make things faster for larger values of a and b.
Also to further optimize the code, we can store m alongside (a, b) in Q so that we do not have to calculate a * b many times. Also using the 1D bucket structure with m as the key to implement Q would be interesting.
#!/usr/bin/python
def insert_into_Q(pair, Q):
    # Insert pair into Q, keeping Q ordered by decreasing product and free of duplicates.
    a, b = pair
    if (a == 0) or (b == 0):
        return
    pos = 0
    for (x, y) in Q:
        if (x == a) and (y == b):
            return
        if x * y < a * b:
            break
        pos = pos + 1
    Q.insert(pos, (a, b))

def main(a, b):
    Q = [(a, b)]
    L = []
    while True:
        if len(Q) == 0:
            break
        (a, b) = Q.pop(0)
        L.append((a, b))  # Replace this with C(a * b) and break if satisfied.
        a1 = a - 1
        b1 = b - 1
        if (a == b):
            insert_into_Q((a1, b), Q)
        else:
            insert_into_Q((a1, b), Q)
            insert_into_Q((a, b1), Q)
    print(L)

if __name__ == "__main__":
    main(5, 5)
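The optimisation mentioned above (storing m alongside (a, b) and avoiding the linear search on insert) could be sketched with `heapq` as follows. This is only an illustration: `pairs_by_product_desc` is a hypothetical name, a `seen` set replaces the duplicate scan, and the sketch assumes 1 <= a <= b.

```python
import heapq


def pairs_by_product_desc(a, b):
    """Yield (x, y) with 1 <= x <= y, x <= a, y <= b in non-increasing x*y order."""
    # Each entry stores the negated product first, because heapq is a min-heap.
    heap = [(-a * b, a, b)]
    seen = {(a, b)}
    while heap:
        _, x, y = heapq.heappop(heap)
        yield x, y
        # Same neighbours as the list-based version: (x-1, y), plus (x, y-1) when x < y.
        neighbours = [(x - 1, y)]
        if x < y:
            neighbours.append((x, y - 1))
        for nx, ny in neighbours:
            if nx >= 1 and (nx, ny) not in seen:
                seen.add((nx, ny))
                heapq.heappush(heap, (-nx * ny, nx, ny))


print(list(pairs_by_product_desc(5, 5)))
```

Because it is a generator, the caller can stop as soon as C(x * y) is satisfied; each push or pop costs O(log k) for a heap of size k instead of a linear scan.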
Note: this is a test of the function C(m) where m <= some target. It will not work for OP's general situation, but is a side case.
First find the highest number that satisfies C, and then find the pair that matches that high number. Finding the initial target number takes almost no time since it's a binary search from 1 to 1E12. Finding the pair that matches is a bit harder, but is still not as bad as factoring.
Code:
import java.util.Random;

public class TargetPractice {
private static final long MAX = 1000000L;
private long target;
public static void main(String[] args) {
Random r = new Random();
for (int i = 0; i < 5; i++) {
TargetPractice tp = new TargetPractice(r.nextInt((int) MAX), r.nextInt((int) MAX));
System.out.println("Trying to find " + tp.target);
System.gc();
long start = System.currentTimeMillis();
long foundTarget = tp.findTarget();
long end = System.currentTimeMillis();
System.out.println("Found " + foundTarget);
System.out.println("Elapsed time " + (end - start) + "\n");
}
}
public TargetPractice(long a, long b) {
target = a * b + 1;
}
private long binSearch() {
double delta = MAX * MAX / 2;
double target = delta;
while (delta != 0) {
if (hit((long) target)) {
target = target + delta / 2;
} else {
target = target - delta / 2;
}
delta = delta / 2;
}
long longTarget = (long) target;
for (int i = 10; i >= -10; i--) {
if (hit(longTarget + i)) {
return longTarget + i;
}
}
return -1;
}
private long findTarget() {
long target = binSearch();
long b = MAX;
while (target / b * b != target || target / b > MAX) {
b--;
if (b == 0 || target / b > MAX) {
b = MAX;
target--;
}
}
System.out.println("Found the pair " + (target/b) + ", " + b);
return target;
}
public boolean hit(long n) {
return n <= target;
}
}
It prints:
Trying to find 210990777760
Found the pair 255976, 824260
Found 210990777760
Elapsed time 5

Trying to find 414698196925
Found the pair 428076, 968749
Found 414698196924
Elapsed time 27

Trying to find 75280777586
Found the pair 78673, 956882
Found 75280777586
Elapsed time 1

Trying to find 75327435877
Found the pair 82236, 915991
Found 75327435876
Elapsed time 19

Trying to find 187413015763
Found the pair 243306, 770277
Found 187413015762
Elapsed time 23

Find all possible combinations from 4 input numbers which can add up to 24

Actually, this question can be generalized as below:
Find all possible combinations from a given set of elements, which meets
a certain criteria.
So, any good algorithms?
There are only 16 possibilities (and one of those is to add together "none of them", which ain't gonna give you 24), so the old-fashioned "brute force" algorithm looks pretty good to me:
for (unsigned int choice = 1; choice < 16; ++choice) {
int sum = 0;
if (choice & 1) sum += elements[0];
if (choice & 2) sum += elements[1];
if (choice & 4) sum += elements[2];
if (choice & 8) sum += elements[3];
if (sum == 24) {
// we have a winner
}
}
In the completely general form of your problem, the only way to tell whether a combination meets "certain criteria" is to evaluate those criteria for every single combination. Given more information about the criteria, maybe you could work out some ways to avoid testing every combination and build an algorithm accordingly, but not without those details. So again, brute force is king.
There are two interesting explanations about the sum problem, both in Wikipedia and MathWorld.
In the case of the first question you asked, the first answer is good for a limited number of elements. You should realize that the reason Mr. Jessop used 16 as the boundary for his loop is because this is 2^4, where 4 is the number of elements in your set. If you had 100 elements, the loop limit would become 2^100 and your algorithm would literally take forever to finish.
In the case of a bounded sum, you should consider a depth first search, because when the sum of elements exceeds the sum you are looking for, you can prune your branch and backtrack.
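A minimal Python sketch of such a depth-first search with pruning. It is an illustration only: `subsets_with_sum` is a hypothetical name, and the pruning step assumes all numbers are positive integers (with negative numbers, overshooting the target would not justify cutting the branch).

```python
def subsets_with_sum(numbers, target):
    """Return every subset (as a list) of positive integers that adds up to target."""
    ordered = sorted(numbers)  # sorting lets one comparison prune a whole tail
    found = []

    def dfs(start, remaining, chosen):
        if remaining == 0:
            found.append(list(chosen))
            return
        for i in range(start, len(ordered)):
            if ordered[i] > remaining:
                break  # prune: this and every later (larger) number overshoots
            chosen.append(ordered[i])
            dfs(i + 1, remaining - ordered[i], chosen)
            chosen.pop()  # backtrack before trying the next number

    dfs(0, target, [])
    return found


print(subsets_with_sum([4, 8, 12, 16], 24))
```

The worst case is still exponential, but branches whose partial sum already exceeds the target are never explored.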
In the case of the generic question, finding the subset of elements that satisfies certain criteria, this is closely related to the Knapsack problem (and its special case, the subset sum problem), which is known to be NP-Complete. Given that, no algorithm is known that solves it in polynomial time in the general case.
Nevertheless, there are several heuristics that bring good results to the table, including (but not limited to) genetic algorithms (one I personally like, for I wrote a book on them) and dynamic programming. A simple search in Google will show many scientific papers that describe different solutions for this problem.
Find all possible combinations from a given set of elements, which
meets a certain criteria
If I understood you right, this code will be helpful for you:
>>> from itertools import combinations as combi
>>> combi.__doc__
'combinations(iterable, r) --> combinations object\n\nReturn successive r-length combinations of elements in the iterable.\n\ncombinations(range(4), 3) --> (0,1,2), (0,1,3), (0,2,3), (1,2,3)'
>>> set = range(4)
>>> set
[0, 1, 2, 3]
>>> criteria = range(3)
>>> criteria
[0, 1, 2]
>>> for tuple in list(combi(set, len(criteria))):
...     if cmp(list(tuple), criteria) == 0:
...         print 'criteria exists in tuple: ', tuple
...
criteria exists in tuple: (0, 1, 2)
>>> list(combi(set, len(criteria)))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
Generally, for a problem like this you have to try all possibilities. What you should do is have the code abort the building of a combination as soon as you know it will not satisfy the criteria (if your criterion is that you do not have more than two blue balls, then you abort any calculation that has more than two). This is backtracking:
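def perm(elements, permutation):
    # `criteria` is assumed to be a function, defined elsewhere, that returns
    # True while the partial permutation can still satisfy the requirement.
    if len(permutation) == len(elements):
        print(permutation)
        return
    for element in elements:
        if element in permutation:
            continue  # each element is used at most once
        permutation.append(element)
        if criteria(permutation):
            perm(elements, permutation)
        permutation.pop()  # remove the element added above before trying the next one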
The set of input numbers matters, as you can tell as soon as you allow e.g. negative numbers, imaginary numbers, rational numbers etc in your start set. You could also restrict to e.g. all even numbers, all odd number inputs etc.
That means that it's hard to build something deductive. You need brute force, a.k.a. try every combination etc.
In this particular problem you could build an algorithm that recurses, e.g. find every combination of 3 ints (1..22) that add up to 23 and then add 1, every combination that adds up to 22 and add 2, etc. This can again be broken into every combination of 2 that add up to 21, etc. You need to decide if you can count the same number twice.
Once you have that you have a recursive function to call -
combinations( 24 , 4 ) = combinations( 23, 3 ) + combinations( 22, 3 ) + ... combinations( 4, 3 );
combinations( 23 , 3 ) = combinations( 22, 2 ) + ... combinations( 3, 2 );
etc
This works well except you have to be careful around repeating numbers in the recursion.
private int[][] work()
{
const int target = 24;
List<int[]> combos = new List<int[]>();
for(int i = 0; i < 9; i++)
for(int x = 0; x < 9; x++)
for(int y = 0; y < 9; y++)
for (int z = 0; z < 9; z++)
{
int res = x + y + z + i;
if (res == target)
{
combos.Add(new int[] { x, y, z, i });
}
}
return combos.ToArray();
}
It works instantly, but there probably are better methods rather than 'guess and check'. All I am doing is looping through every possibility, adding them all together, and seeing if it comes out to the target value.
If I understand your question correctly, what you are asking for is called "Permutations", or the number (N) of possible ways to arrange (X) numbers taken from a set of (Y) numbers.
N = Y! / (Y - X)!
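As a quick check of the formula, here is a small Python sketch; `arrangement_count` is a hypothetical helper name, and the enumeration via `itertools.permutations` is there only to confirm the count for a small case.

```python
from itertools import permutations
from math import factorial


def arrangement_count(y, x):
    """N = Y! / (Y - X)!: ordered selections of x items out of y distinct items."""
    return factorial(y) // factorial(y - x)


# Arranging 2 numbers taken from a set of 4: 4! / 2! = 12.
print(arrangement_count(4, 2), len(list(permutations(range(4), 2))))
```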
I don't know if this will help, but this is a solution I came up with for an assignment on permutations.
You have an input of : 123 (string) using the substr functions
1) put each number of the input into an array
array[N1,N2,N3,...]
2)Create a swap function
function swap(Number A, Number B)
{
temp = Number B
Number B = Number A
Number A = temp
}
3)This algorithm uses the swap function to move the numbers around until all permutations are done.
original_string = '123'
temp_string = ''
i = 0
While (temp_string != original_string)
{
    swap(array element[i], array element[i+1])
    temp_string = array.toString
    if (i == 1)
        i = 0
    else
        i = 1
}
Hopefully you can follow my pseudo code, but this works at least for 3 digit permutations
(n x n)
Build up a square n x n matrix
and print each of its corresponding crossed values together,
e.g.
    1  2  3  4
1  11 12 13 14
2  .. .. .. ..
3  ..
4  .. ..

Find the maximum possible area

Given n non-negative integers a1, a2, ..., an, where each represents a
point at coordinate (i, ai). n vertical lines are drawn such that the
two endpoints of line i is at (i, ai) and (i, 0). Find two lines,
which together with x-axis forms a container, such that the container
contains the most water.
Note: You may not slant the container.
One solution could be that we take each and every line and find area with every line. This takes O(n^2). Not time efficient.
Another solution could be using DP to find the maximum area for every index, and then at index n, we will get the maximum area.
I think it's O(n).
Could there be better solutions?
int maxArea(vector<int> &height) {
int ret = 0;
int left = 0, right = height.size() - 1;
while (left < right) {
ret = max(ret, (right - left) * min(height[left], height[right]));
if (height[left] <= height[right])
left++;
else
right--;
}
return ret;
}
Many people here are mistaking this problem for the maximal rectangle problem, which is not the case.
Solution
Delete all the elements aj such that ai >= aj <= ak for some i < j < k. This can be done in linear time.
Find the maximum value am
Let as = a1
For j = 2 through m-1, if as >= aj, delete aj, else as = aj
Let as = an
For j = n-1 through m+1, if as >= aj, delete aj, else as = aj
Notice that the resulting values look like a pyramid, that is, all the elements on the left of the maximum are strictly increasing and on the right are strictly decreasing.
i=1, j=n. m is location of max.
While i<=m and j>=m
Find area between ai and aj and keep track of the max
If ai < aj, i+=1, else j-=1
Complexity is linear (O(n))
Here is an implementation with Java:
Basic idea is to use two pointers from front and back, and calculate the area along the way.
public int maxArea(int[] height) {
int i = 0, j = height.length-1;
int max = Integer.MIN_VALUE;
while(i < j){
int area = (j-i) * Math.min(height[i], height[j]);
max = Math.max(max, area);
if(height[i] < height[j]){
i++;
}else{
j--;
}
}
return max;
}
Here is a clean Python3 solution. The runtime for this solution is O(n). It is important to remember that the area formed between two lines is determined by the height of the shorter line and the distance between the lines.
def maxArea(height):
    """
    :type height: List[int]
    :rtype: int
    """
    left = 0
    right = len(height) - 1
    max_area = 0
    while left < right:
        temp_area = (right - left) * min(height[left], height[right])
        if temp_area > max_area:
            max_area = temp_area
        # Always move the pointer at the shorter line inward.
        if height[right] > height[left]:
            left = left + 1
        else:
            right = right - 1
    return max_area
This problem can be solved in linear time.
Construct a list of possible left walls (position+height pairs), in order from highest to lowest. This is done by taking the leftmost possible wall and adding it to the list, then going through all possible walls, from left to right, and taking every wall that is larger than the last wall added to the list. For example, for the array
2 5 4 7 3 6 2 1 3
your possible left walls would be (pairs are (pos, val)):
(3, 7) (1, 5) (0, 2)
Construct a list of possible right walls in the same way, but going from right to left. For the above array the possible right walls would be:
(3, 7) (5, 6) (8, 3)
Start your water level as high as possible, that is the minimum of heights of the walls at the front of the two lists. Calculate the total volume of water using those walls (it might be negative or zero, but that is ok), then drop the water level by popping an element off of one of the lists such that the water level drops the least. Calculate the possible water volume at each of these heights and take the max.
Running this algorithm on these lists would look like this:
L: (3, 7) (1, 5) (0, 2) # if we pop this one then our water level drops to 5
R: (3, 7) (5, 6) (8, 3) # so we pop this one since it will only drop to 6
Height = 7
Volume = (3 - 3) * 7 = 0
Max = 0
L: (3, 7) (1, 5) (0, 2) # we pop this one now so our water level drops to 5
R: (5, 6) (8, 3) # instead of 3, like if we popped this one
Height = 6
Volume = (5 - 3) * 6 = 12
Max = 12
L: (1, 5) (0, 2)
R: (5, 6) (8, 3)
Height = 5
Volume = (5 - 1) * 5 = 20
Max = 20
L: (1, 5) (0, 2)
R: (8, 3)
Height = 3
Volume = (8 - 1) * 3 = 21
Max = 21
L: (0, 2)
R: (8, 3)
Height = 2
Volume = (8 - 0) * 2 = 16
Max = 21
Steps 1, 2, and 3 all run in linear time, so the complete solution also takes linear time.
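The three steps above could be sketched in Python as follows. This is one possible reading of the description, not the answerer's own code: `max_water` is a hypothetical name, the candidate lists are kept in ascending height order and walked from the tall end, and heights are assumed to be non-negative so that -1 can serve as a "cannot pop" marker.

```python
def max_water(heights):
    """Largest container area, using the candidate left/right wall lists."""
    if len(heights) < 2:
        return 0
    # Step 1: left candidates are walls strictly taller than everything to their left.
    left = []
    for pos, h in enumerate(heights):
        if not left or h > left[-1][1]:
            left.append((pos, h))
    # Step 2: right candidates, built the same way from the right.
    right = []
    for pos in range(len(heights) - 1, -1, -1):
        if not right or heights[pos] > right[-1][1]:
            right.append((pos, heights[pos]))
    # Step 3: start at the tallest candidates and lower the water level gradually.
    i, j = len(left) - 1, len(right) - 1
    best = 0
    while True:
        (lpos, lh), (rpos, rh) = left[i], right[j]
        best = max(best, (rpos - lpos) * min(lh, rh))
        if i == 0 and j == 0:
            break
        # Water level that would result from popping the left or the right candidate.
        level_if_left = min(left[i - 1][1], rh) if i > 0 else -1
        level_if_right = min(lh, right[j - 1][1]) if j > 0 else -1
        if level_if_left >= level_if_right:
            i -= 1
        else:
            j -= 1
    return best


print(max_water([2, 5, 4, 7, 3, 6, 2, 1, 3]))
```

On the example array this visits the same five (left, right) pairs as the trace above and ends with a maximum of 21.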
The best answer is by Black_Rider, however they did not provide an explanation.
I've found a very clear explanation on this blog. In short, it goes as follows:
Given array height of length n:
Start with the widest container you can, i.e. from left side at 0 to right side at n-1.
If a better container exists it will be narrower, so both of its sides must be higher than the lower of the currently chosen sides.
So, change left to (left+1) if height[left] < height[right], otherwise change right to (right-1).
Calculate new area, if it's better than what you have so far, replace.
If left < right, start over from 2.
My implementation in C++:
int maxArea(vector<int>& height) {
auto current = make_pair(0, height.size() - 1);
auto bestArea = area(height, current);
while (current.first < current.second) {
current = height[current.first] < height[current.second]
? make_pair(current.first + 1, current.second)
: make_pair(current.first, current.second - 1);
auto nextArea = area(height, current);
bestArea = max(bestArea, nextArea);
}
return bestArea;
}
inline int area(const vector<int>& height, const pair<int, int>& p) {
return (p.second - p.first) * min(height[p.first], height[p.second]);
}
This problem is a simpler version of the Maximal Rectangle Problem. The given situation can be viewed as a binary matrix. Consider the rows of the matrix as the X-axis and the columns as the Y-axis. For every element a[i] in the array, set
Matrix[i][0] = Matrix[i][1] = ..... = Matrix[i][a[i]-1] = 1
For e.g - For a[] = { 5, 3, 7, 1}, our binary matrix is given by:
1111100
1110000
1111111
1000000

Randomly Generate a set of numbers of n length totaling x

I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of Length n which add up to x
I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.
I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Before I've generated different combinations of a list of numbers that will add up to x. I'm sure that I could simply bruteforce this problem but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.
edit2: Sorry for my improper use of 'set', I was using it as a catch all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random

def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you get will tend to be more "egalitarian" than it would be if it were picked at random among all distributions with the given sum.
To see the reason you can just picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers, and it will be uniform inside the square [0,1]x[0,1]. The point Q is the point obtained by scaling P so that the sum is 1. As is clear from the picture, the points close to the center have a higher probability; for example, the exact center of the square will be found by projecting any point on the diagonal (0,0)-(1,1), while the point (0, 1) will be found by projecting only points from (0,0)-(0,1). The diagonal length is sqrt(2)=1.4142..., while the square side is only 1.0.
Actually, you need to generate a partition of x into n parts. This is usually done in the following way: a partition of x into n non-negative parts can be represented by reserving n + x - 1 free places, putting n - 1 borders in some arbitrary places, and stones in the rest. The stone groups add up to x, thus the number of possible partitions is the binomial coefficient (n + x - 1 choose n - 1).
So your algorithm could be as follows: choose an arbitrary (n - 1)-subset of an (n + x - 1)-set; it uniquely determines a partition of x into n parts.
In Knuth's TAOCP, section 3.4.2 discusses random sampling. See Algorithm S there.
Algorithm S: (choose n arbitrary records from a total of N)
1. t = 0, m = 0;
2. u = random, uniformly distributed on (0, 1)
3. if (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample and increase m and t by 1
4. if m < n, return to 2; otherwise, the algorithm is finished
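A Python sketch that combines the two ideas, under stated assumptions: it uses the stones-and-borders picture with x stones and n - 1 borders (n + x - 1 places in total), and a selection-sampling loop in the style of Algorithm S to choose the border positions. The names `selection_sample` and `random_composition` are hypothetical, and the optional `rng` parameter is only there to make runs reproducible.

```python
import random


def selection_sample(n, N, rng=random):
    """Pick n of the indices 0..N-1, in increasing order, by selection sampling."""
    chosen = []
    t = 0  # records examined so far
    m = 0  # records selected so far
    while m < n:
        u = rng.random()  # uniform on [0, 1)
        if (N - t) * u >= n - m:
            t += 1  # skip record t
        else:
            chosen.append(t)  # include record t in the sample
            m += 1
            t += 1
    return chosen


def random_composition(x, n, rng=random):
    """Split the integer x into n non-negative integer parts."""
    places = n + x - 1
    borders = selection_sample(n - 1, places, rng)
    parts = []
    previous = -1
    for border in borders:
        parts.append(border - previous - 1)  # stones between consecutive borders
        previous = border
    parts.append(places - previous - 1)  # stones after the last border
    return parts


print(random_composition(10, 4))
```

Each choice of border positions corresponds to exactly one ordered list of parts, so choosing the positions uniformly gives every such list the same probability.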
The solution for non-integers is algorithmically trivial: you just select arbitrary n numbers that don't sum up to 0, and norm them by their sum.
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
xs = [random.gammavariate(1,1) for a in range(N)]
xs = [x*v/sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in Javascript
function getRandomArbitrary(min, max) {
return Math.random() * (max - min) + min;
};
function getRandomArray(min, max, n) {
var arr = [];
for (var i = 0, l = n; i < l; i++) {
arr.push(getRandomArbitrary(min, max))
};
return arr;
};
function randomValuesPrescribedSum(min, max, n, total) {
var arr = getRandomArray(min, max, n);
var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
var k = total/sum;
var delays = arr.map(function(x) { return k*x; })
return delays;
};
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random

def parts(total_sum, num_parts):
    points = [random.random() for i in range(num_parts - 1)]
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i - 1]) * total_sum)
    return ret

def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)

test(5.5, 3)
test(10, 1)
test(10, 5)
In python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time

TOTAL = 15
PARTS = 4
PLACES = 3

def random_sum_split(parts, total, places):
    a = [0, total] + [random.random()*total for i in range(parts-1)]
    a.sort()
    b = [(a[i] - a[i-1]) for i in range(1, (parts+1))]
    if places is None:
        return b
    else:
        b.pop()
        c = [round(x, places) for x in b]
        c.append(round(total-sum(c), places))
        return c

# `info` and `log` are not defined in this snippet; they come from the environment it was run in.
def tick():
    if info.tick == 1:
        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end-start))
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
To the best of my knowledge, this distribution is uniform.

Resources