I am trying to rewrite a fairness ranking algorithm (source: https://arxiv.org/abs/1802.07281) from Python to Rust. The objective is to find a document-ranking probability matrix that is doubly stochastic and that, using a utility vector (the document relevances, in this case), gives fair exposure to all document types.
The objective is thus to maximise the expected utility under the following constraints:
sum of probabilities for each position equals 1;
sum of probabilities for each document equals 1;
every probability is valid (i.e. 0 <= P[i,j] <= 1);
P is fair (disparate treatment constraints).
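Written out, the problem being solved is the linear program

$$
\max_{P}\ u^\top P v
\quad\text{s.t.}\quad
\mathbf{1}^\top P = \mathbf{1}^\top,\quad
P\,\mathbf{1} = \mathbf{1},\quad
0 \le P_{ij} \le 1,\quad
f^\top P v = 0 \ \text{for every pair of groups},
$$

where each f is a fairness (DTC) coefficient vector as constructed in the code below.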
In Python we have done this using CVXPY:
import numpy as np
import cvxpy as cp
from itertools import combinations

u = documents[['rel']].iloc[:n].values.ravel()               # utility vector
v = np.array([1.0 / (np.log(2 + i)) for i in range(n)])      # position discount vector
P = cp.Variable((n, n))  # linear maximisation of uᵀPv s.t. P is doubly stochastic and fair

# Construct f in fᵀPv such that for P every group's exposure divided by mean utility should be
# equal (i.e. enforcing DTC). Do this for every pair of groups:
# example: calculated f for three groups {a, b, c}
# resulting constraints: [a - b == 0, a - c == 0, b - c == 0]
groups = {k: group.index.values for k, group in documents.iloc[:n].groupby('document_type')}
fairness_constraints = []
for k0, k1 in combinations(groups, 2):
    g0, g1 = groups[k0], groups[k1]
    f_i = np.zeros(n)
    f_i[g0] = 1 / u[g0].sum()
    f_i[g1] = -1 / u[g1].sum()
    fairness_constraints.append(f_i)

# Create convex problem to solve for finding the probabilities that
# a document is at a certain position/rank, matching the fairness criteria
objective = cp.Maximize(cp.matmul(cp.matmul(u, P), v))
constraints = ([cp.matmul(np.ones((1, n)), P) == np.ones((1, n)),   # ┐
                cp.matmul(P, np.ones((n,))) == np.ones((n,)),       # ├ doubly stochastic matrix constraints
                0.0 <= P, P <= 1] +                                  # ┘
               [cp.matmul(cp.matmul(c, P), v) == 0 for c in fairness_constraints])  # DTC
prob = cp.Problem(objective, constraints)
prob.solve(solver=cp.CBC)
This works great for multiple solvers, including SCS, ECOS and CBC.
Now, trying to implement the algorithm above in Rust, I have turned to crates like good_lp and lp_modeler. Both should be able to solve linear problems using CBC, as demonstrated in the Python example above. However, I am struggling to find examples of how to define the needed constraints on my matrix variable P.
The code below is my work in progress for rewriting the Python code in Rust, using the lp_modeler crate as an example. It compiles but panics when run. Furthermore, I don't know how to add the disparate treatment constraints in a way Rust accepts, as neither crate seems to take equality constraints between two vectors.
// utility vector filled with dummy data
let u: Array<f32, Ix1> = array![...];
let n = cmp::min(u.len(), 25);
// position discount vector
let v: Array<f32, Ix1> = (0..n)
    .map(|i| 1.0 / ((2 + i) as f32).ln())
    .collect();
let P: Array<f32, Ix2> = Array::ones((n, n));

// dummy data for document indices and their types
let groups = vec![
    vec![23],                                                // type A
    vec![8, 10, 16, 19],                                     // type B
    vec![0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13, 15, 21, 24], // type C
    vec![14, 17, 18, 20, 22],                                // type D
];

let mut fairness_constraints: Vec<Vec<f32>> = Vec::new();
for combo in groups.iter().combinations(2).unique() {
    let mut f_i: Vec<f32> = vec![0f32; n];
    { // f_i[g0] = 1 / u[g0].sum()
        let usum_g0: f32 = combo[0].iter()
            .map(|&i| u[i])
            .sum();
        for &i in combo[0].iter() {
            f_i[i] = 1f32 / usum_g0;
        }
    }
    { // f_i[g1] = -1 / u[g1].sum()
        let usum_g1: f32 = combo[1].iter()
            .map(|&i| u[i])
            .sum();
        for &i in combo[1].iter() {
            f_i[i] = -1.0 / usum_g1;
        }
    }
    fairness_constraints.push(f_i);
}

let mut problem = LpProblem::new("Fairness", LpObjective::Maximize);
problem += u.dot(&P).dot(&v); // Expected utility objective

// Doubly stochastic constraints
for col in P.columns() { // Sum of probabilities for each position
    problem += sum(&col.to_vec(), |&el| el).equal(1);
}
for row in P.rows() { // Sum of probabilities for each document
    problem += sum(&row.to_vec(), |&el| el).equal(1);
}
// Valid probability constraints
for el in P.iter() {
    problem += lp_sum(&vec![el]).ge(0);
    problem += lp_sum(&vec![el]).le(1);
}
// TODO: implement DTC fairness constraints

let solver = CbcSolver::new();
let result = solver.run(&problem);
Can anybody give me a nudge in the right direction on this specific problem? Thanks in advance!
I have a one-dimensional array of non-negative integers. When plotted as a histogram, the range is dominated by smaller numbers.
Is there an algorithm to redistribute the values more equally while maintaining the original order? By order I mean the minimum and maximum values would retain their original places in the array and everything else in between would scale up or down. When plotted afterwards the histogram would be more or less flat.
Searching the web I came across a "probability integral transform" in statistics, but that required sorting the data.
EDIT - Apologies for omitting why I don't want to sort it. The array is a plot and each integer represents a pixel. If I sort it that would destroy the plot. I'm dividing each integer by the maximum value and using that as an index into a palette. Because there's so much bias towards smaller values, only a small amount of the palette is visible in the final image. I thought if I was able to redistribute the values somehow, it'd use the full range of the palette.
You could apply this algorithm:
Let c be a freely chosen coefficient between 0 and 1: the closer it is to 1, the closer the resulting values will be to each other. If it is exactly 1, then all values will be equal; if 0, the result will be the original data set. A candidate value for c could for instance be 0.9.
Let avg be the average of the input values
Apply the following transformation to each value in the input set:
new value := value * (1 − c) + avg * c
Here is an interactive implementation in a JavaScript snippet:
let a = [150, 100, 40, 33, 9, 3, 5, 13, 8, 1, 3, 2, 1, 1, 0, 0];
let avg = a.reduce((acc, val) => acc + val) / a.length;
function refresh(c) {
    // Apply transformation from a to b using c:
    display(a.map(val => val * (1 - c) + avg * c));
}

// I/O management
var display = (function (b) {
    this.clearRect(0, 0, this.canvas.width, this.canvas.height); // Clear display
    for (let i = 0; i < b.length; i++) {
        this.beginPath();
        this.rect(20 + i * 20, 150 - b[i], 19, b[i]);
        this.fill();
    }
}).bind(document.querySelector("canvas").getContext("2d"));

document.querySelector("input").oninput = e => refresh(+e.target.value);
refresh(0);
Coefficient: <input type="number" min="0" max="1" step="0.01" value="0"><br>
<canvas height="150" width="400"></canvas>
Use the Coefficient input box to experiment with different values for it on a sample data set.
In Python the transformation, for a given list a and coefficient c, could look like this:
avg = sum(a) / len(a)
b = [value * (1 - c) + avg * c for value in a]
There are K points on a circle that represent the locations of the treasures. N people want to share the treasures. You want to divide the treasure fairly among all of them such that the difference between the person having the maximum value and the person having the minimum value is as small as possible.
Each person takes a contiguous set of points on the circle; that is, they cannot own segmented (non-contiguous) treasures.
All the treasures must be allocated.
Each treasure must belong to exactly one person.
For example, if there are 4 treasures with values 6, 10, 11 and 3 around the circle, and 2 people, then the optimal way of dividing would be
(6, 10) and (11, 3) => with a difference of 2.
1 <= n <= 25
1 <= k <= 50
How do I approach solving this problem? I planned to calculate the mean of all the points and, for each person, keep adding treasures while their total is less than the mean. But, obviously, that will not work in all cases.
I'd be glad if someone throws some light.
So say we fix x and y as the minimum and maximum totals allowed per person.
I need to figure out if we can get a solution in these constraints.
For that I need to traverse the circle and create exactly N segments with sums between x and y.
This I can solve via dynamic programming: a[i][j][l] = 1 if I can split the elements between i and j into l segments whose sums are all between x and y (see above). To compute it we can evaluate a[i][j][l] = is_there_a_q_such_that(a[i][q - 1][l - 1] == 1 and sum(q -> j) is between x and y).
To handle the circle, look for N - 1 segments that cover enough elements so that the sum of the remaining elements is also between x and y.
So the naive solution is O(total_sum^2) to select x and y, plus O(K^3) to iterate over i, j, l, another O(K) to find a q and another O(K) to get the sum. That's a total of O(total_sum^2 * K^5), which is likely too slow.
So we need to compute sums a lot. So let's precompute a partial sums array sums[w] = sum(elements between pos 0 and pos w). So to get the sum between q and j you only need to compute sums[j] - sums[q-1]. This takes care of O(K).
To compute a[i][j][l]: since the treasures are always positive, if a partial sum is too small we need to grow the interval, and if the sum is too high we need to shrink it. Since we fixed one side of the interval (at j) we can only move q. We can use binary search to find the closest and the furthest q that allow us to be between x and y. Let's call them low_q (the closest to j, lowest sum) and high_q (far from j, largest sum). If low_q < i then the interval is too small, so the value is 0. So now we need to check if there's a 1 between max(high_q, i) and low_q; the max is to make sure we don't look outside of the interval. To do the check we can again precompute partial sums, this time counting how many 1s are in our interval. We only need to do this once per level so it will be amortized O(1). So, if we did everything right, this will be O(K^3 log K).
We still have the total_sum^2 factor in front. Let's say we fix x. If for a given y we have a solution, you might also be able to find a smaller y that still has a solution; if you can't find a solution for a given y, then you won't be able to find one for any smaller value. So we can now do a binary search on y.
So this is now O(total_sum * log(total_sum) * K^3 * log K).
Another optimization would be to not raise i if sum(0 -> i-1) > x.
You might not want to check values of x > total_sum/K since that's the ideal minimum value. This should cancel out one of the K in the complexity.
There might be other things that you can do, but I think this will be fast enough for your constraints.
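To make the feasibility check concrete, here is a minimal Python sketch of it (the function name and layout are mine). It ignores the circular wrap-around and the prefix-sum/binary-search speed-ups, i.e. it is the plain O(N·K²) check you would then wrap in the search over x and y:

import random  # not needed here, just keeping the snippet self-contained

def can_split(a, n_people, x, y):
    """True if the linear array `a` can be cut into exactly `n_people`
    contiguous segments whose sums all lie in [x, y]."""
    k = len(a)
    prefix = [0] * (k + 1)
    for i, val in enumerate(a):
        prefix[i + 1] = prefix[i] + val            # prefix[j] = sum(a[:j])

    # ok[l][j] = can a[:j] be split into l segments with sums in [x, y]?
    ok = [[False] * (k + 1) for _ in range(n_people + 1)]
    ok[0][0] = True
    for l in range(1, n_people + 1):
        for j in range(1, k + 1):
            # the last segment is a[q:j]
            ok[l][j] = any(ok[l - 1][q] and x <= prefix[j] - prefix[q] <= y
                           for q in range(j))
    return ok[n_people][k]

# Example: treasures [6, 10, 11, 3] and 2 people; sums 16 and 14 are feasible.
print(can_split([6, 10, 11, 3], 2, 14, 16))        # True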
You can do brute-force for O(k^n), or DP for O(k^2 * MAXSUM^(k-1)).
dp[i][val1][val2]...[val(k-1)] — is it possible to distribute the first i items so that the first person has val1, the second has val2, and so on. There are k * MAXSUM^(k-1) states and you need O(k) per step: you simply choose who takes the i-th item.
I don't think it's possible to solve it faster.
No standard type of algorithm (greedy, divide and conquer, etc.) exists for this problem.
You would have to check each and every combination of (resource, person) and pick the best answer. Once you have solved the problem using recursion, you can apply DP to optimize the solution.
The crux of the solution is:

Recurse through all the treasures
    if the current treasure is not the last:
        set minimum difference to Infinity
        for each user:
            assign the current treasure to the current user
            ans = recurse further by going to the next treasure
            update minimumDifference if necessary
    else:
        find the maximum and minimum amounts of treasure assigned
        and return the difference
Here is the JavaScript version of the answer.
I have commented it to try to explain the logic as well:
// value of the treasure
const K = [6, 3, 11, 10];
// number of users
const N = 2;
// Array which tracks the amount of treasure with each user
const U = new Array(N).fill(0);
// 2D array to save the whole solution
const bitset = [...new Array(N)].map(() => [...new Array(K.length)]);

const solve = index => {
  /**
   * The base case:
   * We are out of treasure.
   * So far, the assigned treasures will be in the U array
   */
  if (index >= K.length) {
    /**
     * Take the maximum and minimum and return the difference along with the bitset
     */
    const max = Math.max(...U);
    const min = Math.min(...U);
    const answer = { min: max - min, set: bitset };
    return answer;
  }
  /**
   * We have treasures to check
   */
  let answer = { min: Infinity, set: undefined };
  for (let i = 0; i < N; i++) {
    // Let the ith user take the treasure
    U[i] += K[index];
    bitset[i][index] = 1;
    /**
     * Let us recurse and see what the answer will be if the ith user has the treasure at `index`
     * Note that the ith user might also have other treasures for indices > `index`
     */
    const nextAnswer = solve(index + 1);
    /**
     * Did we do better?
     * Was the difference between the distributions of treasure reduced?
     * If so, let us update the current answer
     * If not, we can assign the treasure at `index` to the next user (i + 1) and see if we do any better
     */
    if (nextAnswer.min <= answer.min) {
      answer = JSON.parse(JSON.stringify(nextAnswer));
    }
    /**
     * Had we done any better, the changes would already be recorded in the answer.
     * Because we are going to try and assign this treasure to the next user,
     * let us remove it from the current user before iterating further
     */
    U[i] -= K[index];
    bitset[i][index] = 0;
  }
  return answer;
};
const ans = solve(0);
console.log("Difference: ", ans.min);
console.log("Treasure: [", K.join(", "), "]");
console.log();
ans.set.forEach((x, i) => console.log("User: ", i + 1, " [", x.join(", "), "]"));
Each problem at index i creates exactly N copies of itself, and we have K indices in total, so the time complexity of this solution is O(N^K).
We can definitely do better by throwing memoization.
Here comes the tricky part:
If we have a distribution of treasure for one user, the minimum
difference of the distributions of treasure among users will be
the same.
In our case, bitset[i] represents the distribution for the ith user.
Thus, we can memoize the results for the bitset of the user.
Once you realize that, coding it is easy:
// value of the treasure
const K = [6, 3, 11, 10, 1];
// number of users
const N = 2;
// Array which tracks the amount of treasure with each user
const U = new Array(N).fill(0);
// 2D array to save the whole solution
const bitset = [...new Array(N)].map(() => [...new Array(K.length).fill(0)]);
const cache = {};

const solve = (index, userIndex) => {
  /**
   * Do we have a cached answer?
   */
  if (cache[bitset[userIndex]]) {
    return cache[bitset[userIndex]];
  }
  /**
   * The base case:
   * We are out of treasure.
   * So far, the assigned treasures will be in the U array
   */
  if (index >= K.length) {
    /**
     * Take the maximum and minimum and return the difference along with the bitset
     */
    const max = Math.max(...U);
    const min = Math.min(...U);
    const answer = { min: max - min, set: bitset };
    // cache the answer
    cache[bitset[userIndex]] = answer;
    return answer;
  }
  /**
   * We have treasures to check
   */
  let answer = { min: Infinity, set: undefined };
  // Helps us track the index of the user with the optimal answer
  let minIndex = undefined;
  for (let i = 0; i < N; i++) {
    // Let the ith user take the treasure
    U[i] += K[index];
    bitset[i][index] = 1;
    /**
     * Let us recurse and see what the answer will be if the ith user has the treasure at `index`
     * Note that the ith user might also have other treasures for indices > `index`
     */
    const nextAnswer = solve(index + 1, i);
    /**
     * Did we do better?
     * Was the difference between the distributions of treasure reduced?
     * If so, let us update the current answer
     * If not, we can assign the treasure at `index` to the next user (i + 1) and see if we do any better
     */
    if (nextAnswer.min <= answer.min) {
      answer = JSON.parse(JSON.stringify(nextAnswer));
      minIndex = i;
    }
    /**
     * Had we done any better, the changes would already be recorded in the answer.
     * Because we are going to try and assign this treasure to the next user,
     * let us remove it from the current user before iterating further
     */
    U[i] -= K[index];
    bitset[i][index] = 0;
  }
  cache[answer.set[minIndex]] = answer;
  return answer;
};
const ans = solve(0);
console.log("Difference: ", ans.min);
console.log("Treasure: [", K.join(", "), "]");
console.log();
ans.set.forEach((x, i) => console.log("User: ", i + 1, " [", x.join(", "), "]"));
// console.log("Cache:\n", cache);
We can definitely improve the space used by not caching the whole bitset. Removing the bitset from the cache is trivial.
Consider that for each k, we can pair a sum growing from A[i] to the left (sum A[i-j..i]) with all available intervals recorded for f(k-1, i-j-1), and update them. (Here f(k', n'), stored as record[k'][n'] below, is the set of candidate (min, max) segment-sum intervals for splitting A[0..n'] into k' contiguous parts.) For each interval (low, high): if the sum is greater than high, then new_interval = (low, sum); if the sum is lower than low, then new_interval = (sum, high); otherwise the interval stays the same. For example,
i: 0 1 2 3 4 5
A: [5 1 1 1 3 2]
k = 3
i = 3, j = 0
The ordered intervals available for f(3-1, 3-0-1) = f(2,2) are:
(2,5), (1,6) // These were the sums, (A[1..2], A[0]) and (A[2], A[0..1])
Sum = A[3..3-0] = 1
Update intervals: (2,5) -> (1,5)
(1,6) -> (1,6) no change
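That update rule is small enough to write down directly; here is a Python sketch (the helper name is mine):

# Pair a recorded (low, high) interval with a new segment sum s: the interval
# only widens when s falls outside of it.
def update(interval, s):
    low, high = interval
    if s > high:
        return (low, s)
    if s < low:
        return (s, high)
    return interval

print(update((2, 5), 1))   # (1, 5)
print(update((1, 6), 1))   # (1, 6), no change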
Now, we can make this iteration much more efficient by recognizing and pruning intervals during the previous k round.
Watch:
A: [5 1 1 1 3 2]
K = 1:
N = 0..5; Intervals: (5,5), (6,6), (7,7), (8,8), (11,11), (13,13)
K = 2:
N = 0: Intervals: N/A
N = 1: Intervals: (1,5)
N = 2: (1,6), (2,5)
Prune: remove (1,6) since any sum <= 1 would be better paired with (2,5)
and any sum >= 6 would be better paired with (2,5)
N = 3: (1,7), (2,6), (3,5)
Prune: remove (2,6) and (1,7)
N = 4: (3,8), (4,7), (5,6), (5,6)
Prune: remove (3,8) and (4,7)
N = 5: (2,11), (5,8), (6,7)
Prune: remove (2,11) and (5,8)
For k = 2, we are now left with the following pruned record:
{
k: 2,
n: {
1: (1,5),
2: (2,5),
3: (3,5),
4: (5,6),
5: (6,7)
}
}
We've cut down the iteration of k = 3 from a list of n choose 2 possible splits to n relevant splits!
The general algorithm applied to k = 3:
for k' = 1 to k
for sum A[i-j..i], for i <- [k'-1..n], j <- [0..i-k'+1]:
for interval in record[k'-1][i-j-1]: // records are for [k'][n']
update interval
prune intervals in k'
k' = 3
i = 2
sum = 1, record[2][1] = (1,5) -> no change
i = 3
// sums are accumulating right to left starting from A[i]
sum = 1, record[2][2] = (2,5) -> (1,5)
sum = 2, record[2][1] = (1,5) -> no change
i = 4
sum = 3, record[2][3] = (3,5) -> no change
sum = 4, record[2][2] = (2,5) -> no change
sum = 5, record[2][1] = (1,5) -> no change
i = 5
sum = 2, record[2][4] = (5,6) -> (2,6)
sum = 5, record[2][3] = (3,5) -> no change
sum = 6, record[2][2] = (2,5) -> (2,6)
sum = 7, record[2][1] = (1,5) -> (1,7)
The answer is 5 paired with record[2][3] = (3,5), yielding the updated interval, (3,5). I'll leave the pruning logic for the reader to work out. If we wanted to continue, here's the pruned list for k = 3
{
k: 3
n: {
2: (1,5),
3: (1,5),
4: (3,5),
5: (3,5)
}
}
Consider this Cartesian grid where each cell holds a weight.
[3, 2, 1, 4, 2
1, 3, 3, 2, 2
S, 3, 4, 1, D
3, 1, 2, 4, 3
4, 2, 3, 1, 4]
A man is standing at source 'S' and he has to reach destination 'D' at minimum cost. The constraints are:
If the man moves between two cells that share the same weight, the cost of the move is 1.
If the man moves between two cells with different weights n and m, the cost of the move is abs(n - m) * 10 + 1.
Last but not least, the man can only move up, down, left and right. No diagonal moves.
Which data structure and algorithm are best suited for this problem? I have thought of representing it as a graph and using one of the greedy approaches, but I could not arrive at a clean solution.
I would use A* to solve the problem. The distance can be estimated by dx + dy + 10 * dValue + distance travelled (it is impossible that the way is shorter than that, see example at the bottom). The idea of A* is to expand always the node with the lowest estimated distance, as soon as you find the destination node you are finished. This works if the estimation never over-estimates the distance. Here is an implementation in JS (fiddle):
function solve(matrix, sRow, sCol, eRow, eCol) {
    if (sRow == eRow && sCol == eCol)
        return 0;
    let n = matrix.length, m = matrix[0].length;
    let d = [], dirs = [[-1, 0], [0, 1], [1, 0], [0, -1]];
    for (let i = 0; i < n; i++) {
        d.push([]);
        for (let j = 0; j < m; j++)
            d[i].push(1000000000);
    }
    let list = [[sRow, sCol, 0]];
    d[sRow][sCol] = 0;
    for (;;) {
        let pos = list.pop();
        for (let i = 0; i < dirs.length; i++) {
            let r = pos[0] + dirs[i][0], c = pos[1] + dirs[i][1];
            if (r >= 0 && r < n && c >= 0 && c < m) {
                let v = d[pos[0]][pos[1]] + 1 + 10 * Math.abs(matrix[pos[0]][pos[1]] - matrix[r][c]);
                if (r == eRow && c == eCol)
                    return v;
                if (v < d[r][c]) {
                    d[r][c] = v;
                    list.push([r, c, v + Math.abs(r - eRow) + Math.abs(c - eCol) + 10 * Math.abs(matrix[r][c] - matrix[eRow][eCol])]);
                }
            }
        }
        list.sort(function(a, b) {
            if (a[2] > b[2])
                return -1;
            if (a[2] < b[2])
                return 1;
            return 0;
        });
    }
}
The answer for the example is 46 and only 8 nodes are getting expanded!
Estimation example, from (0,0) to D:
distance from S to (0,0) is 22
dx = abs(0 - 4) = 4
dy = abs(0 - 2) = 2
dValue = abs(3 - 1) = 2
estimation = distance + dx + dy + 10 * dValue = 22 + 4 + 2 + 10 * 2 = 48
Note: the implementation uses rows and columns instead of x and y, so they are swapped; it doesn't really matter, it just has to be consistent.
Although not explicitly stated, in the problem formulation there seem to be only positive node weights, which means that a shortest path will have no repetition of nodes. As the cost does not depend on the nodes alone, approaches like the Bellman-Ford algorithm or Dijkstra's algorithm are not suitable.
That being said, the path can apparently be found recursively using depth-first search, where nodes which are currently on the stack may not be visited. Every time the destination is reached, the current path (which is contained in the stack at that moment), along with its associated cost maintained in an auxiliary variable, can be evaluated against the best previously found path. On termination, a path with minimum cost will have been stored.
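A rough Python sketch of that search (my own illustration; it assumes the S and D cells carry weight 1, which is consistent with the worked estimation example in the answer above):

def min_cost_dfs(matrix, start, dest):
    """Exhaustive DFS: cells currently on the recursion stack may not be
    revisited; whenever the destination is reached, the accumulated cost is
    compared with the best path found so far."""
    n, m = len(matrix), len(matrix[0])
    on_stack = [[False] * m for _ in range(n)]
    best = float("inf")

    def step(r, c, cost):
        nonlocal best
        if cost >= best:                 # already worse than the best known path
            return
        if (r, c) == dest:
            best = cost
            return
        on_stack[r][c] = True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < m and not on_stack[nr][nc]:
                move = 1 + 10 * abs(matrix[r][c] - matrix[nr][nc])
                step(nr, nc, cost + move)
        on_stack[r][c] = False

    step(start[0], start[1], 0)
    return best

grid = [[3, 2, 1, 4, 2],
        [1, 3, 3, 2, 2],
        [1, 3, 4, 1, 1],   # S at (2, 0) and D at (2, 4), both assumed to have weight 1
        [3, 1, 2, 4, 3],
        [4, 2, 3, 1, 4]]
print(min_cost_dfs(grid, (2, 0), (2, 4)))   # 46, matching the A* answer above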
I'm working on a script in Mathematica that will simulate a string held at either end and plucked, by solving the wave equation via numerical methods. (http://en.wikipedia.org/wiki/Wave_equation#Investigation_by_numerical_methods)
n = 5; (*The number of discrete elements to be used*)
L = 1.0; (*The length of the string that is vibrating*)
a = 1.0/3.0; (*The distance from the left side at which the string is plucked*)
T = 1; (*The tension in the string*)
\[Rho] = 1; (*The linear density of the string*)
y0 = 0.1; (*The vertical distance of the string pluck*)
\[CapitalDelta]x = L/n; (*The length of each discrete element*)
m = (\[Rho]*L)/n; (*The mass of each individual node*)
c = Sqrt[T/\[Rho]]; (*The speed at which waves in the string propagate*)
I set all my variables
Y[t] = Array[f[t], {n - 1, 1}];
MatrixForm (*Creates a vector of size n-1 by 1 of functions representing each node*)
I define my Vector of nodal position functions
K = MatrixForm[
   SparseArray[{Band[{1, 1}] -> -2, Band[{2, 1}] -> 1, Band[{1, 2}] -> 1},
    {n - 1, n - 1}]] (*Creates a matrix of size n-1 by n-1 governing the coupling between each node*)
I create the stiffness matrix relating all the nodal functions to one another
Y0 = MatrixForm[
   Table[Piecewise[{{(((i*L)/n)*y0)/a, 0 < ((i*L)/n) < a},
      {(-((i*L)/n)*y0)/(L - a) + (y0*L)/(L - a), a < ((i*L)/n) < L}}],
    {i, 1, n - 1}]]
I define the initial positions of each node using a piecewise function
NDSolve[{Y''[t] == (c/\[CapitalDelta]x)^2 Y[t].K, Y[0] == Y0, Y'[0] == 0},
  Y, {t, 0, 10}]; (*Numerically solves the system of second-order DEs*)
Finally, this should solve for the values of the individual nodes, but it returns an error:
"NDSolve::ndinnt : Initial condition [Y0 table] is not a number or a rectangular array"
So, it would seem that I don't have a firm grasp of how matrices work in Mathematica. I would greatly appreciate it if anyone could help me get this last line of code to run properly.
Thank you,
Brad
I don't think you should use MatrixForm when defining the matrices. MatrixForm is used to format a list of lists as a matrix, usually when you display it. Try removing it and see if it works.
I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of Length n which add up to x
I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.
I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Before I've generated different combinations of a list of numbers that will add up to x. I'm sure that I could simply bruteforce this problem but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.
edit2: Sorry for my improper use of 'set', I was using it as a catch all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random
def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you will get will tend to be more "egalitarian" than it should be if it was picked at random among all distribution with the given sum.
To see the reason you can just picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers, and it is uniform inside the square [0,1]x[0,1]. The point Q is obtained by scaling P so that the sum is 1. As you can picture, the points close to the center of the target segment have a higher probability: for example, the exact center of the segment is obtained by projecting any point on the diagonal (0,0)-(1,1), while the point (0,1) is obtained by projecting only points from the edge (0,0)-(0,1)... and the diagonal's length is sqrt(2) = 1.4142... while the square's side is only 1.0.
Actually, you need to generate a partition of x into n parts. This is usually done in the following way: a partition of x into n non-negative parts can be represented by reserving n + x - 1 free places, putting n - 1 borders in some arbitrary places, and stones in the rest. The stone groups between the borders add up to x, thus the number of possible partitions is the binomial coefficient C(n + x - 1, n - 1).
So your algorithm could be as follows: choose an arbitrary (n - 1)-subset of an (n + x - 1)-set; it uniquely determines a partition of x into n parts.
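A small Python sketch of that construction (the helper name is mine, and random.sample stands in for the subset choice):

import random

def random_partition(x, n):
    """Partition the integer x into n ordered non-negative parts, uniformly
    over all such partitions: x stars and n - 1 bars occupy x + n - 1 slots,
    and choosing which slots hold the bars fixes the partition."""
    bars = sorted(random.sample(range(x + n - 1), n - 1))
    parts, prev = [], -1
    for b in bars:
        parts.append(b - prev - 1)      # stars between consecutive bars
        prev = b
    parts.append(x + n - 2 - prev)      # stars after the last bar
    return parts

p = random_partition(10, 4)
print(p, sum(p))                        # four non-negative integers summing to 10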
In Knuth's TAOCP, Section 3.4.2 discusses random sampling; see Algorithm S there.
Algorithm S (choose n arbitrary records from a total of N):
1. Set t = 0, m = 0.
2. Generate u, uniformly distributed on (0, 1).
3. If (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample and increase both m and t by 1.
4. If m < n, return to step 2; otherwise the algorithm is finished.
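A direct Python rendering of those steps (a sketch, not Knuth's exact formulation):

import random

def selection_sample(records, n):
    """Algorithm S: scan the N records once, keeping each one with exactly the
    probability needed to end up with a uniform random n-subset, in order."""
    N = len(records)
    t, m, sample = 0, 0, []
    while m < n:
        u = random.random()             # uniform on (0, 1)
        if (N - t) * u >= n - m:
            t += 1                      # skip record t
        else:
            sample.append(records[t])   # include record t
            m += 1
            t += 1
    return sample

print(selection_sample(list(range(20)), 5))   # e.g. [2, 5, 9, 13, 17]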
The solution for non-integers is algorithmically trivial: just pick n arbitrary numbers that don't sum to 0, normalize them by their sum and scale by x.
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
import random

xs = [random.gammavariate(1, 1) for a in range(N)]
xs = [x * v / sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in JavaScript
function getRandomArbitrary(min, max) {
    return Math.random() * (max - min) + min;
}

function getRandomArray(min, max, n) {
    var arr = [];
    for (var i = 0, l = n; i < l; i++) {
        arr.push(getRandomArbitrary(min, max));
    }
    return arr;
}

function randomValuesPrescribedSum(min, max, n, total) {
    var arr = getRandomArray(min, max, n);
    var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
    var k = total / sum;
    var delays = arr.map(function(x) { return k * x; });
    return delays;
}
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random
def parts(total_sum, num_parts):
    points = [random.random() for i in range(num_parts - 1)]
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i - 1]) * total_sum)
    return ret

def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)
test(5.5, 3)
test(10, 1)
test(10, 5)
In python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time
TOTAL = 15
PARTS = 4
PLACES = 3
def random_sum_split(parts, total, places):
    a = [0, total] + [random.random() * total for i in range(parts - 1)]
    a.sort()
    b = [(a[i] - a[i - 1]) for i in range(1, (parts + 1))]
    if places == None:
        return b
    else:
        b.pop()
        c = [round(x, places) for x in b]
        c.append(round(total - sum(c), places))
        return c

def tick():
    if info.tick == 1:
        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end - start))
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
To the best of my knowledge, this distribution is uniform.