Getting N random numbers whose sum is M - random

I want to get N random numbers whose sum is a given value.
For example, let's suppose I want 5 random numbers that sum to 1.
Then, a valid possibility is:
0.2 0.2 0.2 0.2 0.2
Another possibility is:
0.8 0.1 0.03 0.03 0.04
And so on. I need this to create the membership matrix for Fuzzy C-means.

Short Answer:
Just generate N random numbers, compute their sum, divide each one by
the sum and multiply by M.
Longer Answer:
The above solution does not yield a uniform distribution, which might be an issue depending on what these random numbers are used for.
Another method proposed by Matti Virkkunen:
Generate N-1 random numbers between 0 and 1, add the numbers 0 and 1
themselves to the list, sort them, and take the differences of
adjacent numbers.
This yields a uniform distribution as is explained here

Generate N-1 random numbers between 0 and 1, add the numbers 0 and 1 themselves to the list, sort them, and take the differences of adjacent numbers.
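For reference, here is a minimal Python sketch of this method (the function name random_partition and the scaling parameter m are my own additions, not part of the quoted answer):
import random

def random_partition(n, m=1.0):
    # n-1 uniform cut points in (0, 1), plus the endpoints 0 and 1
    cuts = sorted([0.0, 1.0] + [random.random() for _ in range(n - 1)])
    # the n gaps between adjacent cut points sum to 1; scale them to sum to m
    return [(b - a) * m for a, b in zip(cuts, cuts[1:])]
For example, random_partition(5) returns 5 non-negative values summing to 1 (up to floating-point rounding).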

I think it is worth noting that the currently accepted answer does not give a uniform distribution:
"Just generate N random numbers,
compute their sum, divide each one by
the sum"
To see this let's look at the case N=2 and M=1. This is a trivial case, since we can generate a list [x,1-x], by choosing x uniformly in the range (0,1).
The proposed solution generates a pair [x/(x+y), y/(x+y)] where x and y are uniform in (0,1). To analyze this we choose some z such that 0 < z < 0.5 and compute the probability that
the first element is smaller than z. This probability should be z if the distribution were uniform. However, we get
Prob(x/(x+y) < z) = Prob(x < z(x+y)) = Prob(x(1-z) < zy) = Prob(x < y(z/(1-z))) = z/(2-2z).
I did some quick calculations, and it appears that the only solution so far that results in a uniform distribution was proposed by Matti Virkkunen:
"Generate N-1 random numbers between 0 and 1, add the numbers 0 and 1 themselves to the list, sort them, and take the differences of adjacent numbers."

Unfortunately, a number of the answers here are incorrect if you'd like uniformly random numbers. The easiest (and fastest in many languages) solution that guarantees uniformly random numbers is just
# This is Python, but most languages support the Dirichlet.
import numpy as np
np.random.dirichlet(np.ones(n))*m
where n is the number of random numbers you want to generate and m is the sum of the resulting array. This approach produces positive values and is particularly useful for generating valid probabilities that sum to 1 (let m = 1).

To generate N positive numbers that sum to a positive number M at random, where each possible combination is equally likely:
Generate N exponentially-distributed random variates. One way to generate such a number can be written as—
number = -ln(1.0 - RNDU())
where ln(x) is the natural logarithm of x and RNDU() is a method that returns a uniform random variate greater than 0 and less than 1. Note that generating the N variates with a uniform distribution instead is not ideal, because it results in a biased distribution of variate combinations. The implementation given above, however, has its own problems, such as being ill-conditioned at large values because of the distribution's right-sided tail, especially when the implementation involves floating-point arithmetic. Another implementation is given in another answer.
Divide the numbers generated this way by their sum.
Multiply each number by M.
The result is N numbers whose sum is approximately equal to M (I say "approximately" because of rounding error). See also the Wikipedia article Dirichlet distribution.
This problem is also equivalent to the problem of generating random variates uniformly from an N-dimensional unit simplex.
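A short Python sketch of this three-step procedure, using the naive -ln(1 - u) implementation described above (the function name is mine):
import math
import random

def random_sum_to(n, m):
    # Step 1: exponentially distributed variates via inverse transform sampling
    variates = [-math.log(1.0 - random.random()) for _ in range(n)]
    # Steps 2 and 3: divide each variate by the sum, then multiply by m
    total = sum(variates)
    return [v / total * m for v in variates]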
However, for better accuracy (compared to the alternative of using floating-point numbers, which often occurs in practice), you should consider generating n random integers that sum to an integer m * x, and treating those integers as the numerators to n rational numbers with denominator x (and will thus sum to m assuming m is an integer). You can choose x to be a large number such as 2^32 or 2^64 or some other number with the desired precision. If x is 1 and m is an integer, this solves the problem of generating random integers that sum to m.
The following pseudocode shows two methods for generating n uniform random integers with a given positive sum, in random order. (The algorithm for this was presented in Smith and Tromble, "Sampling Uniformly from the Unit Simplex", 2004.) In the pseudocode below—
the method PositiveIntegersWithSum returns n integers greater than 0 that sum to m, in random order,
the method IntegersWithSum returns n integers 0 or greater that sum to m, in random order, and
Sort(list) sorts the items in list in ascending order (note that sort algorithms are outside the scope of this answer).
 
METHOD PositiveIntegersWithSum(n, m)
    if n <= 0 or m <= 0: return error
    ls = [0]
    ret = NewList()
    while size(ls) < n
        c = RNDINTEXCRANGE(1, m)
        found = false
        for j in 1...size(ls)
            if ls[j] == c
                found = true
                break
            end
        end
        if found == false: AddItem(ls, c)
    end
    Sort(ls)
    AddItem(ls, m)
    for i in 1...size(ls): AddItem(ret, ls[i] - ls[i - 1])
    return ret
END METHOD

METHOD IntegersWithSum(n, m)
    if n <= 0 or m <= 0: return error
    ret = PositiveIntegersWithSum(n, m + n)
    for i in 0...size(ret): ret[i] = ret[i] - 1
    return ret
END METHOD
Here, RNDINTEXCRANGE(a, b) returns a uniform random integer in the interval [a, b).
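A Python rendering of the pseudocode might look like the sketch below; it uses random.sample to draw the n-1 distinct cut points in one call instead of the explicit rejection loop, which is an implementation choice of mine rather than part of the pseudocode.
import random

def positive_integers_with_sum(n, m):
    if n <= 0 or m <= 0:
        raise ValueError("n and m must be positive")
    # n-1 distinct cut points strictly between 0 and m (requires m >= n)
    cuts = [0] + sorted(random.sample(range(1, m), n - 1)) + [m]
    return [b - a for a, b in zip(cuts, cuts[1:])]

def integers_with_sum(n, m):
    if n <= 0 or m <= 0:
        raise ValueError("n and m must be positive")
    return [x - 1 for x in positive_integers_with_sum(n, m + n)]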

In Java:
private static double[] randSum(int n, double m) {
    Random rand = new Random();
    double[] randNums = new double[n];
    double sum = 0;

    for (int i = 0; i < randNums.length; i++) {
        randNums[i] = rand.nextDouble();
        sum += randNums[i];
    }

    // normalize so the values sum to m (divide by the sum, then scale by m)
    for (int i = 0; i < randNums.length; i++) {
        randNums[i] = randNums[i] / sum * m;
    }
    return randNums;
}

Generate N-1 random numbers.
Compute the sum of said numbers.
Add the difference between the computed sum and the desired sum to the set.
You now have N random numbers, and their sum is the desired sum.
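A sketch of this in Python (note that the final value is not bounded like the others; it can be negative or larger than 1, which other answers here point out):
import random

def pad_to_sum(n, m):
    nums = [random.random() for _ in range(n - 1)]
    nums.append(m - sum(nums))   # the padding value may fall outside (0, 1)
    return nums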

Just generate N random numbers, compute their sum, divide each one by
the sum.
Expanding on Guillaume's accepted answer, here's a Java function that does exactly that.
public static double[] getRandDistArray(int n, double m)
{
    double[] randArray = new double[n];
    double sum = 0;

    // Generate n random numbers
    for (int i = 0; i < randArray.length; i++)
    {
        randArray[i] = Math.random();
        sum += randArray[i];
    }

    // Normalize sum to m
    for (int i = 0; i < randArray.length; i++)
    {
        randArray[i] /= sum;
        randArray[i] *= m;
    }
    return randArray;
}
In a test run, getRandDistArray(5, 1.0) returned the following:
[0.38106150346121903, 0.18099632814238079, 0.17275044310377025, 0.01732932296660358, 0.24786240232602647]

You're a little slim on constraints. Lots and lots of procedures will work.
For example, are numbers normally distributed? Uniform?
I'll assume that all the numbers must be positive and uniformly distributed around the mean, M/N.
Try this.
mean= M/N.
Generate N-1 values between 0 and 2*mean. This can be done by taking a standard uniform number u between 0 and 1; the random value is then 2*u*mean, which lies in the appropriate range.
Compute the sum of the N-1 values.
The remaining value is M - sum.
If the remaining value does not fit the constraints (0 to 2*mean) repeat the procedure.
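As a rough Python sketch of this procedure (names are mine; it simply retries until the leftover value lands in the allowed range):
import random

def random_sum_around_mean(n, m):
    mean = m / n
    while True:
        values = [2 * random.random() * mean for _ in range(n - 1)]
        last = m - sum(values)
        if 0 <= last <= 2 * mean:   # accept only if the leftover fits the constraint
            return values + [last]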

Related

Algorithm to generate positive random integers smaller than given integer n

There are algorithms to generate random numbers like:
number = (previous_number * constant + other_constant) mod third_constant
for carefully selected constants.
But I need an algorithm to generate random integers that are in the range 0 to n-1. (Obviously not running a loop and returning the counter; I need randomness.) How is this possible? Thank you.
Use third_constant = n
The multiply and add operations give you some number, then when you do the mod you get an integer from 0 to third_constant -1, so just use n for the third_constant and you're done.
This is achieved by generating a pseudo-random number r in the range [0,1) (a primitive which exists in most languages), and then:
r = rand()
res = floor(r * n)
One way to generate a random number in [0,1) is to generate a random integer in [0, MAX_RAND) and divide by MAX_RAND.
You can also generate a random integer r in [0, MAX_RAND) and return r % n - but beware of bias if MAX_RAND is not significantly larger than n (by 2-3 orders of magnitude).
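To make the two ideas concrete, here is a small Python sketch of a linear congruential generator reduced to the range [0, n); the constants are the widely published Numerical Recipes parameters, used here purely as an example of "carefully selected constants":
class LCG:
    # example LCG constants (Numerical Recipes); any carefully chosen set works
    A, C, M = 1664525, 1013904223, 2**32

    def __init__(self, seed=1):
        self.state = seed

    def next_int(self, n):
        self.state = (self.state * self.A + self.C) % self.M
        # scale [0, M) down to [0, n); bias is negligible when M >> n
        return self.state * n // self.M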

Limited space average computation

Let's say I have N integers, where N can get huge, but each int is guaranteed to be between 0 and some cap M, where M fits easily in a signed 32-bit field.
If I want to compute the average of these N integers, I can't always just sum and divide them all in the same signed 32-bit space - the numerator carries a risk of overflow if N is too large. One solution to this problem is to just use 64-bit fields for the computation, to hold for larger N, but this solution doesn't scale - If M were a large 64-bit integer instead, the same problem would arise.
Does anyone know of an algorithm (preferably O(N)) that can compute the average of a list of positive integers in the same bit-space? Without doing something cheap like using two integers to simulate a larger one.
Supposing you know N initially, you can keep two variables: one holds the answer so far (the running sum divided by N), and the other holds the remainder of that division.
For example, in C++:
int ans = 0, remainder = 0;
for (int i = 0; i < N; i++) {
    remainder += input[i]; // update remainder so far
    ans += remainder / N;  // move what we can from remainder into ans
    remainder %= N;        // calculate what's left of remainder
}
At the end of the loop, the answer is found in ans, with a remainder in remainder (if you need a rounding method other than truncation).
This example works where the maximum input number M+N fits in a 32-bit int.
Note that this should work for positive and negative integers, because in C++, the / operator is the division operator, and % is actually a remainder operator (not really a modulo operator).
You can calculate a running average. If you have the average A of N elements, and you add another element E, the new average is (A*N+E)/(N+1). By the distributive property of division over addition, this is equivalent to (A*N)/(N+1) + E/(N+1). But since A*N can overflow, you can use the associativity of multiplication and division to rewrite the first term as A*(N/(N+1)).
So the algorithm is:
n = 0
avg = 0
for each i in list
    avg = avg*(n/(n+1)) + i/(n+1)
    n = n+1
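In Python, for instance, the floating-point running average might look like this sketch (note that n/(n+1) must be evaluated in floating point, otherwise the first term collapses to zero under integer division):
def running_average(values):
    avg = 0.0
    n = 0
    for x in values:
        avg = avg * (n / (n + 1)) + x / (n + 1)   # never forms the full sum
        n += 1
    return avg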

Generating random sublist from ordered list that maintains ordering

Consider a problem where a random sublist of k items, Y, must be selected from X, a list of n items, where the items in Y must appear in the same order as they do in X. The selected items in Y need not be distinct. One solution is this:
for i = 1 to k
    A[i] = floor(rand * n) + 1
    Y[i] = X[A[i]]
sort Y according to the ordering of A
However, this has running time O(k log k) due to the sort operation. To remove this it's tempting to
high_index = n
for i = 1 to k
    index = floor(rand * high_index) + 1
    Y[k - i + 1] = X[index]
    high_index = index
But this gives a clear bias to the returned list due to the uniform index selection. It feels like a O(k) solution is attainable if the indices in the second solution were distributed non-uniformly. Does anyone know if this is the case, and if so what properties the distribution the marginal indices are drawn from has?
An unbiased O(n+k) solution is straightforward; here is high-level pseudocode.
create an empty histogram of size n [initialized with all elements as zeros]
populate it with k uniformly distributed indices in the range. (do k times: histogram[inclusiveRand(1,n)]++)
iterate the initial list [A], while decreasing elements in the histogram and appending elements to the result list.
Explanation [edit]:
The idea is to choose k elements out of n at random, with uniform distribution for each, and create a histogram out of it.
This histogram now contains for each index i, how many times A[i] will appear in the resulting Y list.
Now, iterate the list A in-order, and for each element i, insert A[i] into the resulting Y list histogram[i] times.
This guarantees you maintain the order because you insert elements in order, and "never go back".
It also guarantees unbiased solution since for each i,j,K: P(histogram[i]=K) = P(histogram[j]=K), so for each K, each element has the same probability to appear in the resulting list K times.
I believe it can be done in O(k) using the order statistics X(i), but I cannot figure out how. :\
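A Python sketch of this histogram approach (my own rendering of the steps above):
from random import randrange

def ordered_sample_with_replacement(X, k):
    n = len(X)
    histogram = [0] * n
    for _ in range(k):
        histogram[randrange(n)] += 1           # k uniform draws over the indices
    Y = []
    for i, count in enumerate(histogram):      # walk X in order...
        Y.extend([X[i]] * count)               # ...emitting each item as often as it was drawn
    return Y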
By your first algorithm, it suffices to generate k uniform random samples of [0, 1) in sorted order.
Let X1, ..., Xk be these samples. Given that Xk = x, the conditional distribution of X1, ..., Xk-1 is k - 1 uniform random samples of [0, x) in sorted order, so it suffices to sample Xk and recurse.
What's the probability that Xk < x? Each of k independent samples of [0, 1) must be less than x, so the answer (the cumulative distribution function for Xk) is x^k. To sample according to the cdf, all we have to do is invert it on a uniform random sample of [0, 1): pow(random(), 1.0 / k).
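A Python sketch of that recursion, my own, not from the answer (generate the k-th sample first, then work downwards; multiplying each result by n and taking the floor turns the values into indices):
import random

def sorted_uniform_samples(k):
    # k uniform samples of [0, 1) in ascending order, without sorting
    samples = []
    upper = 1.0
    for i in range(k, 0, -1):
        upper *= random.random() ** (1.0 / i)   # largest remaining sample, conditioned on the previous one
        samples.append(upper)
    return samples[::-1]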
Here's an (expected) O(k) algorithm I actually would consider implementing. The idea is to dump the samples into k bins, sort each bin, and concatenate. Here's some untested Python:
from random import randrange

def samples(n, k):
    bins = [[] for i in range(k)]
    for i in range(k):
        x = randrange(n)
        bins[(x * k) // n].append(x)
    result = []
    for bin in bins:
        bin.sort()
        result.extend(bin)
    return result
Why is this efficient in expectation? Let's suppose we use insertion sort on each bin (each bin has expected size O(1)!). On top of operations that are O(k), we're going to pay proportionally to the sum of the squares of the bin sizes, which is basically the number of collisions. Since the probability of two samples colliding is at most something like 4/k and we have O(k^2) pairs of samples, the expected number of collisions is O(k).
I suspect rather strongly that the O(k) guarantee can be made with high probability.
You can use counting sort to sort Y and thus make the sorting linear with respect to k. However, for that you need one additional array of length n. If we assume you have already allocated it, you may execute the code you are asking for arbitrarily many times with complexity O(k).
The idea is just as you describe, but I will use one more array cnt of size n that I assume is initialized to 0, and another "stack" st that I assume is empty.
for i = 1 to k
    A[i] = floor(rand * n) + 1
    cnt[A[i]] += 1
    if cnt[A[i]] == 1 // Needed to be able to traverse the inserted elements faster
        st.push(A[i])
for elem in st
    for i = 0 to cnt[elem]
        Y.add(X[elem])
for elem in st
    cnt[elem] = 0
EDIT: as mentioned by oldboy, what I state in the post is not true - I still have to sort st, which might be a bit better than the original proposition, but not by much. So this approach will only be good if k is comparable to n, and then we just iterate through cnt linearly and construct Y this way. This way st is not needed:
for i = 1 to k
    A[i] = floor(rand * n) + 1
    cnt[A[i]] += 1
for i = 1 to n
    for j = 0 to cnt[i]
        Y.add(X[i])
    cnt[i] = 0
For the first index in Y, the distribution of indices in X is given by:
P(x; n, k) = binomial(n - x + k - 2, k - 1) / norm
where binomial denotes calculation of the binomial coefficient, and norm is a normalisation factor, equal to the total number of possible sublist configurations.
norm = binomial(n + k - 1, k)
So for k = 5 and n = 10 we have:
norm = 2002
P(x = 0) = 0.357, P(x <= 0) = 0.357
P(x = 1) = 0.245, P(x <= 1) = 0.604
P(x = 2) = 0.165, P(x <= 2) = 0.769
P(x = 3) = 0.105, P(x <= 3) = 0.874
P(x = 4) = 0.063, P(x <= 4) = 0.937
... (we can continue this up to x = 10)
We can sample the X index of the first item in Y from this distribution (call it x1). The distribution of the second index in Y can then be sampled in the same way with P(x; (n - x1), (k - 1)), and so on for all subsequent indices.
My feeling now is that the problem is not solvable in O(k), because in general we are unable to sample from the distribution described in constant time. If k = 2 then we can solve in constant time using the quadratic formula (because the probability function simplifies to 0.5(x^2 + x)) but I can't see a way to extend this to all k (my maths isn't great though).
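For what it's worth, here is a Python sketch of my own that samples the whole sorted index sequence directly from this distribution (it is O(k·n) as written, so it does not beat the earlier approaches, but it reproduces the probabilities above, e.g. P(x = 0) ≈ 0.357 for n = 10, k = 5):
import random
from math import comb

def sample_sorted_indices(n, k):
    indices, base = [], 0
    while k > 0:
        norm = comb(n + k - 1, k)
        r = random.random() * norm
        x = 0
        # walk P(x; n, k) = binomial(n - x + k - 2, k - 1) / norm until r is used up
        while x < n - 1:
            p = comb(n - x + k - 2, k - 1)
            if r < p:
                break
            r -= p
            x += 1
        indices.append(base + x)
        base += x            # later indices are offsets from the one just drawn
        n -= x
        k -= 1
    return indices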
The original list X has n items. There are 2**n possible sublists, since every item will or will not appear in a sublist: each item adds a bit to the enumeration of the possible sublists. You could view this enumeration as a bitword of n bits.
Since you only want sublists with k items, you are interested in bitwords with exactly k bits set. A practical algorithm could pick (or not pick) the first element from X, and then recurse into the rightmost n-1 substring of X, taking into account the accumulated number of chosen items. Since the X list is processed in order, the Y list will also be in order.
#include <stdio.h>
#include <string.h>

unsigned pick_k_from_n(char target[], char src[], unsigned k, unsigned n, unsigned done);

unsigned pick_k_from_n(char target[], char src[]
                      , unsigned k, unsigned n, unsigned done)
{
    unsigned count = 0;
    if (k > n) return 0;
    if (k == 0) {
        target[done] = 0;
        puts(target);
        return 1;
    }
    if (n > 0) {
        count += pick_k_from_n(target, src + 1, k, n - 1, done);
        target[done] = *src;
        count += pick_k_from_n(target, src + 1, k - 1, n - 1, done + 1);
    }
    return count;
}

int main(int argc, char **argv)
{
    char result[20];
    char *domain = "OmgWtf!";
    unsigned cnt, len, want;

    want = 3;
    switch (argc) {
    default:
    case 3:
        domain = argv[2];
    case 2:
        sscanf(argv[1], "%u", &want);
    case 1:
        break;
    }
    len = strlen(domain);
    cnt = pick_k_from_n(result, domain, want, len, 0);
    fprintf(stderr, "Count=%u\n", cnt);
    return 0;
}
Removing the recursion is left as an exercise to the reader.
Some output:
plasser@pisbak:~/hiero/src$ ./a.out 3 ABBA
BBA
ABA
ABA
ABB
Count=4
plasser@pisbak:~/hiero/src$

How to solve project euler #21 faster?

Original Problem
Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n).
If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.
For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.
Evaluate the sum of all the amicable numbers under 10000.
I solved the problem by generating a hash of all the numbers between 1 - 10000 and their corresponding divisors sum (ie hash[220] = 284). I then compared the items in the hash with a copy of the hash... anyways, it works, but it takes a long time. How can I make this faster?
def proper_divs_sum num
  divs = [1]
  for i in 2..((num/2) + 1)
    if num % i == 0
      divs.push i
    end
  end
  divs_sum = 0
  divs.each do |div|
    divs_sum += div
  end
  return divs_sum
end

def n_d_hash_gen num
  nd_hash = {}
  for i in 1..num
    nd_hash[i] = proper_divs_sum(i)
  end
  return nd_hash
end

def amicables num
  amicable_list = []
  hash1 = n_d_hash_gen(num)
  hash2 = n_d_hash_gen(num)
  hash1.each do |item1|
    hash2.each do |item2|
      if item1 != item2 && (item1[0] == item2[1] && item2[0] == item1[1])
        amicable_list.push item1
      end
    end
  end
  return amicable_list
end
Also, I am new to Ruby, so any tips on how to make this more Ruby-like would also be much appreciated.
The function d(n) is a variant of the divisor function: d(n) = σ(n) - n, where σ(n) is the sum of all divisors of n. The function σ has an important property which lets you calculate it much more efficiently: it is multiplicative, which means that if n = ab, where a and b are coprime, then σ(n) = σ(a) σ(b).
This means that if you can calculate σ(p^k) where p is prime, then σ(n) = σ(p1^k1) ... σ(pr^kr), where n = p1^k1 ... pr^kr is the prime factorization of n. In fact, it turns out that σ(p^k) = (p^(k+1) - 1) / (p - 1), so σ(n) = ∏i (pi^(ki+1) - 1) / (pi - 1), and then d(n) = σ(n) - n.
So to calculate d(n) efficiently for all 1 ≤ n ≤ 10000, you can use a sieve to calculate the prime factorizations of all n, and then use the formula above to calculate d(n) from each prime factorization.
Once you've done that, all you need is a simple loop to calculate the sum of all n for which d(d(n)) = n.
This can even be optimized further, by combining the sieving step with the calculation of d(n), but I'll leave that as an exercise for the interested. It is not necessary for the size of this particular problem.
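A rough Python sketch of this approach (smallest-prime-factor sieve, then the multiplicative formula for σ; all names are mine, and it assumes both members of an amicable pair lie below the limit, which holds for 10000):
def amicable_sum_via_sigma(limit=10000):
    # smallest prime factor for every n < limit
    spf = list(range(limit))
    for i in range(2, int(limit ** 0.5) + 1):
        if spf[i] == i:                         # i is prime
            for j in range(i * i, limit, i):
                if spf[j] == j:
                    spf[j] = i

    def d(n):                                   # sum of proper divisors: sigma(n) - n
        sigma, m = 1, n
        while m > 1:
            p, pk = spf[m], 1
            while m % p == 0:
                m //= p
                pk *= p
            sigma *= (pk * p - 1) // (p - 1)    # (p^(k+1) - 1) / (p - 1)
        return sigma - n

    ds = [0, 0] + [d(n) for n in range(2, limit)]
    return sum(n for n in range(2, limit)
               if ds[n] != n and ds[n] < limit and ds[ds[n]] == n)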
There are a couple of things you can do to improve your algorithm:
1) There is no need to loop to n/2 when you compute the divisors. Stop at sqrt(n) instead. By that point you have found half the divisors; the other half are obtained by dividing n by each divisor in the first half.
2) When you enter a number in the hash table, you can immediately check if its amicable twin is already in the hash table. No need for two hash tables, or for two nested loops comparing them.
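For point 1, the divisor loop from the question could be tightened like this Python sketch (for point 2, keep a single hash and check whether hash[sum] == n as each entry is inserted):
def proper_divs_sum(num):
    total, i = 1, 2                # start from 1, matching the question's code
    while i * i <= num:            # stop at sqrt(num) instead of num/2
        if num % i == 0:
            total += i
            if i != num // i:      # add the complementary divisor once
                total += num // i
        i += 1
    return total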
Analysis of your approach
The approach you are taking is to start with a dividend, find its divisors, sum them up, and store the result. You'll notice that the method you are using to find the divisors is a naïve one—I don't say this as an insult; it's only to say that your approach doesn't use any information it has available, and simply tries every number to see whether it is a divisor. It does this by using modular division, and, in almost every case, the majority of candidates fail the test.
Something more constructive
Consider if you never had to try numbers that could fail a test like this. In fact, starting with the divisors and building up the dividends from there would skirt the issue altogether.
You can do this by looping through every number <= 5000. These are your divisors, the multiples of which are your dividends. Then add the divisor to the sum of divisors for each multiple.
This approach works up the sums bit-by-bit; by the time you've worked through every divisor, you'll have an array mapping dividend to divisor. From there, you can use a method like you already have to search for amicable numbers in this list.
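Concretely, this divisor-sieve idea might look like the Python sketch below: every divisor is added to the running sum of each of its multiples, and the resulting table is then scanned for amicable pairs (names are mine):
def amicable_sum_via_sieve(limit=10000):
    d = [0] * limit                          # d[n] will hold the sum of proper divisors of n
    for divisor in range(1, limit // 2 + 1):
        for multiple in range(2 * divisor, limit, divisor):
            d[multiple] += divisor           # divisor divides each of its multiples
    return sum(n for n in range(2, limit)
               if d[n] != n and d[n] < limit and d[d[n]] == n)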
Division is a slow process. In your approach you are doing a lot of it, therefore your program is slow.
First of all, in trying to find all divisors of a number you are trying all numbers not larger than half that number as potential divisors. You can improve on that by not going further than the square root of the number. If a number is divisible by a number larger than its square root, the result of the division will be smaller than the square root. This will eliminate some unnecessary divisions.
Also, if a number is not divisible by 2, it will also not be divisible by 4, 6, 8, etc. It is better to just divide by primes and build the possible divisors from those.
However, the problem can be solved by doing no divisions at all.
Another solution in Java:
static int sum_Of_Divisors(int n) {
    int limit = n;
    int sum = 0;
    for (int i = 1; i < limit; i++) {
        if (n % i == 0) {
            if (i != 1 && i != n / i)
                sum += (i + n / i);   // a divisor and its complement
            else
                sum += i;             // 1 (its complement is n itself) or sqrt(n) (count it once)
            limit = n / i;
        }
    }
    return sum;
}
static boolean isAmicable(int n, HashSet<Integer> set) {
    int sum = sum_Of_Divisors(n);
    if (sum_Of_Divisors(sum) == n && n != sum) {
        set.add(sum);
        return true;
    }
    return false;
}

static long q21() {
    long sum = 0;
    HashSet<Integer> set = new HashSet<Integer>();
    for (int i = 1; i < 10000; i++) {
        if (!set.contains(i)) {
            if (isAmicable(i, set)) {
                set.add(i);
            }
        }
    }
    for (Integer i : set) sum += i;
    return sum;
}
You can "cheat" and use Ruby's stdlib prime stuff: https://rbjl.janlelis.com/37/euler-021.rb

Rescale a vector of integers

Assume that I have a vector, V, of positive integers. If the sum of the integers is larger than a positive integer N, I want to rescale the integers in V so that the sum is <= N. The elements in V must remain above zero. The length of V is guaranteed to be <= N.
Is there an algorithm to perform this rescaling in linear time?
This is not homework, BTW :). I need to rescale a map from symbols to symbol frequencies to use range encoding.
Some quick thinking and googling has not given a solution to the problem.
EDIT:
Ok, the question was somewhat unclear. "Rescale" means "normalize". That is, transform the integers in V, for example by multiplying them by a constant, to smaller positive integers so the criterion of sum(V) <= N is fulfilled. The better the ratios between the integers are preserved, the better the compression will be.
The problem is open-ended in that way, the method does not need to find the optimal (in, say, a least squares fit sense) way to preserve the ratios, but a "good" one. Setting the entire vector to 1, as suggested, is not acceptable (unless forced). "Good" enough would for example be finding the smallest divisor (defined below) that fulfills the sum criterion.
The following naive algorithm does not work.
Find the current sum(V), Sv
divisor := int(ceil(Sv/N))
Divide each integer in V by divisor, rounding down, but not to less than 1.
This fails on v = [1,1,1,10] with N = 5.
divisor = ceil(13 / 5) = 3.
V := [1,1,1, max(1, floor(10/3)) = 3]
Sv is now 6 > 5.
In this case, the correct normalization is [1,1,1,2]
One algorithm that would work is to do a binary search for the divisor (defined above) until the smallest divisor in [1,N] fulfilling the sum criterion is found, starting with ceil(Sv/N) as the initial guess. This is, however, not linear in the number of operations, but proportional to len(V)*log(N).
I am starting to think that it is impossible to do well, in linear time, in the general case. I might resort to some sort of heuristic.
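For reference, the binary-search idea sketched above might look like this in Python (searching over [1, max(V)] rather than [1, N], since setting the divisor to max(V) already forces every entry to 1; names are mine):
def rescale_binary_search(v, n):
    def total(d):
        return sum(max(1, x // d) for x in v)

    lo, hi = 1, max(v)              # divisor = max(v) maps everything to 1, so the sum fits
    while lo < hi:
        mid = (lo + hi) // 2
        if total(mid) <= n:         # total(d) is non-increasing in d, so binary search works
            hi = mid
        else:
            lo = mid + 1
    return [max(1, x // lo) for x in v]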
Just divide all the integers by their Greatest Common Divisor. You can find the GCD efficiently with multiple applications of Euclid's Algorithm.
from math import gcd

d = 0
for x in xs:
    d = gcd(d, x)
xs = [x // d for x in xs]
The positive point is that you always have as small a representation as possible this way, without throwing away any precision and without needing to choose a specific N. The downside is that if your frequencies are large coprime numbers, you will have no choice but to sacrifice precision (and you didn't specify what should be done in this case).
How about this:
Find the current sum(V), Sv
divisor := int(ceil(Sv/(N - |V| + 1)))
Divide each integer in V by divisor, rounding up
On v = [1,1,1,10] with N = 5:
divisor = ceil(13 / 2) = 7.
V := [1, 1, 1, ceil(10/7) = 2]
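In Python, this proposal reads roughly as follows (a short sketch, names mine):
from math import ceil

def rescale_round_up(v, n):
    sv = sum(v)
    if sv <= n:
        return list(v)
    divisor = ceil(sv / (n - len(v) + 1))
    return [ceil(x / divisor) for x in v]   # rounding up keeps every entry >= 1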
I think you should just rescale the part above 1. So, subtract 1 from all values, and V.length from N. Then rescale normally, then add 1 back. You can even do slightly better if you keep running totals as you go along, instead of choosing just one factor, which will usually waste some "number space". Something like this:
public static void rescale(int[] data, int N) {
    int sum = 0;
    for (int d : data)
        sum += d;

    if (sum > N) {
        int n = N - data.length;
        sum -= data.length;
        for (int a = 0; a < data.length; a++) {
            int toScale = data[a] - 1;
            int scaled = Math.round(toScale * (float) n / sum);
            data[a] = scaled + 1;
            n -= scaled;
            sum -= toScale;
        }
    }
}
This is a problem of 'range normalization', but it's very easy. Suppose that S is the sum of the elements of the vector and S >= N; then S = dN for some d >= 1, and therefore d = S/N. So just multiply every element of the vector by N/S (i.e. divide by d). The result is a vector of rescaled components whose sum is exactly N. This procedure is clearly linear :)
