So I have 10 numbers. Let's say each number represents the skill of an individual. If I were to create 2 teams of 5, how would I make the 2 teams such that the difference of the teams' sums is minimal?
With 10 numbers, the easiest way would be to go over all combinations and calculate the difference.
This is similar to the Knapsack problem: you try to put individuals into one of the teams so that this team's sum is the biggest value not larger than half of the total sum. It would be the same problem if team size were not restricted.
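One standard way to realize that formulation is a subset-sum style DP over (team size, sum); here is a sketch in Python, assuming the skills are non-negative integers (the function name is mine):

def best_split(skills):
    total, half = sum(skills), sum(skills) // 2
    k_target = len(skills) // 2
    # reachable[k][s]: can some team of k people sum to exactly s?
    reachable = [[False] * (total + 1) for _ in range(k_target + 1)]
    reachable[0][0] = True
    for v in skills:
        for k in range(k_target, 0, -1):       # iterate backwards so each
            for s in range(total, v - 1, -1):  # individual is used at most once
                if reachable[k - 1][s - v]:
                    reachable[k][s] = True
    # the best team sum is the biggest reachable value not larger than half
    best = max(s for s in range(half + 1) if reachable[k_target][s])
    return best, total - best

print(best_split([3, 4, 5, 8, 2, 1, 1, 4, 9, 10]))  # (23, 24)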
Here's a crazy idea I came up with.
Time complexity: O(N log N)

Sort the numbers.
Find the target sum T that we would like to hit (sum of all values / 2).
Let Q = the set of the first 5 numbers in the sorted list. Q will be our final set, which we will iteratively improve.

for (each element q of Q, from the last element to the first)
{
    Find a number p that is not currently used
    which, if swapped with the current element q,
    makes the sum closer to T but not more than T.
    Remove q from Q
    Add p to Q
}
return Q as the best set.

Though the for loop looks as though it's O(N²), one can do a binary search to find the number p, so it's O(N log N).

Disclaimer: I have only described the algorithm; I don't know how to formally prove it.
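For anyone who wants to experiment with this idea, here is a literal Python transcription (mine, not the original poster's), with the binary search done via bisect; per the disclaimer it is an unproven heuristic, not a guaranteed optimum:

from bisect import bisect_right, insort

def greedy_split(nums, k=5):
    nums = sorted(nums)
    team, pool = nums[:k], nums[k:]  # pool = unused numbers, kept sorted
    T = sum(nums) / 2                # target: half of the total
    for i in reversed(range(k)):
        q = team[i]
        rest = sum(team) - q
        j = bisect_right(pool, T - rest) - 1  # largest p with rest + p <= T
        if j >= 0:
            p = pool[j]
            old, new = rest + q, rest + p
            # swap only if it brings the sum closer to T without exceeding it
            if new <= T and (old > T or new > old):
                team[i] = p
                del pool[j]
                insort(pool, q)  # q becomes available again
    return team

On [3, 4, 5, 8, 2, 1, 1, 4, 9, 10] this happens to return a team summing to 23 against 24, matching the exhaustive answers below.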
Generate all combinations of 5 elements. You will have those 5 in one team and the remaining in the other team. Compare all results and choose the one with the smallest difference. You can create all those combinations with 5 for loops.
I just tried it out. Unfortunately I had to program the combination enumeration myself (function next) and call result.fit for every element.
Can be done nicer, but for demonstration it should be good enough.
var all = [3, 4, 5, 8, 2, 1, 1, 4, 9, 10];

function sumArray(a) {
    var asum = 0;
    a.forEach(function(v) { asum += v });
    return asum;
}

// Enumerate every way to extend `start` with `nbr` more elements of `rest`.
var next = function(start, rest, nbr, result) {
    if (nbr <= 0) {
        result.fit(start);
        return;
    }
    // leave at least nbr - 1 elements for the remaining picks
    for (var i = 0; i <= rest.length - nbr; ++i) {
        var clone = start.slice(0);
        clone.push(rest[i]);
        next(clone, rest.slice(i + 1), nbr - 1, result);
    }
};

var result = {
    target: sumArray(all) / 2,
    best: [],
    bestfit: Infinity,
    fit: function(a) {
        var fit = Math.abs(sumArray(a) - this.target);
        if (fit < this.bestfit) {
            this.bestfit = fit;
            this.best = a;
        }
    }
};

next([], all, all.length / 2, result);
console.log(JSON.stringify(result.best));
Same algorithm as most -- compare 126 combinations. Code in Haskell:
inv = [1,2,3,4,5,6,7,8,9,10]

best (x:xs) (a,b)
  | length a == 5 = [(abs (sum a - sum (x:xs ++ b)), (a, x:xs ++ b))]
  | length b == 5 = [(abs (sum (x:xs ++ a) - sum b), (x:xs ++ a, b))]
  | otherwise = let s  = best xs (x:a, b)
                    s' = best xs (a, x:b)
                in if fst (head s) < fst (head s') then s
                   else if fst (head s') < fst (head s) then s'
                   else s ++ s'

main = print $ best (tail inv) ([head inv], [])
Output:
*Main> main
[(1,([9,10,5,2,1],[8,7,6,4,3])),(1,([10,8,6,2,1],[9,7,5,4,3]))
,(1,([9,10,6,2,1],[8,7,5,4,3])),(1,([9,8,7,2,1],[10,6,5,4,3]))
,(1,([10,8,7,2,1],[9,6,5,4,3])),(1,([9,10,4,3,1],[8,7,6,5,2]))
,(1,([10,8,5,3,1],[9,7,6,4,2])),(1,([9,10,5,3,1],[8,7,6,4,2]))
,(1,([10,7,6,3,1],[9,8,5,4,2])),(1,([9,8,6,3,1],[10,7,5,4,2]))
,(1,([10,8,6,3,1],[9,7,5,4,2])),(1,([9,8,7,3,1],[10,6,5,4,2]))
,(1,([10,7,5,4,1],[9,8,6,3,2])),(1,([9,8,5,4,1],[10,7,6,3,2]))
,(1,([10,8,5,4,1],[9,7,6,3,2])),(1,([9,7,6,4,1],[10,8,5,3,2]))
,(1,([10,7,6,4,1],[9,8,5,3,2])),(1,([9,8,6,4,1],[10,7,5,3,2]))
,(1,([8,7,6,5,1],[9,10,4,3,2])),(1,([9,7,6,5,1],[10,8,4,3,2]))]
This is an instance of the Partition problem, but for your tiny instance testing all combinations should be fast enough.
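For an input this small, that exhaustive test is a few lines of Python with itertools (a sketch; C(10,5) is only 252 subsets):

from itertools import combinations

def min_diff_split(skills):
    total = sum(skills)
    team = min(combinations(skills, len(skills) // 2),
               key=lambda t: abs(total - 2 * sum(t)))  # |team sum - other team sum|
    return team, abs(total - 2 * sum(team))

print(min_diff_split([3, 4, 5, 8, 2, 1, 1, 4, 9, 10]))  # difference 1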
Related
I was asked the question below in an interview today. I gave an O(n log n) solution, but was asked for an O(n) solution. I could not come up with one. Can you help?

An input array is given, like [1,2,4]. Then every element of it is doubled and
appended to the array, so the array now looks like [1,2,4,2,4,8]. Now
this array is randomly shuffled. One possible random arrangement is
[4,8,2,1,2,4]. We are given this shuffled array, and we want to
recover the original array [1,2,4] in O(n) time.
The original array can be returned in any order. How can I do it?
Here's an O(N) Java solution. It could be improved by first making sure that the array is of the proper form; for example, it shouldn't accept [0] as an input:
import java.util.*;

class Solution {
    public static int[] findOriginalArray(int[] changed) {
        if (changed.length % 2 != 0)
            return new int[] {};
        // set Map size to optimal value to avoid rehashes
        Map<Integer,Integer> count = new HashMap<>(changed.length*100/75);
        int[] original = new int[changed.length/2];
        int pos = 0;
        // count frequency for each number
        for (int n : changed) {
            count.put(n, count.getOrDefault(n,0)+1);
        }
        // now decide which go into the answer
        for (int n : changed) {
            int smallest = n;
            for (int m = n; m > 0 && count.getOrDefault(m,0) > 0; m = m/2) {
                // System.out.println(m);
                smallest = m;
                if (m % 2 != 0) break;
            }
            // trickle up from smallest to largest while count > 0
            for (int m = smallest, mm = 2*m; count.getOrDefault(mm,0) > 0; m = mm, mm = 2*mm) {
                int ct = count.getOrDefault(mm,0);
                while (count.get(m) > 0 && ct > 0) {
                    // System.out.println("adding " + m);
                    original[pos++] = m;
                    count.put(mm, ct - 1);
                    count.put(m, count.get(m) - 1);
                    ct = count.getOrDefault(mm,0);
                }
            }
        }
        // check for incorrect format
        if (count.values().stream().anyMatch(x -> x > 0)) {
            return new int[] {};
        }
        return original;
    }

    public static void main(String[] args) {
        int[] changed = {1,2,4,2,4,8};
        System.out.println(Arrays.toString(changed));
        System.out.println(Arrays.toString(findOriginalArray(changed)));
    }
}
But I've tried to keep it simple.
The output is NOT guaranteed to be sorted. If you want it sorted, it's going to cost O(N log N) inevitably, unless you use a radix sort or something similar (which would make it O(N log E), where E is the max value of the numbers you're sorting and log E the number of bits needed).
Runtime
This may not look like O(N), but it is: each pass finds the lowest number in a chain ONCE, then trickles up that chain ONCE. Said another way, every iteration does O(X) work to process X elements, leaving O(N - X) elements for later. Therefore, even though there are fors inside fors, it is still O(N).
An example execution can be seen with [64,32,16,8,4,2].
If this were not O(N), then printing each value traversed while finding the smallest would show the values appearing over and over again (for example, N*(N+1)/2 times).
But instead you see them only once:
finding smallest 64
finding smallest 32
finding smallest 16
finding smallest 8
finding smallest 4
finding smallest 2
adding 2
adding 8
adding 32
If you're familiar with the Heapify algorithm you'll recognize the approach here.
from typing import List

def findOriginalArray(changed: List[int]) -> List[int]:
    size = len(changed)
    ans = []
    left_elements = size // 2
    # if size is odd, return []: no solution is possible
    if size % 2 != 0:
        return ans
    # frequency dictionary: given the array [0,0,2,1] the map will be {0:2, 2:1, 1:1}
    d = {}
    for i in changed:
        if i in d:
            d[i] += 1
        else:
            d[i] = 1
    # check the edge case of 0 (each 0 pairs with another 0)
    if 0 in d:
        count = d[0]
        half = count // 2
        if (count % 2 != 0) or (half > left_elements):
            return ans
        left_elements -= half
        ans = [0 for i in range(half)]
    # check the rest of the cases, assuming the values are at most 10**5
    for i in range(1, 50001):
        if i in d and d[i] > 0:
            count = d[i]
            if count > left_elements:
                ans = []
                break
            left_elements -= count
            for j in range(count):
                ans.append(i)
            if 2 * i in d:
                if d[2 * i] < count:
                    ans = []
                    break
                else:
                    d[2 * i] -= count
            else:
                ans = []
                break
    return ans
I have a simple idea which might not be the best, but I could not think of a case where it would not work. Given the array A with the doubled elements, randomly shuffled, keep a helper map. Process each element of the array and, each time you find a new element, add it to the map with the value 0. When an element i is processed, increment map[i] and decrement map[2*i]. Then iterate over the map and print the elements that have a value greater than zero.
A simple example, say that the vector is:
[1, 2, 3]
And the doubled/shuffled version is:
A = [3, 2, 1, 4, 2, 6]
When processing 3, first add the keys 3 and 6 to the map with value zero. Increment map[3] and decrement map[6]. This way, map[3] = 1 and map[6] = -1. Then for the next element map[2] = 1 and map[4] = -1 and so forth. The final state of the map in this example would be map[1] = 1, map[2] = 1, map[3] = 1, map[4] = -1, map[6] = 0, map[8] = -1, map[12] = -1.
Then you just process the keys of the map and, for each key with a value greater than zero, add it to the output. There are certainly more efficient solutions, but this one is O(n).
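A literal Python transcription of the idea as described, for anyone who wants to try it (the function name is mine):

from collections import defaultdict

def originals(arr):
    m = defaultdict(int)
    for v in arr:
        m[v] += 1      # increment map[i]
        m[2 * v] -= 1  # decrement map[2*i]
    # emit the keys that end with a positive count, with multiplicity
    return [k for k, c in m.items() for _ in range(c)]

print(originals([3, 2, 1, 4, 2, 6]))  # [3, 2, 1], i.e. the original values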
In C++, you can try this. The time is O(N + K log K), where N is the length of the input and K is the number of unique elements in the input.
class Solution {
public:
    vector<int> findOriginalArray(vector<int>& input) {
        if (input.size() % 2) return {};
        unordered_map<int, int> m;
        for (int n : input) m[n]++;          // frequency of each value
        vector<int> nums;
        for (auto [n, cnt] : m) nums.push_back(n);
        sort(begin(nums), end(nums));        // visit values in ascending order
        vector<int> out;
        for (int n : nums) {
            // every remaining copy of n must be an original, so each needs a double
            if (m[2 * n] < m[n]) return {};
            for (int i = 0; i < m[n]; ++i, --m[2 * n]) out.push_back(n);
        }
        return out;
    }
};
The question isn't clear about the required space complexity, so this is my top-of-the-mind attempt, assuming O(n) time complexity is required.
If the length of the input array is not even, it's wrong!
Create a map and add the elements of the input array to it.
Divide each element in the input array by 2 and check if that value exists in the map. If it exists, add it to the array (slice) orig.
There is a chance we have added duplicate values to this original array; clean it!
Here is a sample go code:
https://go.dev/play/p/w4mm-rloHyi
I am sure we can optimize this code in a lot of ways for space complexity, but its time complexity is O(n).
The puzzle
For every input number n (n < 10) there is an output number m such that:
m's first digit is n
m is an n-digit number
every 2-digit sequence inside m must be a different prime number
The output should be m, where m is the smallest number that fulfils the conditions above. If there is no such number, the output should be -1.
Examples
n = 3 -> m = 311
n = 4 -> m = 4113 (note that this is not 4111 as that would be repeating 11)
n = 9 -> m = 971131737
My somewhat working solution
Here's my first stab at this, the "brute force" approach. I am looking for a more elegant solution, as this one becomes very inefficient as n grows.
// IsPrime(int) is assumed to be defined elsewhere.
public long GetM(int n)
{
    // scan candidates whose first digit is n, smallest first
    long start = n * (long)Math.Pow((double)10, (double)n - 1);
    long end = n * (long)Math.Pow((double)10, (double)n);
    for (long x = start; x < end; x++)
    {
        long xCopy = x;
        bool allDigitsPrime = true;
        List<int> allPrimeNumbers = new List<int>();
        // check every adjacent digit pair, starting from the right
        while (xCopy >= 10)
        {
            long lastDigitsLong = xCopy % 100;
            int lastDigits = (int)lastDigitsLong;
            bool lastDigitsSame = allPrimeNumbers.Count != 0 && allPrimeNumbers.Contains(lastDigits);
            if (!IsPrime(lastDigits) || lastDigitsSame)
            {
                allDigitsPrime = false;
                break;
            }
            xCopy /= 10;
            allPrimeNumbers.Add(lastDigits);
        }
        if (n != 1 && allDigitsPrime)
        {
            return x;
        }
    }
    return -1;
}
Initial thoughts on how this could be made more efficient
So, clearly the bottleneck here is traversing the whole range of numbers that could fulfil the condition, from n.... to (n+1).... . Instead of simply incrementing the number on every iteration of the loop, there must be some clever way of skipping numbers based on the requirement that the 2-digit sequences must be prime. For instance, for n = 5 there is no point going through 50000 - 50999 (50 isn't prime) or 51200 - 51299 (12 isn't prime), but I wasn't quite sure how this could be implemented, or whether it would be enough of an optimization to make the algorithm run for n = 9.
Any ideas on this approach or a different optimization approach?
You don't have to try all numbers. You can instead use a different strategy, summed up as "try appending a digit".
Which digit? Well, a digit such that
it forms a prime together with your current last digit
the prime formed has not occurred in the number before
This should be done recursively (not iteratively), because you may run out of options and then you'd have to backtrack and try a different digit earlier in the number.
This is still an exponential time algorithm, but it avoids most of the search space because it never tries any numbers that don't fit the rule that every pair of adjacent digits must form a prime number.
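A sketch of that recursive strategy in Python (names mine): try digits in ascending order so the first fully extended number found is also the smallest, and backtrack out of dead ends.

TWO_DIGIT_PRIMES = {11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
                    53, 59, 61, 67, 71, 73, 79, 83, 89, 97}

def smallest_m(n):
    if n == 1:
        return 1  # a single digit has no 2-digit sequences to satisfy
    def extend(digits, used):
        if len(digits) == n:
            return int(''.join(map(str, digits)))
        for d in range(10):  # smallest digit first -> smallest m overall
            p = digits[-1] * 10 + d
            if p in TWO_DIGIT_PRIMES and p not in used:
                m = extend(digits + [d], used | {p})
                if m is not None:
                    return m
        return None  # dead end: backtrack
    m = extend([n], set())
    return -1 if m is None else m

print([smallest_m(n) for n in range(1, 10)])
# [1, 23, 311, 4113, 53113, 611317, 7113173, 83113717, 971131737]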
Here's a possible solution, in R, using recursion. It would be interesting to build a tree of all the possible paths.
# For every input number n (n < 10)
# there is an output number m such that:
#   m's first digit is n
#   m is an n digit number
#   every 2 digit sequence inside m must be a different prime number
# Need to select the smallest m that meets the criteria

library('numbers')

mNumHelper <- function(cn, n, pr, cm=NULL) {
  if (cn == 1) {
    if (n == 1) {
      return(1)
    }
    firstDigit <- n
  } else {
    firstDigit <- mod(cm, 10)
  }
  possibleNextNumbers <- pr[floor(pr/10) == firstDigit]
  nPossible = length(possibleNextNumbers)
  if (nPossible == 1) {
    nextPrime <- possibleNextNumbers
  } else {
    # nextPrime <- sample(possibleNextNumbers,1)
    nextPrime <- min(possibleNextNumbers)
  }
  pr <- pr[which(pr != nextPrime)]
  if (is.null(cm)) {
    cm <- nextPrime
  } else {
    cm = cm * 10 + mod(nextPrime, 10)
  }
  cn = cn + 1
  if (cn < n) {
    cm = mNumHelper(cn, n, pr, cm)
  }
  return(cm)
}

mNum <- function(n) {
  pr <- Primes(10, 100)
  m <- mNumHelper(1, n, pr)
}

for (i in seq(1, 9)) {
  print(paste('i', i, 'm', mNum(i)))
}
Sample output
[1] "i 1 m 1"
[1] "i 2 m 23"
[1] "i 3 m 311"
[1] "i 4 m 4113"
[1] "i 5 m 53113"
[1] "i 6 m 611317"
[1] "i 7 m 7113173"
[1] "i 8 m 83113717"
[1] "i 9 m 971131737"
Solution updated to select the smallest prime from the set of available primes, and remove bad path check since it's not required.
I just made a list of the two-digit prime numbers, then solved the problem by hand; it took only a few minutes. Not every problem requires a computer!
I have a list, and I need to find and extract all numbers in close proximity to each other into a new list.
for example I have a list:
1,5,10,8,11,14,15,11,14,1,4,7,5,9
So if I want to extract all numbers that are exactly 3 apart (only 3; the gap must be 3, so 11,14 is correct, 11,13 is not), how can I design this without hard-coding the whole thing?
the result should look like:
8,11,14,11,14,1,4,7
This doesn't look too hard, but I'm kind of stuck. All I can come up with is a loop that checks whether the n+1-th member of the list exceeds the n-th by 3 and then includes the n+1-th member in a new list, but I don't know how to include the n-th member without it appearing in the new list twice when there is a run of qualifying numbers.
Any ideas?
Just loop through the list, checking the next and previous element, and save the current one if it differs by 3 from either one. In Python, that's
>>> l = [1,5,10,8,11,14,15,11,14,1,4,7,5,9]
>>> # pad with infinities to ease the final loop
>>> l = [float('-inf')] + l + [float('inf')]
>>> [x for i, x in enumerate(l[1:-1], 1)
... if 3 in (abs(x - l[i-1]), abs(x - l[i+1]))]
[8, 11, 14, 11, 14, 1, 4, 7]
In Matlab
list = [1,5,10,8,11,14,15,11,14,1,4,7,5,9]
then
list(or([diff([0 diff(list)==3]) 0],[0 diff(list)==3]))
returns
8 11 14 11 14 1 4 7
For those who don't understand Matlab: diff(list) returns the first (forward) differences of the elements in list. The expression [0 diff(list)] pads the first differences with a leading 0 to make the result the same length as the original list. The rest should be obvious.
In a nutshell: take forward differences and backward differences, select the elements where either difference is 3.
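The same forward/backward-difference idea translates directly to Python with NumPy, for readers who don't use Matlab (a sketch):

import numpy as np

lst = np.array([1, 5, 10, 8, 11, 14, 15, 11, 14, 1, 4, 7, 5, 9])
d = np.diff(lst) == 3                     # forward difference equals 3
keep = np.r_[d, False] | np.r_[False, d]  # element starts or ends such a pair
print(lst[keep])                          # [ 8 11 14 11 14  1  4  7]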
Some simple C++ code below, assuming ar is the array of the initial integers, N is its length, and mark is a boolean array initialized to false:
for (int i = 1; i < N; i++) {
    if (ar[i] - ar[i-1] == 3) {
        mark[i] = 1;
        mark[i-1] = 1;
    }
}
Now to print the interesting numbers,
for (int i = 0; i < N; i++) {
    if (mark[i] == 1) cout << ar[i] << " ";
}
The idea behind the implementation is that we mark a number as interesting if the difference between it and the previous number is 3, or if the difference between it and the next number is 3.
that's a single loop:
public List<int> CloseByN(int n, List<int> oldL)
{
    bool first = true;
    int last = 0;
    bool isLstAdded = false;
    List<int> newL = new List<int>();
    foreach (int curr in oldL)
    {
        if (first)
        {
            first = false;
            last = curr;
            continue;
        }
        if (curr - last == n)
        {
            if (isLstAdded == false)
            {
                newL.Add(last);
                isLstAdded = true;
            }
            newL.Add(curr);
        }
        else
        {
            isLstAdded = false;
        }
        last = curr;
    }
    return newL;
}
Tested on your input and got your output.
And a Haskell version:
f g xs = dropWhile (null . drop 1) $ foldr comb [[last xs]] (init xs) where
  comb a bbs@(b:bs)
    | abs (a - head b) == g = (a : head bbs) : bs
    | otherwise = if null (drop 1 b) then [a] : bs else [a] : bbs
Output:
*Main> f 3 [5,10,8,11,14,15,11,14,1,4,7,5,9]
[[8,11,14],[11,14],[1,4,7]]
*Main> f 5 [5,10,8,11,14,15,11,14,1,4,7,5,9]
[[5,10]]
The Problem
I need an algorithm that does this:
Find all the unique ways to partition a given sum across 'buckets' not caring about order
I hope I was reasonably clear and coherent in expressing myself.
Example
For the sum 5 and 3 buckets, what the algorithm should return is:
[5, 0, 0]
[4, 1, 0]
[3, 2, 0]
[3, 1, 1]
[2, 2, 1]
Disclaimer
I'm sorry if this question is a dupe, but I don't know exactly what these sorts of problems are called. Still, I searched Google and SO using all the wordings I could think of, but only found results for distributing in the most even way, not for all unique ways.
It's a bit easier for me to code a few lines than to write a 5-page essay on the algorithm.
The simplest version to think of:
#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;

vector<int> ans;
int all_my_buckets;

void printAnswer() {
    for (size_t i = 0; i < ans.size(); i++) printf("%d ", ans[i]);
    for (size_t i = ans.size(); i < (size_t)all_my_buckets; i++) printf("0 ");
    printf("\n");
}

void solve(int amount, int buckets, int maxPart) {
    if (amount == 0) { printAnswer(); return; }
    if (amount > buckets * maxPart) return;  // we won't be able to fulfill this request anymore
    for (int i = min(maxPart, amount); i >= 1; i--) {  // never overshoot the amount
        ans.push_back(i);
        solve(amount - i, buckets - 1, i);
        ans.pop_back();
    }
}

int main() {
    all_my_buckets = 3;
    solve(5, all_my_buckets, 5);  // sum 5 across 3 buckets
}
It's also worth improving it to the point where you stack your choices, like solve(amount - k*i, buckets - k, i - 1), so you don't create too deep a recursion. (As far as I know, the stack depth would then be O(sqrt(n)).)
Why no dynamic programming?
We don't want the count of all those possibilities; even if we reached the same state again, we would still have to print every single number anyway, so the complexity stays the same.
I hope it helps you a bit; feel free to ask me any questions.
Here's something in Haskell that relies on this answer:
import Data.List (nub, sort)

parts 0 = []
parts n = nub $ map sort $ [n] : [x:xs | x <- [1..n `div` 2], xs <- parts (n - x)]

partitions n buckets =
  let p = filter (\x -> length x <= buckets) $ parts n
  in map (\x -> if length x == buckets then x else addZeros x) p
  where addZeros xs = xs ++ replicate (buckets - length xs) 0
OUTPUT:
*Main> partitions 5 3
[[5,0,0],[1,4,0],[1,1,3],[1,2,2],[2,3,0]]
If there are only three buckets, this would be the simplest code:

for (int i = 0; i <= 5; i++) {
    for (int j = 0; j <= 5-i && j <= i; j++) {
        if (5-i-j <= i && 5-i-j <= j)
            System.out.println("[" + i + "," + j + "," + (5-i-j) + "]");
    }
}
A completely different method, but if you don't care about efficiency or optimization, you could always use the old "bucket-free" partition algorithms and then filter the results by the number of parts in each answer.
For example, [1,1,1,1,1] would be ignored since it has more than 3 parts, but [2,2,1,0,0] would pass.
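A sketch of that filter-and-pad approach in Python, using a standard bucket-free partition generator (the helper names are mine):

def partitions(n, largest=None):
    # classic partitions of n into non-increasing positive parts
    if n == 0:
        yield []
        return
    if largest is None:
        largest = n
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def bucketed(n, buckets):
    for p in partitions(n):
        if len(p) <= buckets:                   # e.g. drop [1,1,1,1,1] for 3 buckets
            yield p + [0] * (buckets - len(p))  # pad with zeros

print(list(bucketed(5, 3)))
# [[5, 0, 0], [4, 1, 0], [3, 2, 0], [3, 1, 1], [2, 2, 1]]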
This is called an integer partition.
Fast Integer Partition Algorithms is a comprehensive paper describing all of the fastest algorithms for performing an integer partition.
Just adding my approach here along with the others'. It's written in Python, so it's practically like pseudocode.
My first approach worked, but it was horribly inefficient:
def intPart(buckets, balls):
    return uniqify(_intPart(buckets, balls))

def _intPart(buckets, balls):
    solutions = []
    # base case
    if buckets == 1:
        return [[balls]]
    # recursive strategy
    for i in range(balls + 1):
        for sol in _intPart(buckets - 1, balls - i):
            cur = [i]
            cur.extend(sol)
            solutions.append(cur)
    return solutions

def uniqify(seq):
    seen = set()
    sort = [list(reversed(sorted(elem))) for elem in seq]
    return [elem for elem in sort if str(elem) not in seen and not seen.add(str(elem))]
Here's my reworked solution. It completely avoids the need to 'uniquify' the results, by tracking the balls in the previous bucket using the max_ variable. This keeps the lists sorted and prevents any dupes:
def intPart(buckets, balls, max_=None):
    # init vars
    sols = []
    if max_ is None:
        max_ = balls
    min_ = max(0, balls - max_)
    # assert stuff
    assert buckets >= 1
    assert balls >= 0
    # base cases
    if buckets == 1:
        if balls <= max_:
            sols.append([balls])
    elif balls == 0:
        sol = [0] * buckets
        sols.append(sol)
    # recursive strategy
    else:
        for there in range(min_, balls + 1):
            here = balls - there
            ways = intPart(buckets - 1, there, here)
            for way in ways:
                sol = [here]
                sol.extend(way)
                sols.append(sol)
    return sols
Just for comprehensiveness, here's another answer stolen from MJD written in Perl:
#!/usr/bin/perl

sub part {
    my ($n, $b, $min) = @_;
    $min = 0 unless defined $min;

    # base case
    if ($b == 0) {
        if ($n == 0) { return ([]) }
        else         { return () }
    }

    my @partitions;
    for my $first ($min .. $n) {
        my @sub_partitions = part($n - $first, $b - 1, $first);
        for my $sp (@sub_partitions) {
            push @partitions, [$first, @$sp];
        }
    }
    return @partitions;
}
I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of Length n which add up to x
I would settle for a list of integers, but ideally I would like to be left with a set of floating-point numbers.
I would be very surprised if this problem weren't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Previously I've generated different combinations of a list of numbers that add up to x. I'm sure I could simply brute-force this problem, but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be of length N, while the numbers themselves can be of any size.
Edit 2: Sorry for my improper use of 'set'; I was using it as a catch-all term for a list or an array. I understand that it was causing confusion; my apologies.
This is how to do it in Python:
import random

def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you get will tend to be more "egalitarian" than it would be if picked at random among all distributions with the given sum.
To see the reason, picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers, and it is uniform inside the square [0,1]x[0,1]. The point Q is the point obtained by scaling P so that the sum is 1. Points close to the center of the square are hit with higher probability: for example, the exact center (0.5, 0.5) is found by projecting any point on the diagonal (0,0)-(1,1), while the point (0, 1) is found by projecting only points from the side (0,0)-(0,1)... and the diagonal's length is sqrt(2) = 1.4142..., while the side is only 1.0.
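A quick empirical check of this bias (my sketch, not from the original answer): scale pairs of uniforms to sum 1 and count how often the first coordinate lands near the middle. A uniform point on the segment would put about 20% of samples in (0.4, 0.6); the scaling method puts about a third there.

import random

def scaled_pair_first():
    a, b = random.random(), random.random()
    return a / (a + b)  # first coordinate after scaling the pair to sum 1

samples = [scaled_pair_first() for _ in range(100_000)]
near_middle = sum(0.4 < s < 0.6 for s in samples) / len(samples)
print(near_middle)  # about 0.33, versus 0.20 for a uniform split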
Actually, you need to generate a partition of x into n parts. This is usually done the following way: a partition of x into n non-negative parts can be represented by reserving n + x - 1 free places, putting n - 1 borders at some of them, and stones at the rest. The n stone groups add up to x, so the number of possible partitions is the binomial coefficient C(n + x - 1, n - 1).
So your algorithm could be as follows: choose an arbitrary (n-1)-subset of an (n + x - 1)-set; it uniquely determines a partition of x into n parts.
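A sketch of that border-placing construction in Python (the function name is mine); random.sample picks the border positions uniformly, so every outcome is equally likely:

import random

def random_composition(x, n):
    places = x + n - 1                               # stones plus borders
    borders = sorted(random.sample(range(places), n - 1))
    parts, prev = [], -1
    for b in borders + [places]:                     # sentinel past the end
        parts.append(b - prev - 1)                   # stones between borders
        prev = b
    return parts                                     # n non-negative parts summing to x

print(random_composition(5, 3))  # e.g. [2, 0, 3]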
In Knuth's TAOCP, chapter 3.4.2 discusses random sampling; see Algorithm S there.
Algorithm S (choose n records at random from a total of N):
1. Set t = 0, m = 0.
2. Generate u, random and uniformly distributed on (0, 1).
3. If (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample and increase both m and t by 1.
4. If m < n, return to step 2; otherwise the algorithm is finished.
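A direct Python rendering of those steps (a sketch):

import random

def algorithm_s(records, n):
    # Knuth's selection sampling: one pass, every n-subset equally likely
    N = len(records)
    sample, t, m = [], 0, 0
    while m < n:
        u = random.random()  # uniform on (0, 1)
        if (N - t) * u >= n - m:
            t += 1                     # skip record t
        else:
            sample.append(records[t])  # include record t in the sample
            m += 1
            t += 1
    return sample

print(algorithm_s(list(range(10)), 4))  # e.g. [0, 3, 5, 9]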
The solution for non-integers is algorithmically trivial: just select n arbitrary numbers that don't sum to 0, normalize them by their sum, and scale by the desired total x.
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
import random

# N is the number of values and x the desired total, as in the question above
xs = [random.gammavariate(1, 1) for a in range(N)]
xs = [x * v / sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in JavaScript:
function getRandomArbitrary(min, max) {
    return Math.random() * (max - min) + min;
}

function getRandomArray(min, max, n) {
    var arr = [];
    for (var i = 0, l = n; i < l; i++) {
        arr.push(getRandomArbitrary(min, max));
    }
    return arr;
}

function randomValuesPrescribedSum(min, max, n, total) {
    var arr = getRandomArray(min, max, n);
    var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
    var k = total / sum;
    var delays = arr.map(function(x) { return k * x; });
    return delays;
}
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random

def parts(total_sum, num_parts):
    points = [random.random() for i in range(num_parts - 1)]
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i-1]) * total_sum)
    return ret

def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)

test(5.5, 3)
test(10, 1)
test(10, 5)
In Python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time

TOTAL = 15
PARTS = 4
PLACES = 3

def random_sum_split(parts, total, places):
    a = [0, total] + [random.random() * total for i in range(parts - 1)]
    a.sort()
    b = [(a[i] - a[i-1]) for i in range(1, (parts + 1))]
    if places is None:
        return b
    else:
        b.pop()
        c = [round(x, places) for x in b]
        c.append(round(total - sum(c), places))
        return c

def tick():
    if info.tick == 1:  # info and log come from the author's environment
        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end - start))
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
To the best of my knowledge, this distribution is uniform.