Best way to distribute a given resource (eg. budget) for optimal output - algorithm

I am trying to find a solution in which a given resource (eg. budget) will be best distributed to different options which yields different results on the resource provided.
Let's say I have N = 1200 and some functions. (a, b, c, d are some unknown variables)
f1(x) = a * x
f2(x) = b * x^c
f3(x) = a*x + b*x^2 + c*x^3
f4(x) = d^x
f5(x) = log x^d
...
And also, let's say there n number of these functions that yield different results based on its input x, where x = 0 or x >= m, where m is a constant.
Although I am not able to find exact formula for the given functions, I am able to find the output. This means that I can do:
X = f1(N1) + f2(N2) + f3(N3) + ... + fn(Nn) where (N1 + ... Nn) = N as many times as there are ways of distributing N into n numbers, and find a specific case where X is the greatest.
How would I actually go about finding the best distribution of N with the least computation power, using whatever libraries currently available?

If you are happy with allocations constrained to be whole numbers then there is a dynamic programming solution of cost O(Nn) - so you can increase accuracy by scaling if you want, but this will increase cpu time.
For each i=1 to n maintain an array where element j gives the maximum yield using only the first i functions giving them a total allowance of j.
For i=1 this is simply the result of f1().
For i=k+1 consider when working out the result for j consider each possible way of splitting j units between f_{k+1}() and the table that tells you the best return from a distribution among the first k functions - so you can calculate the table for i=k+1 using the table created for k.
At the end you get the best possible return for n functions and N resources. It makes it easier to find out what that best answer is if you maintain of a set of arrays telling the best way to distribute k units among the first i functions, for all possible values of i and k. Then you can look up the best allocation for f100(), subtract off the value this allocated to f100() from N, look up the best allocation for f99() given the resulting resources, and carry on like this until you have worked out the best allocations for all f().
As an example suppose f1(x) = 2x, f2(x) = x^2 and f3(x) = 3 if x>0 and 0 otherwise. Suppose we have 3 units of resource.
The first table is just f1(x) which is 0, 2, 4, 6 for 0,1,2,3 units.
The second table is the best you can do using f1(x) and f2(x) for 0,1,2,3 units and is 0, 2, 4, 9, switching from f1 to f2 at x=2.
The third table is 0, 3, 5, 9. I can get 3 and 5 by using 1 unit for f3() and the rest for the best solution in the second table. 9 is simply the best solution in the second table - there is no better solution using 3 resources that gives any of them to f(3)
So 9 is the best answer here. One way to work out how to get there is to keep the tables around and recalculate that answer. 9 comes from f3(0) + 9 from the second table so all 3 units are available to f2() + f1(). The second table 9 comes from f2(3) so there are no units left for f(1) and we get f1(0) + f2(3) + f3(0).
When you are working the resources to use at stage i=k+1 you have a table form i=k that tells you exactly the result to expect from the resources you have left over after you have decided to use some at stage i=k+1. The best distribution does not become incorrect because that stage i=k you have worked out the result for the best distribution given every possible number of remaining resources.

Related

Best way to generate U(1,5) from U(1,3)?

I am given a uniform integer random number generator ~ U3(1,3) (inclusive). I would like to generate integers ~ U5(1,5) (inclusive) using U3. What is the best way to do this?
This simplest approach I can think of is to sample twice from U3 and then use rejection sampling. I.e., sampling twice from U3 gives us 9 possible combinations. We can assign the first 5 combinations to 1,2,3,4,5, and reject the last 4 combinations.
This approach expects to sample from U3 9/5 * 2 = 18/5 = 3.6 times.
Another approach could be to sample three times from U3. This gives us a sample space of 27 possible combinations. We can make use of 25 of these combinations and reject the last 2. This approach expects to use U3 27/25 * 3.24 times. But this approach would be a little more tedious to write out since we have a lot more combinations than the first, but the expected number of sampling from U3 is better than the first.
Are there other, perhaps better, approaches to doing this?
I have this marked as language agnostic, but I'm primarily looking into doing this in either Python or C++.
You do not need combinations. A slight tweak using base 3 arithmetic removes the need for a table. Rather than using the 1..3 result directly, subtract 1 to get it into the range 0..2 and treat it as a base 3 digit. For three samples you could do something like:
function sample3()
result <- 0
result <- result + 9 * (randU3() - 1) // High digit: 9
result <- result + 3 * (randU3() - 1) // Middle digit: 3
result <- result + 1 * (randU3() - 1) // Units digit: 1
return result
end function
That will give you a number in the range 0..26, or 1..27 if you add one. You can use that number directly in the rest of your program.
For the range [1, 3] to [1, 5], this is equivalent to rolling a 5-sided die with a 3-sided one.
However, this can't be done without "wasting" randomness (or running forever in the worst case), since all the prime factors of 5 (namely 5) don't divide 3. Thus, the best that can be done is to use rejection sampling to get arbitrarily close to no "waste" of randomness (such as by batching multiple rolls of the 3-sided die until 3^n is "close enough" to a power of 5). In other words, the approaches you give in your question are as good as they can get.
More generally, an algorithm to roll a k-sided die with a p-sided die will inevitably "waste" randomness (and run forever in the worst case) unless "every prime number dividing k also divides p", according to Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner. For example:
Take the much more practical case that p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k is arbitrary. In this case, this "waste" and indefinite running time are inevitable unless k is also a power of 2.
This result applies to any case of rolling a n-sided die with a m-sided die, where n and m are prime numbers. For example, look at the answers to a question for the case n = 7 and m = 5.
See also this question: Frugal conversion of uniformly distributed random numbers from one range to another.
Peter O. is right, you cannot escape to loose some randomness. So the only choice is between how expensive calls to U(1,3) are, code clarity, simplicity etc.
Here is my variant, making bits from U(1,3) and combining them together with rejection
C/C++ (untested!)
int U13(); // your U(1,3)
int getBit() { // single random bit
return (U13()-1)&1;
}
int U15() {
int r;
for(;;) {
int q = getBit() + 2*getBit() + 4*getBit(); // uniform in [0...8)
if (q < 5) { // need range [0...5)
r = q + 1; // q accepted, make it in [1...5]
break;
}
}
return r;
}

Number of partitions with a given constraint

Consider a set of 13 Danish, 11 Japanese and 8 Polish people. It is well known that the number of different ways of dividing this set of people to groups is the 13+11+8=32:th Bell number (the number of set partitions). However we are asked to find the number of possible set partitions under a given constraint. The question is as follows:
A set partition is said to be good if it has no group consisting of at least two people that only includes a single nationality. How many good partitions there are for this set? (A group may include only one person.)
The brute force approach requires going though about 10^26 partitions and checking which ones are good. This seems pretty unfeasible, especially if the groups are larger or one introduces other nationalities. Is there a smart way instead?
EDIT: As a side note. There probably is no hope for a really nice solution. A highly esteemed expert in combinatorics answered a related question, which, I think, basically says that the related problem, and thus this problem also, is very difficult to solve exactly.
Here's a solution using dynamic programming.
It starts from an empty set, then adds one element at a time and calculates all the valid partitions.
The state space is huge, but notice that to be able to calculate the next step we only need to know about a partition the following things:
For each nationality, how many sets it contains that consists of only a single member of that nationality. (e.g.: {a})
How many sets it contains with mixed elements. (e.g.: {a, b, c})
For each of these configurations I only store the total count. Example:
[0, 1, 2, 2] -> 3
{a}{b}{c}{mixed}
e.g.: 3 partitions that look like: {b}, {c}, {c}, {a,c}, {b,c}
Here's the code in python:
import collections
from operator import mul
from fractions import Fraction
def nCk(n,k):
return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )
def good_partitions(l):
n = len(l)
i = 0
prev = collections.defaultdict(int)
while l:
#any more from this kind?
if l[0] == 0:
l.pop(0)
i += 1
continue
l[0] -= 1
curr = collections.defaultdict(int)
for solution,total in prev.iteritems():
for idx,item in enumerate(solution):
my_solution = list(solution)
if idx == i:
# add element as a new set
my_solution[i] += 1
curr[tuple(my_solution)] += total
elif my_solution[idx]:
if idx != n:
# add to a set consisting of one element
# or merge into multiple sets that consist of one element
cnt = my_solution[idx]
c = cnt
while c > 0:
my_solution = list(solution)
my_solution[n] += 1
my_solution[idx] -= c
curr[tuple(my_solution)] += total * nCk(cnt, c)
c -= 1
else:
# add to a mixed set
cnt = my_solution[idx]
curr[tuple(my_solution)] += total * cnt
if not prev:
# one set with one element
lone = [0] * (n+1)
lone[i] = 1
curr[tuple(lone)] = 1
prev = curr
return sum(prev.values())
print good_partitions([1, 1, 1, 1]) # 15
print good_partitions([1, 1, 1, 1, 1]) # 52
print good_partitions([2, 1]) # 4
print good_partitions([13, 11, 8]) # 29811734589499214658370837
It produces correct values for the test cases. I also tested it against a brute-force solution (for small values), and it produces the same results.
An exact analytic solution is hard, but a polynomial time+space dynamic programming solution is straightforward.
First of all, we need an absolute order on the size of groups. We do that by comparing how many Danes, Japanese, and Poles we have.
Next, the function to write is this one.
m is the maximum group size we can emit
p is the number of people of each nationality that we have left to split
max_good_partitions_of_maximum_size(m, p) is the number of "good partitions"
we can form from p people, with no group being larger than m
Clearly you can write this as a somewhat complicated recursive function that always select the next partition to use, then call itself with that as the new maximum size, and subtract the partition from p. If you had this function, then your answer is simply max_good_partitions_of_maximum_size(p, p) with p = [13, 11, 8]. But that is going to be a brute force search that won't run in reasonable time.
Finally apply https://en.wikipedia.org/wiki/Memoization by caching every call to this function, and it will run in polynomial time. However you will also have to cache a polynomial number of calls to it.

Compare two arrays of points [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
Improve this question
I'm trying to find a way to find similarities in two arrays of different points. I drew circles around points that have similar patterns and I would like to do some kind of auto comparison in intervals of let's say 100 points and tell what coefficient of similarity is for that interval. As you can see it might not be perfectly aligned also so point-to-point comparison would not be a good solution also (I suppose). Patterns that are slightly misaligned could also mean that they are matching the pattern (but obviously with a smaller coefficient)
What similarity could mean (1 coefficient is a perfect match, 0 or less - is not a match at all):
Points 640 to 660 - Very similar (coefficient is ~0.8)
Points 670 to 690 - Quite similar (coefficient is ~0.5-~0.6)
Points 720 to 780 - Let's say quite similar (coefficient is ~0.5-~0.6)
Points 790 to 810 - Perfectly similar (coefficient is 1)
Coefficient is just my thoughts of how a final calculated result of comparing function could look like with given data.
I read many posts on SO but it didn't seem to solve my problem. I would appreciate your help a lot. Thank you
P.S. Perfect answer would be the one that provides pseudo code for function which could accept two data arrays as arguments (intervals of data) and return coefficient of similarity.
Click here to see original size of image
I also think High Performance Mark has basically given you the answer (cross-correlation). In my opinion, most of the other answers are only giving you half of what you need (i.e., dot product plus compare against some threshold). However, this won't consider a signal to be similar to a shifted version of itself. You'll want to compute this dot product N + M - 1 times, where N, M are the sizes of the arrays. For each iteration, compute the dot product between array 1 and a shifted version of array 2. The amount you shift array 2 increases by one each iteration. You can think of array 2 as a window you are passing over array 1. You'll want to start the loop with the last element of array 2 only overlapping the first element in array 1.
This loop will generate numbers for different amounts of shift, and what you do with that number is up to you. Maybe you compare it (or the absolute value of it) against a threshold that you define to consider two signals "similar".
Lastly, in many contexts, a signal is considered similar to a scaled (in the amplitude sense, not time-scaling) version of itself, so there must be a normalization step prior to computing the cross-correlation. This is usually done by scaling the elements of the array so that the dot product with itself equals 1. Just be careful to ensure this makes sense for your application numerically, i.e., integers don't scale very well to values between 0 and 1 :-)
i think HighPerformanceMarks's suggestion is the standard way of doing the job.
a computationally lightweight alternative measure might be a dot product.
split both arrays into the same predefined index intervals.
consider the array elements in each intervals as vector coordinates in high-dimensional space.
compute the dot product of both vectors.
the dot product will not be negative. if the two vectors are perpendicular in their vector space, the dot product will be 0 (in fact that's how 'perpendicular' is usually defined in higher dimensions), and it will attain its maximum for identical vectors.
if you accept the geometric notion of perpendicularity as a (dis)similarity measure, here you go.
caveat:
this is an ad hoc heuristic chosen for computational efficiency. i cannot tell you about mathematical/statistical properties of the process and separation properties - if you need rigorous analysis, however, you'll probably fare better with correlation theory anyway and should perhaps forward your question to math.stackexchange.com.
My Attempt:
Total_sum=0
1. For each index i in the range (m,n)
2. sum=0
3. k=Array1[i]*Array2[i]; t1=magnitude(Array1[i]); t2=magnitude(Array2[i]);
4. k=k/(t1*t2)
5. sum=sum+k
6. Total_sum=Total_sum+sum
Coefficient=Total_sum/(m-n)
If all values are equal, then sum would return 1 in each case and total_sum would return (m-n)*(1). Hence, when the same is divided by (m-n) we get the value as 1. If the graphs are exact opposites, we get -1 and for other variations a value between -1 and 1 is returned.
This is not so efficient when the y range or the x range is huge. But, I just wanted to give you an idea.
Another option would be to perform an extensive xnor.
1. For each index i in the range (m,n)
2. sum=1
3. k=Array1[i] xnor Array2[i];
4. k=k/((pow(2,number_of_bits))-1) //This will scale k down to a value between 0 and 1
5. sum=(sum+k)/2
Coefficient=sum
Is this helpful ?
You can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1] e.g. as
sum = 0
for i in 0 to 99:
d = (A[i] - B[i])^2 // this is in range 0 .. 4
sum = (sum / 4) / N // now in range 0 .. 1
This now returns distance 1 for vectors that are completely opposite (one is all 1, another all -1), and 0 for identical vectors.
You can translate this into your coefficient by
coeff = 1 - sum
However, this is a crude approach because it does not take into account the fact that there could be horizontal distortion or shift between the signals you want to compare, so let's look at some approaches for coping with that.
You can sort both your arrays (e.g. in ascending order) and then calculate the distance / coefficient. This returns more similarity than the original metric, and is agnostic towards permutations / shifts of the signal.
You can also calculate the differentials and calculate distance / coefficient for those, and then you can do that sorted also. Using differentials has the benefit that it eliminates vertical shifts. Sorted differentials eliminate horizontal shift but still recognize different shapes better than sorted original data points.
You can then e.g. average the different coefficients. Here more complete code. The routine below calculates coefficient for arrays A and B of given size, and takes d many differentials (recursively) first. If sorted is true, the final (differentiated) array is sorted.
procedure calc(A, B, size, d, sorted):
if (d > 0):
A' = new array[size - 1]
B' = new array[size - 1]
for i in 0 to size - 2:
A'[i] = (A[i + 1] - A[i]) / 2 // keep in range -1..1 by dividing by 2
B'[i] = (B[i + 1] - B[i]) / 2
return calc(A', B', size - 1, d - 1, sorted)
else:
if (sorted):
A = sort(A)
B = sort(B)
sum = 0
for i in 0 to size - 1:
sum = sum + (A[i] - B[i]) * (A[i] - B[i])
sum = (sum / 4) / size
return 1 - sum // return the coefficient
procedure similarity(A, B, size):
sum a = 0
a = a + calc(A, B, size, 0, false)
a = a + calc(A, B, size, 0, true)
a = a + calc(A, B, size, 1, false)
a = a + calc(A, B, size, 1, true)
return a / 4 // take average
For something completely different, you could also run Fourier transform using FFT and then take a distance metric on the returning spectra.

How to implement Random(a,b) with only Random(0,1)? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
how to get uniformed random between a, b by a known uniformed random function RANDOM(0,1)
In the book of Introduction to algorithms, there is an excise:
Describe an implementation of the procedure Random(a, b) that only makes calls to Random(0,1). What is the expected running time of your procedure, as a function of a and b? The probability of the result of Random(a,b) should be pure uniformly distributed, as Random(0,1)
For the Random function, the results are integers between a and b, inclusively. For e.g., Random(0,1) generates either 0 or 1; Random(a, b) generates a, a+1, a+2, ..., b
My solution is like this:
for i = 1 to b-a
r = a + Random(0,1)
return r
the running time is T=b-a
Is this correct? Are the results of my solutions uniformly distributed?
Thanks
What if my new solution is like this:
r = a
for i = 1 to b - a //including b-a
r += Random(0,1)
return r
If it is not correct, why r += Random(0,1) makes r not uniformly distributed?
Others have explained why your solution doesn't work. Here's the correct solution:
1) Find the smallest number, p, such that 2^p > b-a.
2) Perform the following algorithm:
r=0
for i = 1 to p
r = 2*r + Random(0,1)
3) If r is greater than b-a, go to step 2.
4) Your result is r+a
So let's try Random(1,3).
So b-a is 2.
2^1 = 2, so p will have to be 2 so that 2^p is greater than 2.
So we'll loop two times. Let's try all possible outputs:
00 -> r=0, 0 is not > 2, so we output 0+1 or 1.
01 -> r=1, 1 is not > 2, so we output 1+1 or 2.
10 -> r=2, 2 is not > 2, so we output 2+1 or 3.
11 -> r=3, 3 is > 2, so we repeat.
So 1/4 of the time, we output 1. 1/4 of the time we output 2. 1/4 of the time we output 3. And 1/4 of the time we have to repeat the algorithm a second time. Looks good.
Note that if you have to do this a lot, two optimizations are handy:
1) If you use the same range a lot, have a class that computes p once so you don't have to compute it each time.
2) Many CPUs have fast ways to perform step 1 that aren't exposed in high-level languages. For example, x86 CPUs have the BSR instruction.
No, it's not correct, that method will concentrate around (a+b)/2. It's a binomial distribution.
Are you sure that Random(0,1) produces integers? it would make more sense if it produced floating point values between 0 and 1. Then the solution would be an affine transformation, running time independent of a and b.
An idea I just had, in case it's about integer values: use bisection. At each step, you have a range low-high. If Random(0,1) returns 0, the next range is low-(low+high)/2, else (low+high)/2-high.
Details and complexity left to you, since it's homework.
That should create (approximately) a uniform distribution.
Edit: approximately is the important word there. Uniform if b-a+1 is a power of 2, not too far off if it's close, but not good enough generally. Ah, well it was a spontaneous idea, can't get them all right.
No, your solution isn't correct. This sum'll have binomial distribution.
However, you can generate a pure random sequence of 0, 1 and treat it as a binary number.
repeat
result = a
steps = ceiling(log(b - a))
for i = 0 to steps
result += (2 ^ i) * Random(0, 1)
until result <= b
KennyTM: my bad.
I read the other answers. For fun, here is another way to find the random number:
Allocate an array with b-a elements.
Set all the values to 1.
Iterate through the array. For each nonzero element, flip the coin, as it were. If it is came up 0, set the element to 0.
Whenever, after a complete iteration, you only have 1 element remaining, you have your random number: a+i where i is the index of the nonzero element (assuming we start indexing on 0). All numbers are then equally likely. (You would have to deal with the case where it's a tie, but I leave that as an exercise for you.)
This would have O(infinity) ... :)
On average, though, half the numbers would be eliminated, so it would have an average case running time of log_2 (b-a).
First of all I assume you are actually accumulating the result, not adding 0 or 1 to a on each step.
Using some probabilites you can prove that your solution is not uniformly distibuted. The chance that the resulting value r is (a+b)/2 is greatest. For instance if a is 0 and b is 7, the chance that you get a value 4 is (combination 4 of 7) divided by 2 raised to the power 7. The reason for that is that no matter which 4 out of the 7 values are 1 the result will still be 4.
The running time you estimate is correct.
Your solution's pseudocode should look like:
r=a
for i = 0 to b-a
r+=Random(0,1)
return r
As for uniform distribution, assuming that the random implementation this random number generator is based on is perfectly uniform the odds of getting 0 or 1 are 50%. Therefore getting the number you want is the result of that choice made over and over again.
So for a=1, b=5, there are 5 choices made.
The odds of getting 1 involves 5 decisions, all 0, the odds of that are 0.5^5 = 3.125%
The odds of getting 5 involves 5 decisions, all 1, the odds of that are 0.5^5 = 3.125%
As you can see from this, the distribution is not uniform -- the odds of any number should be 20%.
In the algorithm you created, it is really not equally distributed.
The result "r" will always be either "a" or "a+1". It will never go beyond that.
It should look something like this:
r=0;
for i=0 to b-a
r = a + r + Random(0,1)
return r;
By including "r" into your computation, you are including the "randomness" of all the previous "for" loop runs.

Number of ways to add up to a sum S with N numbers

Say S = 5 and N = 3 the solutions would look like - <0,0,5> <0,1,4> <0,2,3> <0,3,2> <5,0,0> <2,3,0> <3,2,0> <1,2,2> etc etc.
In the general case, N nested loops can be used to solve the problem. Run N nested loop, inside them check if the loop variables add upto S.
If we do not know N ahead of time, we can use a recursive solution. In each level, run a loop starting from 0 to N, and then call the function itself again. When we reach a depth of N, see if the numbers obtained add up to S.
Any other dynamic programming solution?
Try this recursive function:
f(s, n) = 1 if s = 0
= 0 if s != 0 and n = 0
= sum f(s - i, n - 1) over i in [0, s] otherwise
To use dynamic programming you can cache the value of f after evaluating it, and check if the value already exists in the cache before evaluating it.
There is a closed form formula : binomial(s + n - 1, s) or binomial(s+n-1,n-1)
Those numbers are the simplex numbers.
If you want to compute them, use the log gamma function or arbitrary precision arithmetic.
See https://math.stackexchange.com/questions/2455/geometric-proof-of-the-formula-for-simplex-numbers
I have my own formula for this. We, together with my friend Gio made an investigative report concerning this. The formula that we got is [2 raised to (n-1) - 1], where n is the number we are looking for how many addends it has.
Let's try.
If n is 1: its addends are o. There's no two or more numbers that we can add to get a sum of 1 (excluding 0). Let's try a higher number.
Let's try 4. 4 has addends: 1+1+1+1, 1+2+1, 1+1+2, 2+1+1, 1+3, 2+2, 3+1. Its total is 7.
Let's check with the formula. 2 raised to (4-1) - 1 = 2 raised to (3) - 1 = 8-1 =7.
Let's try 15. 2 raised to (15-1) - 1 = 2 raised to (14) - 1 = 16384 - 1 = 16383. Therefore, there are 16383 ways to add numbers that will equal to 15.
(Note: Addends are positive numbers only.)
(You can try other numbers, to check whether our formula is correct or not.)
This can be calculated in O(s+n) (or O(1) if you don't mind an approximation) in the following way:
Imagine we have a string with n-1 X's in it and s o's. So for your example of s=5, n=3, one example string would be
oXooXoo
Notice that the X's divide the o's into three distinct groupings: one of length 1, length 2, and length 2. This corresponds to your solution of <1,2,2>. Every possible string gives us a different solution, by counting the number of o's in a row (a 0 is possible: for example, XoooooX would correspond to <0,5,0>). So by counting the number of possible strings of this form, we get the answer to your question.
There are s+(n-1) positions to choose for s o's, so the answer is Choose(s+n-1, s).
There is a fixed formula to find the answer. If you want to find the number of ways to get N as the sum of R elements. The answer is always:
(N+R-1)!/((R-1)!*(N)!)
or in other words:
(N+R-1) C (R-1)
This actually looks a lot like a Towers of Hanoi problem, without the constraint of stacking disks only on larger disks. You have S disks that can be in any combination on N towers. So that's what got me thinking about it.
What I suspect is that there is a formula we can deduce that doesn't require the recursive programming. I'll need a bit more time though.

Resources