How to map a point from a non-linear range to a linear one and back? - algorithm

I have a list of linear ranges which represent one big range:
                                           X'
100     200  300     400  500     600  700 |          900   (X)
 |-------|    |-------|    |-------|    |--+-----------|
 0                                         |           100   (Y)
                                           Y'
X consists of the following ranges (the even, round numbers are just examples for ease of comprehension; they could be anything, and no proportions are implied at all):
From 100 to 200
From 300 to 400
From 500 to 600
From 700 to 900
On the flip side, Y has just one range:
From 0 to 100
Both X and Y cover the same span, just in different units. Let's say one is dollars and the other is percent (or any other similarly unrelated units). So Y'0 == X'100 and Y'100 == X'900.
Given any point in Y, what is the equivalent point in X, and vice versa: given a point in X, what is it in Y?
Is this a typical math problem? Does it have a name?

How many ranges do you have? Is it acceptable that the algorithm is O(number of ranges)?
If so, below is a description of the algorithm. Let me explain it using your (original) example.
100     200  300     400  500     600  700     800
 |-------|    |-------|    |-------|    |-------|
 0%                                          100%
1) What you're going to do is map the value X in range A (100-800) to the value Y in the continuous range B (0-399), as the total number of elements in your ranges is 400. It is then easy to convert a position in B to percent; I will omit this part.
2) Create a list of records, where each record represents one range mapping.
struct RangeRecord {
    int start_in_a;
    int start_in_b;
};
In your case, you will get the following list:
{100, 0}, {300, 100}, {500, 200}, {700, 300}
3) When you need to map a number X from A to B, you iterate the list to find the last record with start_in_a <= X (i.e., the record whose segment contains X). Then your value Y is
Y = X + start_in_b - start_in_a;
4) The algorithm is symmetric: you just iterate the list to find the last record with start_in_b <= Y, and then
X = Y + start_in_a - start_in_b.
Note 1. For error checking purposes, you might keep the range size in RangeRecord, as well.
Note 2. If O(number of ranges) is not good enough, keep the records in a search tree instead of a list. You will need O(log(number of ranges)) operations then.
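As a sketch, here is the same idea in Python, with the records kept as two parallel sorted arrays and bisect providing the O(log n) lookup from Note 2 (the names are mine):

import bisect

# start_in_a / start_in_b pairs for the 100-200, 300-400, 500-600, 700-800 example.
starts_in_a = [100, 300, 500, 700]
starts_in_b = [0, 100, 200, 300]

def a_to_b(x):
    # Last record with start_in_a <= x.
    i = bisect.bisect_right(starts_in_a, x) - 1
    return x + starts_in_b[i] - starts_in_a[i]

def b_to_a(y):
    # Symmetric: last record with start_in_b <= y.
    i = bisect.bisect_right(starts_in_b, y) - 1
    return y + starts_in_a[i] - starts_in_b[i]

assert a_to_b(350) == 150 and b_to_a(150) == 350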

Say you have one range (a, b) and another one (c, d), and a number i for which a < i < b. You can "normalize" i by subtracting a and dividing by b - a; this gives you a value between 0 and 1. You can then transfer it into the other range by reversing this calculation with the other bounds, that is, multiply it by (d - c) and add c.
Say the corresponding point in the other range is i'. Then,
i' = (i - a) / (b - a) * (d - c) + c
The term you are searching for is scaling and translation.
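As a one-line sketch in Python (the function name is mine):

def rescale(i, a, b, c, d):
    # Normalize i from [a, b] into [0, 1], then scale and translate into [c, d].
    return (i - a) / (b - a) * (d - c) + c

print(rescale(50, 0, 100, 100, 900))  # 500.0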

This is not really solvable because the problem is underspecified. Even for the same ranges, there can be different sliders like this:
1    100 101          1000
|-----|----------------|

1            100 101  1000
|--------------|-------|
For each range like [1..100] you need to know which percentage span on the slider corresponds to it. In the above examples this could be something like [0%..33%] or [0%..66%]. Once you have this information, it's easy to determine in which of the ranges, and at which position of that range, a given data point is and to what value it corresponds.

You have three things you need to adjust for in converting from some X' to Y' and vice versa:
The ranges start at different places.
One of them is discontinuous.
The size of each step is different between the two ranges.
It might be helpful (at least while developing your solution) to consider a similar range Z, which runs from 0 to 503 and has a one-to-one mapping with the 504 possible values in X. That is, for each discontinuity, if the X value is greater than the upper end of the discontinuity, subtract 99 (the size of the discontinuity). Then X'100 = Z'0, X'200 = Z'100, X'300 = Z'101, X'400 = Z'201, X'500 = Z'202, etc. The introduction of the Z range resolves problems 1 and 2 in the list above.
To convert from Z to Y, you just multiply by 100/503, which scales the Z span onto Y.
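A sketch of the Z-range idea in Python, with the gap positions hard-coded from the example:

GAP_STARTS = [300, 500, 700]  # X values that begin a segment right after a gap

def x_to_z(x):
    z = x - 100
    for start in GAP_STARTS:
        if x >= start:
            z -= 99           # collapse the 99 missing values of each gap
    return z

def z_to_y(z):
    return z * 100 / 503      # scale the 0..503 span onto 0..100

assert x_to_z(300) == 101 and x_to_z(500) == 202
assert z_to_y(503) == 100.0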

Assuming the piecewise linear arrangement you imply, you can find X by:
X = 4*Y + 100*int(1 + Y/25.)
and the reverse for Y:
X2 = int(X/100.)
X3 = X2-int(X2/2.)
Y = (X-100*X3)/4.
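As a quick sanity check, the two formulas drop straight into Python (a sketch; int() truncates, as the formulas assume):

def y_to_x(y):
    # Forward mapping for the original 100-800 arrangement.
    return 4 * y + 100 * int(1 + y / 25.)

def x_to_y(x):
    # Reverse mapping; only valid for x values inside the segments.
    x2 = int(x / 100.)
    x3 = x2 - int(x2 / 2.)
    return (x - 100 * x3) / 4.

assert y_to_x(37.5) == 350
assert x_to_y(350) == 37.5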
edit: This solution works for the original range you gave:
100     200  300     400  500     600  700     800
 |-------|    |-------|    |-------|    |-------|
 0%                                          100%
And of course, the reverse formula only holds for valid values of X.
Here's a figure of the two curves. The green is your original specification and the blue is the reverse curve (again, only valid for the valid x-values).

Related

How to fix skew trapezoidal distribution sampling output sample size

I am trying to generate a skewed trapezoidal distribution using inverse transform sampling.
The inputs are the values where the ramps start and end (a, b, c, d) and the sample size.
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;
h=2/(d+c-a-b);
Then I calculate the ratio of the length of ramps and flat components to get sample size for each:
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
And then finally I get the histogram from the following code:
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,n1);
y2=linspace(quartile1,quartile2,n2);
y3=linspace(quartile2,1,n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
However, the sampling of the ramps and the flat component is not equal (histogram image omitted).
I fixed this by trial and error, by reducing the sample size of the ramps by half:
n1=0.5*firstramp*SampleSize; %sample size for first ramp
n3=0.5*secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
This made the distribution look correct (image omitted).
However, this makes the output sample count smaller than the requested input sample size.
I've also tried different combinations of changing the sample sizes of ramps and flat.
This also works:
n1=0.75*firstramp*SampleSize; %sample size for first ramp
n3=0.75*secondramp*SampleSize; %sample size for second ramp
n2=1.5*flat*SampleSize;
It increases the output samples, but it's still not close.
Any help will be appreciated.
Full code:
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;%*1.33333333333333;
h=2/(d+c-a-b);
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,.75*n1);
y2=linspace(quartile1,quartile2,1.5*n2);
y3=linspace(quartile2,1,.75*n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
%end
I don't know Matlab, so I was hoping somebody else would jump in on this, but since nobody did, here goes.
If I'm reading your code correctly, what you did is not an inversion. Inversion is 1-1, i.e., one uniform input produces one outcome. You seem to be using a technique known as the "composition method". In composition, the overall distribution is composed of component pieces, each of which is straightforward to generate. You choose which component to generate from based on their proportions/probabilities relative to the whole.
For density functions, probability is the area under the density curve, so your first mistake was sampling the components relative to their widths rather than their areas. The correct sampling proportions are 2/13, 4/13, and 7/13 for what you designated the firstramp, flat, and secondramp components, respectively.
A second (relatively minor) mistake was to assign exact sample sizes to each of the components. Having probability 2/13 does not mean that exactly 2*SampleSize/13 of your samples will come from the firstramp; it means that's the expected sample size for that component. The expected value of a random variate is not necessarily (or even likely to be) the outcome you will actually get.
In pseudocode, the composition approach would be
generate U ~ Uniform(0,1)
if U <= 2/13:
    generate and return a value from firstramp
else if U <= 6/13:
    generate and return a value from flat
else:
    generate and return a value from secondramp
Note that since each of the generate options will use one or more uniforms, and choosing between the options requires a uniform U, this is not an inversion.
If you want an actual inversion, you need to quantify your density, integrate it to get the cumulative distribution function, then apply the inversion technique by setting F(X) = U and solving for X. Since your distribution is made of distinct components, both the density and cumulative density will be piecewise functions.
After deriving the height based on the requirement that the areas of the two triangles and the flat section must add up to 1, I came up with the following for your density:
       | (x + 3) / 13         -3 <= x <= -1
       |
f(x) = | 2 / 13               -1 <= x <=  1
       |
       | 2 * (8 - x) / 91      1 <= x <=  8
Integrating this and collecting terms produces the CDF:
       | (x + 3)**2 / 26                  -3 <= x <= -1
       |
F(x) = | (2 + x) * 2 / 13                 -1 <= x <=  1
       |
       | 6/13 + [49 - (x - 8)**2] / 91     1 <= x <=  8
Finally, determining the values of F(x) at the break points between the segments and applying inversion yields the following pseudocode algorithm:
generate U ~ Uniform(0,1)
if U <= 2/13:
    return 2 * sqrt( (13 * U) / 2 ) - 3
else if U <= 6/13:
    return (13 * U) / 2 - 2
else:
    return 8 - sqrt( 91 * (1 - U) )
Note that this is a true inversion. The outcome is determined by generating a single U, and transforming it in different ways depending on which range it falls in.
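For reference, the inversion pseudocode translates directly into Python (a sketch; the constants come from the piecewise CDF above):

import math
import random

def sample_trapezoid():
    # True inversion: one uniform in, one sample out.
    u = random.random()
    if u <= 2 / 13:
        return 2 * math.sqrt(13 * u / 2) - 3
    elif u <= 6 / 13:
        return 13 * u / 2 - 2
    else:
        return 8 - math.sqrt(91 * (1 - u))

# e.g. compare a histogram of 100000 draws against the target shape
samples = [sample_trapezoid() for _ in range(100000)]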

What is the meaning of min_distance and min_angle in hough_line_peaks()?

Can someone explain min_distance and min_angle optional parameters, please ?
http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.hough_line_peaks
For min_angle=n, I thought it would check that the next accepted line's angle is at least n elements away in my theta array.
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

iden = np.identity(200)
hspace, angles, dists = hough_line(iden, theta=np.linspace(-np.pi/2, np.pi/2, 1800))  # 0.1 degree resolution
hspace, angles, dists = hough_line_peaks(hspace, angles, dists, min_distance=0, min_angle=20)  # 2 degree minimum before accepting as new line?
print(hspace, angles*180/np.pi, dists)
output : [200 126 124] [-44.9749861 -45.27515286 -44.67481934] [ 0.50088496 -0.50088496 1.50265487]
The angle array shows that I'm getting this wrong. The parameter accepts only integers; I'm not sure what it actually represents...
I don't think there is anything wrong with the function hough_line_peaks() itself.
min_angle and min_distance define a zone around an already found peak in which no other peak can be found (i.e., you consider that a peak that close to another peak is actually one and the same peak).
In the accumulator of the Hough transform, the 2 dimensions are: angles and distances. You basically set by an integer the number of bins in the accumulator that have to be ignored around an already found peak.
By setting min_distance to 0, you are only avoiding getting 2 peaks that have the exact same distance parameter AND an angle parameter difference of less than 20 * angle_resolution ~= 20 * 0.1 = 2 degrees. None of the 3 peaks that are returned have the same distance parameter, and therefore the condition you set is respected.
Also, be aware that your angle resolution is not exactly 0.1 degrees unless the third parameter of np.linspace is 1801 rather than 1800. This is how np.linspace behaves: you give it the total number of points. hough_line_peaks just takes the returned vector as an input argument. You could also use np.arange, which lets you pass the step as an argument.
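To illustrate the linspace point (my own check, not part of the original answer):

import numpy as np

# 1801 points from -90 to +90 degrees gives an exact 0.1-degree spacing;
# 1800 points gives 180/1799 ~= 0.10006 degrees instead.
theta = np.linspace(-np.pi / 2, np.pi / 2, 1801)
assert np.isclose(np.diff(theta)[0], np.deg2rad(0.1))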
Edit
The angle array returned is in degrees?! I would expect radians, as for the input... The values should correspond to some of the values of np.linspace(-np.pi/2, np.pi/2, 1800).
End of edit
Basically, it works this way:
1) Find the highest value in the accumulator -> 200, -44.9749861, 0.50088496 (200 means that 200 pixels have been assigned to this bin of the accumulator).
2) Set the bins of the accumulator around the peak bin, [bin - min_dist : bin + min_dist, bin - min_angle : bin + min_angle], to 0.
3) Find the second biggest value in the accumulator, and so on.
Edit 2:
Why the results accumulator_value = [200 126 124], angle_params = [a b c] and dist_params = [d e f] (with d != e and e != f) are consistent with the parameters min_angle = X and min_distance = 0:
The strongest peak in the accumulator is found at the bin with angle_param = a and dist_param = d.
The search for the second peak is carried out by discarding this bin in the accumulator, as well as the bins located within <= X bins (side note: it may be X/2, but this does not change the reasoning here) in the angle "direction" and within <= 0 bins in the distance "direction" of the peak's bin.
Only this. So the other peaks found in your case are located in bins whose distance parameter differs from that of every other peak found; therefore there is no reason to discard them.
The accumulator is simply a 2-dimensional table of bins, one direction representing the angles and the other the distances.

convert real number to radicals

Suppose I have a real number. I want to approximate it with something of the form a+sqrt(b) for integers a and b. But I don't know the values of a and b. Of course I would prefer to get a good approximation with small values of a and b. Let's leave it undefined for now what is meant by "good" and "small". Any sensible definitions of those terms will do.
Is there a sane way to find them? Something like the continued fraction algorithm for finding fractional approximations of decimals. For more on the fractions problem, see here.
EDIT: To clarify, it is an arbitrary real number. All I have are a bunch of its digits. So depending on how good an approximation we want, a and b might or might not exist. The best I can think of would be to start adding integers to my real, squaring the result, and seeing if I come close to an integer; that is pretty much brute force, and not a particularly good algorithm. But if nothing better exists, that would itself be interesting to know.
EDIT: Obviously b has to be zero or positive. But a could be any integer.
No need for continued fractions; just calculate the square-root of all "small" values of b (up to whatever value you feel is still "small" enough), remove everything before the decimal point, and sort/store them all (along with the b that generated it).
Then when you need to approximate a real number, find the radical whose decimal portion is closest to the real number's decimal portion. This gives you b; choosing the correct a is then a simple matter of subtraction.
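A minimal sketch of that table idea in Python (the cutoff for "small" values of b is my own choice):

import bisect
import math

LIMIT = 1000  # largest b considered "small"
table = sorted((math.sqrt(b) % 1.0, b) for b in range(1, LIMIT + 1))

def approximate(r):
    # Find the b whose square root has the closest fractional part,
    # then get a by subtraction.
    frac = r % 1.0
    i = bisect.bisect_left(table, (frac,))
    candidates = table[max(i - 1, 0):i + 1]
    _, b = min(candidates, key=lambda t: abs(t[0] - frac))
    a = round(r - math.sqrt(b))
    return a, b

print(approximate(math.pi))  # (-4, 51), i.e. pi ~= sqrt(51) - 4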
This is actually more of a math problem than a computer problem, but to answer the question I think you are right that you can use continued fractions. What you do is first represent the target number as a continued fraction. For example, if you want to approximate pi (3.14159265) then the CF is:
3: 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1 ...
The next step is to create a table of CFs for square roots, then you compare the values in the table to the fractional part of the target value (here: 7, 15, 1, 292, ...). For example, let's say your table had square roots for 1-99 only. Then the closest match would be sqrt(51), which has a CF of 7: 7, 14 repeating. The 7, 14 is the closest to pi's 7, 15. Thus your answer would be:
sqrt(51) - 4
as the closest approximation given b < 100, which is off by 0.00016. If you allow larger b's then you could get a better approximation.
The advantage of using CFs is that it is faster than working in, say, doubles or using floating point. For example, in the above case you only have to compare two integers (7 and 15), and you can also use indexing to make finding the closest entry in the table very fast.
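If you want to experiment with this, expanding a float into a continued fraction takes only a few lines (a sketch; double precision limits how many terms are trustworthy):

import math

def continued_fraction(x, terms=10):
    # Repeatedly split off the integer part and recurse on 1/remainder.
    cf = []
    for _ in range(terms):
        a = math.floor(x)
        cf.append(a)
        frac = x - a
        if frac < 1e-12:
            break
        x = 1 / frac
    return cf

print(continued_fraction(math.pi))        # [3, 7, 15, 1, 292, ...]
print(continued_fraction(math.sqrt(51)))  # [7, 7, 14, 7, 14, ...]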
This can be done using mixed integer quadratic programming very efficiently (though there are no run-time guarantees as MIQP is NP-complete.)
Define:
d := the real number you wish to approximate
b, a := two integers such that a + sqrt(b) is as "close" to d as possible
r := (d - a)^2 - b, the residual of the approximation
The goal is to minimize r. Setup your quadratic program as:
x := [ s  b  t ]^T

     | 1 0 0 |
D := | 0 0 0 |
     | 0 0 0 |

c := [ 0  -1  0 ]^T
with the constraint that s - t = f (where f is the fractional part of d)
and b,t are integers (s is not)
This is a convex (therefore optimally solvable) mixed integer quadratic program since D is positive semi-definite.
Once s, b, t are computed, simply read off the answer: b is b, and a is recovered from s = d - a (i.e., a = d - s); t can be ignored.
Your problem may be NP-complete; it would be interesting to prove whether it is.
Some of the previous answers use methods that are of time or space complexity O(n), where n is the largest “small number” that will be accepted. By contrast, the following method is O(sqrt(n)) in time, and O(1) in space.
Suppose that positive real number r = x + y, where x=floor(r) and 0 ≤ y < 1. We want to approximate r by a number of the form a + √b. If x+y ≈ a+√b then x+y-a ≈ √b, so √b ≈ h+y for some integer offset h, and b ≈ (h+y)^2. To make b an integer, we want to minimize the fractional part of (h+y)^2 over all eligible h. There are at most √n eligible values of h. See following python code and sample output.
import math, random

def findb(y, rhi):
    bestb = loerror = 1
    for r in range(2, rhi):
        v = (r + y)**2
        u = round(v)
        err = abs(v - u)
        if round(math.sqrt(u))**2 == u: continue
        if err < loerror:
            bestb, loerror = u, err
    return bestb

#random.seed(123456)   # set a seed if testing repetitively
f = [math.pi - 3] + sorted([random.random() for i in range(24)])
print ('     frac   sqrt(b)       error     b')
for frac in f:
    b = findb(frac, 12)
    r = math.sqrt(b)
    t = math.modf(r)[0]    # Get fractional part of sqrt(b)
    print ('{:9.5f} {:9.5f} {:11.7f} {:5.0f}'.format(frac, r, t - frac, b))
(Note 1: This code is in demo form; the parameters to findb() are y, the fractional part of r, and rhi, the square root of the largest small number. You may wish to change usage of parameters. Note 2: The
if round(math.sqrt(u))**2 == u: continue
line of code prevents findb() from returning perfect-square values of b, except for the value b=1, because no perfect square can improve upon the accuracy offered by b=1.)
Sample output follows. About a dozen lines have been elided in the middle. The first output line shows that this procedure yields b=51 to represent the fractional part of pi, which is the same value reported in some other answers.
     frac   sqrt(b)       error     b
  0.14159   7.14143  -0.0001642    51
  0.11975   4.12311   0.0033593    17
  0.12230   4.12311   0.0008085    17
  0.22150   9.21954  -0.0019586    85
  0.22681  11.22497  -0.0018377   126
  0.25946   2.23607  -0.0233893     5
  0.30024   5.29150  -0.0087362    28
  0.36772   8.36660  -0.0011170    70
  0.42452   8.42615   0.0016309    71
  ...
  0.93086   6.92820  -0.0026609    48
  0.94677   8.94427  -0.0024960    80
  0.96549  11.95826  -0.0072333   143
  0.97693  11.95826  -0.0186723   143
With the following code added at the end of the program, the output shown below also appears. This shows closer approximations for the fractional part of pi.
frac, rhi = math.pi - 3, 16
print ('       frac     sqrt(b)         error       b    bMax')
while rhi < 1000:
    b = findb(frac, rhi)
    r = math.sqrt(b)
    t = math.modf(r)[0]    # Get fractional part of sqrt(b)
    print ('{:11.7f} {:11.7f} {:13.9f} {:7.0f} {:7.0f}'.format(frac, r, t - frac, b, rhi**2))
    rhi = 3 * rhi // 2     # integer division keeps rhi valid for range()
       frac     sqrt(b)         error       b    bMax
  0.1415927   7.1414284  -0.000164225      51     256
  0.1415927   7.1414284  -0.000164225      51     576
  0.1415927   7.1414284  -0.000164225      51    1296
  0.1415927   7.1414284  -0.000164225      51    2916
  0.1415927   7.1414284  -0.000164225      51    6561
  0.1415927 120.1415831  -0.000009511   14434   14641
  0.1415927 120.1415831  -0.000009511   14434   32761
  0.1415927 233.1415879  -0.000004772   54355   73441
  0.1415927 346.1415895  -0.000003127  119814  164836
  0.1415927 572.1415909  -0.000001786  327346  370881
  0.1415927 911.1415916  -0.000001023  830179  833569
I do not know if there is any kind of standard algorithm for this kind of problem, but it does intrigue me, so here is my attempt at developing an algorithm that finds the needed approximation.
Call the real number in question r. First, I assume that a can be negative; in that case we can reduce the problem so that we only need to find a b such that the decimal part of sqrt(b) is a good approximation of the decimal part of r. Let us now write r as r = x.y, with x being the integer part and y the decimal part.
Now:
b = r^2
  = (x.y)^2
  = (x + .y)^2
  = x^2 + 2 * x * .y + .y^2
  = 2 * x * .y + .y^2            (mod 1)
We now only have to find an x such that 0 = .y^2 + 2 * x * .y (mod 1) (approximately).
Filling that x into the formulas above, we get b, and can then calculate a as a = r - sqrt(b). (All of these calculations have to be carefully rounded, of course.)
Now, for the time being I am not sure if there is a way to find this x without brute-forcing it. But even then, one can simply use a loop to find an x that is good enough. I am thinking of something like this (semi-pseudocode):
max_diff_low = 0.01              // arbitrary accuracy
max_diff_high = 1 - max_diff_low
y = r % 1
v = y^2
addend = 2 * y
x = 0
while (v < max_diff_high && v > max_diff_low) {
    x++
    v = (v + addend) % 1
}
c = (x + y)^2
b = round(c)
a = round(r - (x + y))           // subtract sqrt(b) ~= x + y, not b
Now, I think this algorithm is fairly efficient, and it even allows you to specify the desired accuracy of the approximation. One thing that could be done to turn it into an O(1) algorithm is calculating all the x values and putting them into a lookup table. If one only cares about the first three decimal digits of r (for example), the lookup table would only need 1000 values, which is only 4 kB of memory (assuming 32-bit integers are used).
Hope this is helpful at all. If anyone finds anything wrong with the algorithm, please let me know in a comment and I will fix it.
EDIT:
Upon reflection I retract my claim of efficiency. There is in fact as far as I can tell no guarantee that the algorithm as outlined above will ever terminate, and even if it does, it might take a long time to find a very large x that solves the equation adequately.
One could maybe keep track of the best x found so far and relax the accuracy bounds over time to make sure the algorithm terminates quickly, at the possible cost of accuracy.
These problems are of course non-existent, if one simply pre-calculates a lookup table.

Split a number into three buckets with constraints

Is there a good algorithm to split a randomly generated number into three buckets, each with constraints as to how much of the total it may contain?
For example, say my randomly generated number is 1,000 and I need to split it into buckets a, b, and c.
These ranges are only an example. See my edit for possible ranges.
Bucket a may only be between 10% - 70% of the number (100 - 700)
Bucket b may only be between 10% - 50% of the number (100 - 500)
Bucket c may only be between 5% - 25% of the number (50 - 250)
a + b + c must equal the randomly generated number
You want the amounts assigned to be completely random, so that bucket a has as much chance of hitting its max as bucket c does, and all three buckets are equally likely to land around their percentage mean.
EDIT: The following will most likely always be true: low end of a + b + c < 100%, high end of a + b + c > 100%. These percentages are only to indicate acceptable values of a, b, and c. In a case where a is 10% while b and c are their max (50% and 25% respectively) the numbers would have to be reassigned since the total would not equal 100%. This is the exact case I'm trying to avoid by finding a way to assign these numbers in one pass.
I'd like to find a way to pick these number randomly within their range in one pass.
The problem is equivalent to selecting a random point in an N-dimensional object (in your example N=3), the object being defined by the equations (in your example):
0.1 <= x <= 0.7
0.1 <= y <= 0.5
0.05 <= z <= 0.25
x + y + z = 1 (*)
Clearly because of the last equation (*) one of the coordinates is redundant, i.e. picking values for x and y dictates z.
Eliminating (*) and one of the other equations leaves us with an (N-1)-dimensional box, e.g.
0.1 <= x <= 0.7
0.1 <= y <= 0.5
that is cut by the inequality
0.05 <= (1 - x - y) <= 0.25 (**)
that derives from (*) and the equation for z. This is basically a diagonal stripe through the box.
In order for the results to be uniform, I would just repeatedly sample the (N-1)-dimensional box, and accept the first sampled point that fulfills (**). Single-pass solutions might end up having biased distributions.
Update: Yes, you're right, the result is not uniformly distributed.
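A minimal sketch of the rejection approach in Python, with the percentages from the question (the function name is mine):

import random

def split_number(total):
    # Sample (x, y) uniformly in the 2-D box; z is forced by x + y + z = 1.
    # Reject and retry until z also lands inside its allowed range (**).
    while True:
        x = random.uniform(0.10, 0.70)
        y = random.uniform(0.10, 0.50)
        z = 1.0 - x - y
        if 0.05 <= z <= 0.25:
            return x * total, y * total, z * total

print(split_number(1000))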
Let's say your percent values are natural numbers (if this assumption is wrong, you don't have to read further; in that case I don't have a solution :)).
Let's define an event e as a tuple of 3 values (the percentage of each bucket): e = (pa, pb, pc). Next, create all possible events en. What you have here is a tuple space consisting of a discrete number of events. All of the possible events should have the same probability of occurring.
Let's say we have a function f(n) => en. Then, all we have to do is take a random number n and return en in a single pass.
Now, the problem remains to create such a function f :)
In pseudo code, a very slow method (just for illustration):
function f(n) {
    int c = 0
    for i in [10..70] {
        for j in [10..50] {
            for k in [5..25] {
                if (i + j + k == 100) {
                    if (n == c) {
                        return (i, j, k)   // found event!
                    } else {
                        c = c + 1
                    }
                }
            }
        }
    }
}
What you have now is a single-pass solution, but the problem has only been moved: the function f is very slow. But you can do better; I think you can calculate everything a bit faster if you set your ranges correctly and calculate offsets instead of iterating through your ranges.
Is this clear enough?
First of all, you probably have to adjust your ranges. 10% in bucket a is not possible, since the condition a+b+c = number can then not hold (10% + 50% + 25% < 100%).
Concerning your question: (1) pick a random number for bucket a inside its range, then (2) update the range for bucket b with new minimum and maximum percentages (you should only narrow the range). Then (3) pick a random number for bucket b. In the end, (4) c is calculated so that the condition holds.
Example:
n = 1000
(1) a = 40%
(2) range b [35,50], because 40+35+25 = 100%
(3) b = 45%
(4) c = 100-40-45 = 15%
Or:
n = 1000
(1) a = 70%
(2) range b [10,25], because 70+25+5 = 100%
(3) b = 20%
(4) c = 100-70-20 = 10%
It remains to be checked whether all the events are uniformly distributed. If that should be a problem, you might want to randomize the range update in step 2.
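A sketch of this sequential scheme in Python, using the ranges from the question (the uniformity caveat above still applies):

import random

def split_sequential(total):
    # (1) pick a; (2) narrow b's range so that c can still land in [5, 25];
    # (3) pick b; (4) c is whatever percentage remains.
    a = random.uniform(10, 70)
    b_lo = max(10, 100 - a - 25)
    b_hi = min(50, 100 - a - 5)
    b = random.uniform(b_lo, b_hi)
    c = 100 - a - b
    return (a * total / 100, b * total / 100, c * total / 100)

print(split_sequential(1000))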

Derive integer factors of float value?

I have a difficult mathematical question that is breaking my brain, my whiteboard, and all my pens. I am working with a file that expresses 2 values, a multiplicand and a percentage. Both of those values must be integers. These two values are multiplied together to produce a range. Range is a float value.
My users edit the range, and I have to calculate a new percentage and multiplicand value. Confused yet? Here's an example:
Multiplicand: 25000 Apples
Percentage: 400 (This works out to .4% or .004)
Range: 100.0 Apples (Calculated by Multiplicand * Percentage)
To complicate things, the allowable values for Percentage are 0-100000 (meaning 0-100%). Multiplicand is a value between 1 and the 32-bit int max (presumably unsigned).
I need to allow for users to input a range, like so:
Range: .04 Apples
And calculate the appropriate Percentage and Multiplicand. Using the first example:
OriginalMultiplicand: 25000 Apples
OriginalPercentage: 400 (This works out to .4% or .004)
OriginalRange: 100.0 Apples (Calculated by Multiplicand * Percentage)
NewRange: .01 Apples
NewPercentage: 40
NewMultiplicand: 25 Apples
The example calculation is easy: all that was required was scaling the multiplicand and percentage down by the ratio of the new and old range. The problem arises when the user changes the value to something like 1400.00555. Suddenly I don't have a clean way to adjust the two values.
I need an algorithmic approach to getting values for M & P that produce the closest possible value to the desired range. Any suggestions?
To maximize the number of decimal places stored, you should use a P of 1, or 0.001%. If that overflows M, then increment P.
So for your example of 1400.00555, P is 1 and M is 140000555.
Your algorithm would search for the lowest P such that M does not overflow. And you can do a binary search here.
public int binarySearch(int P0, int P1) {
    P = (P0 + P1) / 2;
    if (P == P0) {
        if (R / (P0 / 100000f) does not overflow 32-bit int) {
            return P0;
        } else {
            return P1;
        }
    }
    if (R / (P / 100000f) does not overflow 32-bit int) {
        return binarySearch(P0, P);
    } else {
        return binarySearch(P, P1);
    }
}

P = binarySearch(1, 100000);
M = round(R / (P / 100000f));
(I had a bad method here, but I erased it because it sucked.)
EDIT:
There's got to be a better way than that. Let's rephrase the problem:
What you have is an arbitrary floating-point number. You want to represent this floating-point number with two integers. The integers, when multiplied together and then divided by 100000.0, are equal to the floating-point number. The only other constraint is that one of the integers must be equal to or less than 100000.
It's clear that you can't actually represent floating-point numbers accurately. In fact, you can ONLY represent numbers that are expressible in 1/100000s accurately, even if you have an infinite number of digits of precision in "multiplicand". You can represent 333.33333 accurately, with 33333333 as one number and 1 as the other; you just can't get any more 3s.
Given this limitation, I think your best bet is the following:
Multiply your float by 100000 in an integer format, probably a long or some variant of BigNumber.
Factor it. Record all the factors. It doesn't matter if you store them as 2^3 or 2*2*2 or what.
Grab as many factors as you can without the multiplication of them all exceeding 100000. That becomes your percent. (Don't try to do this perfectly; finding the optimal solution is an NP-hard problem.)
Take the rest of the factors and multiply them together. That's your multiplicand.
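A rough sketch of that recipe in Python, using simple trial division and a greedy split of the factors (the names are mine, and the greedy step is deliberately imperfect, as noted above):

def split_range(range_value):
    # Represent range_value as M * P / 100000 with P <= 100000.
    n = round(range_value * 100000)
    p, m = 1, 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            if p * d <= 100000:
                p *= d      # greedily push the factor into the percentage
            else:
                m *= d      # otherwise it goes into the multiplicand
        d += 1
    if n > 1:               # leftover prime factor
        if p * n <= 100000:
            p *= n
        else:
            m *= n
    return m, p

print(split_range(100.0))       # (125, 80000): 125 * 80000 / 100000 == 100.0
print(split_range(1400.00555))  # e.g. (28000111, 5) if 28000111 is prime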
As I understand from your example, you could represent the range by any of 100000 different multiplicand * percentage pairs: any choice of percentage gives you a matching value of multiplicand, and vice versa. So you have this equation in two variables:
Multiplicand * Percentage = 100.0
You should figure out another equation (constraint) to pin down a specific value of Multiplicand OR Percentage. Otherwise, you could choose Percentage to be any number between 0-100000 and just substitute it in the first equation to get the value of Multiplicand. I hope I understood the question correctly :)
Edit: OK, then you should factorize the range easily. Get the range, then try to factorize it by dividing the range by percentage (2-100000). Once the remainder of the division is zero, you have your factors. This is a quick pseudo-code:
get range;
percentage = 2;
while (range % percentage != 0)
{
    percentage++;
}
multiplicand = range / percentage;
All you have to do now is to calculate your limits:
max of percentage = 100000
max of multiplicand = 4294967295
max of range = 4294967295 * 100000 = 429496729500000 (15 digits)
Your max range has at most 15 digits; the double data type in most programming languages can represent it. Do the calculation using doubles and just convert the Multiplicand & Percentage to int at the end.
It seems you want to choose M and P such that R = (M * P) / 100000.
So M * P = 100000 * R, where you have to round the right-hand side to an integer.
I'd multiply the range by 100000, and then choose M and P as factors of the result so that they don't overflow their allowed ranges.
Say you have
1) M * P = A
Then you have a second value for A, and so also new values for M and P; let's call them M2, P2 and A2:
2) M2 * P2 = A2
This I don't know for sure, but it is what you seem to be saying, imho: the ratio has to stay the same, so
3) M/P = M2/P2
Now we have 3 equations and 2 unknowns, M2 and P2.
One way to solve it: 3) becomes
M/P = M2/P2
=> M2 = (M/P) * P2
Then substitute that in 2):
(M/P) * P2 * P2 = A2
=> P2 * P2 = A2 * (P/M)
=> P2 = sqrt(A2 * (P/M))
So first solve P2, then M2, if I didn't make any mistakes. There will have to be some rounding if M2 and P2 have to be integers.
EDIT: I forgot about the integer percentage, so say
P = percentage/100000, or P * 100000 = percentage
P2 = percentage2/100000, or P2 * 100000 = percentage2
So just solve for P2 and M2, and multiply P2 by 100000.
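Putting that together as a quick sketch in Python (names mine; the rounding at the end will in general make the product only approximately equal to the new range):

import math

def rescale_mp(m, percentage, new_range):
    # Keep the ratio M/P fixed and solve M2 * P2 = A2 per the derivation above.
    p = percentage / 100000
    p2 = math.sqrt(new_range * p / m)
    m2 = (m / p) * p2
    return round(m2), round(p2 * 100000)

print(rescale_mp(25000, 400, 0.01))  # (250, 4): 250 * 4 / 100000 == 0.01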
