This generates a random float with a certain precision level in Ruby:
def generate(min, max, precision)
  number = rand * (max - min) + min
  factor = 10.0 ** precision
  (number * factor).to_i / factor
end
Someone recently suggested to me that it might be simpler to do this:
var = rand(100) * 1.0
var /= 10
This generates a random float from 0.0 to 9.9 in steps of 0.1.
It sounds good, but I'm not sure how to control the precision level with that method. How can I achieve the equivalent of my first method using this suggestion?
If you have a range x1 to x2, and you want a minimum increment ("precision") of delta, then you need
n = (x2 - x1)/delta + 1
different integers, which you scale with
rand(n) * delta + x1
to give random numbers between x1 and x2 inclusive, with an increment of delta.
How do you define "precision"? The relationship between delta and precision is given (according to your formula) by
delta = 10**(-precision)
or
delta = 1.0 / 10**precision
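For concreteness, here is a minimal Python sketch of that recipe (the function name and the use of random.randrange are my own choices, not from the answer above):

import random

def generate(x1, x2, precision):
    # delta is the smallest increment implied by the precision
    delta = 1.0 / 10 ** precision
    # n distinct values between x1 and x2 inclusive
    n = int(round((x2 - x1) / delta)) + 1
    return random.randrange(n) * delta + x1

For example, generate(0.0, 10.0, 1) returns one of 0.0, 0.1, ..., 10.0.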
I have a handful of functions, which are given some variables x, y, z, etc., and evaluate their respective multinomials with some constant coefficients at these points, returning the values. My variables are then set equal to these values, and the process is repeated. In pseudo-code:
repeat N times:
    x_new = f1(x, y, z)
    y_new = f2(x, y, z)
    z_new = f3(x, y, z)
    x = x_new
    y = y_new
    z = z_new
and functions are something like
f1(x, y, z):
    return c0 + c1*x + c2*y + c3*z + c4*x*x + c5*x*y + c6*x*z ...
Each iteration, I save the intermediate values of my variables and later graph the result. For the graph to be of sufficient quality, I need around 10 million data points. I have code that does this fairly fast for 3 variables and multinomials of degree 3, but I'm seeking to expand this to support an arbitrary number of variables and multinomials of any degree. This is where things get slow, and I think I need a different approach.
My original (hard-coded) C evaluator looked like:
double evaluateStep(double x, double y, double z, double *coeffs) {
    double sum;
    sum = coeffs[0];
    sum += coeffs[1] * x;
    sum += coeffs[2] * y;
    sum += coeffs[3] * z;
    sum += coeffs[4] * x * y;
    sum += coeffs[5] * x * z;
    sum += coeffs[6] * y * z;
    sum += coeffs[7] * x * x;
    sum += coeffs[8] * y * y;
    sum += coeffs[9] * z * z;
    sum += coeffs[10] * x * y * z;
    sum += coeffs[11] * x * x * y;
    sum += coeffs[12] * x * x * z;
    sum += coeffs[13] * y * y * x;
    sum += coeffs[14] * y * y * z;
    sum += coeffs[15] * z * z * x;
    sum += coeffs[16] * z * z * y;
    sum += coeffs[17] * x * x * x;
    sum += coeffs[18] * y * y * y;
    sum += coeffs[19] * z * z * z;
    return sum;
}
The generalized code:
double recursiveEval(double factor, double *position, int ndim, double **coeffs, int order) {
    if (!order)
        return factor * *((*coeffs)++);
    double sum = 0;
    for (int i = 0; i < ndim; i++)
        sum += recursiveEval(factor * position[i], position + i, ndim - i, coeffs, order - 1);
    return sum;
}

double evaluateNDStep(double *position, double *coeffs) {
    double *coefficients = coeffs;
    return recursiveEval(1, position, NUM_DIMENSIONS + 1, &coefficients, ORDER);
}
Of course, I'm sure this has destroyed gcc's ability to do common subexpression elimination and other similar optimizations. I'm wondering if there's a better approach I could take. There is one bit of information that I haven't been able to take advantage of yet: all the coefficients are drawn at the start of the program from a pool of 25 values (with repetition). These span [-1.2, 1.2] inclusive, if that makes any difference.
I've considered computing a single term of the multinomial and then transforming it into each of the other terms by simple variableA/variableB multiplications, but the amount of error that introduces may be too much. I've also considered computing all terms up to degree n and using those to construct the degree n+1 through 2n terms. One other option, which I'm not sure how to implement, would be to factor the expression, taking advantage of the fact that many terms will share the same coefficient (pigeonhole principle), so it may be possible to get by with fewer multiplications and additions than there are terms in the expanded expression. I have no clue how to go about doing this, though, since it would have to happen at run time, and that would cross into the area of JIT compilation of the expression.
As it stands, 3 variables at degree 3 (20 terms total) takes almost 6 seconds, and 4 variables at degree 3 (35 terms total) takes a bit over 13 seconds. These will not scale well into larger degrees or more variables (the formula for the number of distinct terms is nck(degree + variables, degree), where nck is "n choose k"). I'm looking for a way to improve the performance of the generalized code, ideally asymptotically (I suspect via partial factorization). I don't really care about the language; I'm writing this code in C, but if you prefer to present an algorithm in some other language, that will not be a problem for me.
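One way to cut the per-term cost, in the spirit of the partial-factorization idea above, is to build every monomial from a lower-degree monomial with a single multiplication and accumulate as you go. A minimal Python sketch (the exponent-tuple representation of the coefficients is my own choice, not from the question):

from itertools import combinations_with_replacement

def eval_multinomial(coeffs, point, degree):
    # coeffs maps a sorted tuple of variable indices to a coefficient:
    # () is the constant term, (0,) is x, (0, 0, 1) is the x*x*y term, etc.
    mons = {(): 1.0}
    total = coeffs.get((), 0.0)
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(point)), d):
            # idx[1:] is a degree d-1 monomial computed on a previous pass
            mons[idx] = mons[idx[1:]] * point[idx[0]]
            total += coeffs.get(idx, 0.0) * mons[idx]
    return total

This costs one multiply plus one multiply-add per term, so it does not change the asymptotics (the term count is still nck(degree + variables, degree)), but it avoids the products that the recursive version recomputes from scratch.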
I want to calculate an equation within a controller (Arduino):
y = -0.0000000104529251928664x^3 + 0.0000928316793270531x^2 - 0.282333029643959x + 297.661280719026
Now the decimal values of the coefficients are important, because x varies in the thousands, so the cube term cannot be ignored. I have tried manipulating the equation in Excel to reduce the coefficients, but R^2 is lost in the process, and I would like to avoid that.
The maximum variable size available on Arduino is 4 bytes, and a Google search did not turn up an appropriate solution.
Thank you for your time.
Since
-0.0000000104529251928664 ^ (1/3) = -0.0021864822
0.0000928316793270531 ^ (1/2) = 0.00963491978
the formula
y = -0.0000000104529251928664x^3 + 0.0000928316793270531x^2 - 0.282333029643959x + 297.661280719026
can be rewritten:
y = -(0.0021864822 * x)^3 + (0.00963491978 * x)^2 - 0.282333029643959 * x + 297.661280719026
Rounding all coefficients to 10 decimal places, we get:
y = -(0.0021864822 * x)^3 + (0.00963491978 * x)^2 - 0.2823330296 * x + 297.6612807
But I don't know Arduino; I'm not sure what the correct number of decimal places is, nor do I know what the compiler will accept or refuse.
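For what it's worth, here is a small Python sketch (my own check, using numpy.float32 to mimic Arduino's 4-byte float) that evaluates the cubic in Horner form, which uses fewer operations and tends to lose less precision than computing the powers separately:

import numpy as np

# coefficients from the question, stored as 32-bit floats
a3 = np.float32(-1.04529251928664e-8)
a2 = np.float32(9.28316793270531e-5)
a1 = np.float32(-0.282333029643959)
a0 = np.float32(297.661280719026)

def y(x):
    # Horner form: ((a3*x + a2)*x + a1)*x + a0, all in single precision
    x = np.float32(x)
    return ((a3 * x + a2) * x + a1) * x + a0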
I have a moving graphic whose velocity decays geometrically every frame. I want to find the initial velocity that will make the graphic travel a desired distance in a given number of frames.
Using these variables:
v initial velocity
r rate
d distance
I can come up with d = v * (r^0 + r^1 + r^2 + ...)
So if I want to find the v needed to travel 200 pixels in 3 frames with a decay rate of 90%, I would adapt it to:
d = 200
r = .9
v = d / (r^0 + r^1 + r^2)
That doesn't translate well to code, since I have to edit the expression if the number of frames changes. The only solution I can think of is this (in no specific language):
r = .9
numFrames = 3
d = 200
sum = 1
for (i = 1; i < numFrames; i++) {
    sum = sum + power(r, i);
}
v = d / sum;
Is there a better way to do this without using a loop?
(I wouldn't be surprised if there is a mistake in there somewhere... today is just one of those days..)
What you have here is a geometric sequence. See the link:
http://www.mathsisfun.com/algebra/sequences-sums-geometric.html
To find the sum of a geometric sequence, you use this formula:
sum = a * ((1 - r^n) / (1 - r))
Since you are looking for a, the initial velocity, move the terms around:
a = sum * ((1-r) / (1 - r^n))
In Java:
double distanceInPixels = SOME_INTEGER;
double decayRate = SOME_DECIMAL;
int numberOfFrames = SOME_INTEGER;
double initialVelocity; // this is what we need to find
initialVelocity = distanceInPixels * ((1 - decayRate) / (1 - Math.pow(decayRate, numberOfFrames)));
Using this formula you can get any one of the four variables if you know the values of the other three. Enjoy!
According to http://mikestoolbox.com/powersum.html, you should be able to reduce your for loop to:
F(x) = (x^n - 1)/(x-1)
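As a quick sanity check (mine, in Python), the closed form agrees with the loop from the question:

def v_loop(d, r, n):
    # the question's approach: divide by the explicit partial sum
    return d / sum(r ** i for i in range(n))

def v_closed(d, r, n):
    # geometric-series closed form: sum = (1 - r**n) / (1 - r)
    return d * (1 - r) / (1 - r ** n)

print(v_loop(200, 0.9, 3))    # ~73.8
print(v_closed(200, 0.9, 3))  # same value, no loop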
I have an interesting math/CS problem. I need to sample from a possibly infinite random sequence of increasing values, X, with X(i) > X(i-1), with some distribution between them. You could think of this as the sum of a different sequence D of uniform random numbers in [0,d). This is easy to do if you start from the first one and go from there; you just add a random amount to the sum each time. But the catch is, I want to be able to get any element of the sequence in faster than O(n) time, ideally O(1), without storing the whole list. To be concrete, let's say I pick d=1, so one possibility for D (given a particular seed) and its associated X is:
D={.1, .5, .2, .9, .3, .3, .6 ...} // standard random sequence, elements in [0,1)
X={.1, .6, .8, 1.7, 2.0, 2.3, 2.9, ...} // increasing random values; partial sum of D
(I don't really care about D, I'm just showing one conceptual way to construct X, my sequence of interest.) Now I want to be able to compute the value of X[1] or X[1000] or X[1000000] equally fast, without storing all the values of X or D. Can anyone point me to some clever algorithm or a way to think about this?
(Yes, what I'm looking for is random access into a random sequence -- with two different meanings of random. Makes it hard to google for!)
Since D is pseudorandom, there’s a space-time tradeoff possible:
O(sqrt(n))-time retrievals using O(sqrt(n)) storage locations (or,
in general, O(n**alpha)-time retrievals using O(n**(1-alpha))
storage locations). Assume zero-based indexing and that
X[n] = D[0] + D[1] + ... + D[n-1]. Compute and store
Y[s] = X[s**2]
for all s**2 <= n in the range of interest. To look up X[n], let
s = floor(sqrt(n)) and return
Y[s] + D[s**2] + D[s**2+1] + ... + D[n-1].
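A minimal Python sketch of this scheme (the per-index PRNG construction of D is my own assumption; any deterministic, seedable stream works):

import math
import random

def D(i, seed=12345):
    # deterministic pseudorandom D[i] in [0, 1), reproducible from (seed, i)
    return random.Random(seed * 1000003 + i).random()

class SqrtAccess:
    def __init__(self, n_max, seed=12345):
        self.seed = seed
        smax = math.isqrt(n_max)
        # Y[s] = X[s*s] = D[0] + ... + D[s*s - 1]
        self.Y = [0.0] * (smax + 1)
        total, i = 0.0, 0
        for s in range(1, smax + 1):
            while i < s * s:
                total += D(i, seed)
                i += 1
            self.Y[s] = total

    def X(self, n):
        # O(sqrt(n)) per query, assuming n <= n_max:
        # cached prefix plus at most ~2*sqrt(n) remaining terms
        s = math.isqrt(n)
        return self.Y[s] + sum(D(i, self.seed) for i in range(s * s, n))

For example, SqrtAccess(10**6).X(1000) returns the same value no matter which queries preceded it.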
EDIT: here's the start of an approach based on the following idea.
Let Dist(1) be the uniform distribution on [0, d) and let Dist(k) for k > 1 be the distribution of the sum of k independent samples from Dist(1). We need fast, deterministic methods to (i) pseudorandomly sample Dist(2**p) and (ii) given that X and Y are distributed as Dist(2**p), pseudorandomly sample X conditioned on the outcome of X + Y.
Now imagine that the D array constitutes the leaves of a complete binary tree of size 2**q. The values at interior nodes are the sums of the values at their two children. The naive way is to fill the D array directly, but then it takes a long time to compute the root entry. The way I'm proposing is to sample the root from Dist(2**q). Then, sample one child according to Dist(2**(q-1)) given the root's value. This determines the value of the other, since the sum is fixed. Work recursively down the tree. In this way, we look up tree values in time O(q).
Here's an implementation for Gaussian D. I'm not sure it's working properly.
import hashlib
import math

def random_oracle(seed):
    # deterministic "uniform" value in [0, 1) derived from a hash of the seed
    h = hashlib.sha512()
    h.update(str(seed).encode())
    x = 0.0
    for b in h.digest():
        x = (x + b) / 256.0
    return x

def sample_gaussian(variance, seed):
    # Box-Muller transform driven by the random oracle
    u0 = random_oracle(2 * seed)
    u1 = random_oracle(2 * seed + 1)
    return math.sqrt(-2.0 * variance * math.log(1.0 - u0)) * math.cos(2.0 * math.pi * u1)

def sample_children(sum_outcome, sum_variance, seed):
    # for i.i.d. Gaussian children, the difference X - Y is independent of
    # the sum X + Y, so sample the difference and solve for both children
    difference_outcome = sample_gaussian(sum_variance, seed)
    return ((sum_outcome + difference_outcome) / 2.0,
            (sum_outcome - difference_outcome) / 2.0)

def sample_X(height, i):
    assert 0 <= i <= 2 ** height
    total = 0.0
    # the root is the sum of 2**height unit-variance leaves
    z = sample_gaussian(2 ** height, 0)
    seed = 1
    for j in range(height, 0, -1):
        x, y = sample_children(z, 2 ** j, seed)
        assert abs((x + y) - z) <= 1e-09
        seed *= 2
        if i >= 2 ** (j - 1):
            # descend right; the whole left subtree contributes to X[i]
            i -= 2 ** (j - 1)
            total += x
            z = y
            seed += 1
        else:
            z = x
    return total

def test(height):
    X = [sample_X(height, i) for i in range(2 ** height + 1)]
    D = [X[i + 1] - X[i] for i in range(2 ** height)]
    mean = sum(D) / len(D)
    variance = sum((d - mean) ** 2 for d in D) / (len(D) - 1)
    print(mean, math.sqrt(variance))
    D.sort()
    with open('data', 'w') as f:
        for d in D:
            print(d, file=f)

if __name__ == '__main__':
    test(10)
If you do not record the values of X you have previously generated, there is no way to guarantee that the elements of X you generate on the fly will be in increasing order. Furthermore, it seems there is no way to avoid O(n) worst-case time per query unless you can quickly generate the CDF of the sum of the first m random variables in D for any choice of m.
If you want the ith value X(i) from a particular realization, I can't see how you could do this without generating the sequence up to i. Perhaps somebody else can come up with something clever.
Would you be willing to accept a value which is plausible in the sense that it has the same distribution as the X(i)'s you would observe across multiple realizations of the X process? If so, it should be pretty easy. X(i) will be asymptotically normally distributed with mean i/2 (since it's the sum of the Dk's for k=1,...,i, the D's are Uniform(0,1), and the expected value of a D is 1/2) and variance i/12 (since the variance of a D is 1/12 and the variance of a sum of independent random variables is the sum of their variances).
Because of the asymptotic aspect, I'd pick some threshold value for i to switch over from direct summing to using the normal. For example, if you use i = 12 as your threshold, you would sum actual uniforms for values of i from 1 to 11, and generate a Normal(i/2, sqrt(i/12)) value for i >= 12. That's an O(1) algorithm, since the total work is bounded by your threshold, and the results produced will be distributionally representative of what you would see if you actually went through the summing.
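A minimal Python sketch of this threshold scheme (the function name and threshold constant are mine; note that each call draws an independent plausible value, not a point on one consistent realization):

import math
import random

THRESHOLD = 12

def plausible_X(i, rng=random):
    if i < THRESHOLD:
        # exact: sum of i Uniform(0, 1) draws
        return sum(rng.random() for _ in range(i))
    # asymptotic: Normal with mean i/2 and standard deviation sqrt(i/12)
    return rng.gauss(i / 2, math.sqrt(i / 12))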
I am trying to compute the IEEE-754 32-bit floating point square root of various inputs, but for one particular input the algorithm below, based on the Newton-Raphson method, won't converge. What can I do to fix the problem? For the platform I am designing, I have a 32-bit floating point adder/subtracter, multiplier, and divider.
For the input 0x7F7FFFFF (3.4028234663852886E38), the algorithm won't converge to the correct answer of 18446743523953729536.000000; it gives 18446743523953737728.000000 instead.
I am using MATLAB to prototype my code before implementing it in hardware. I can only use single precision floating point values (SO NO DOUBLES).
clc; clear; close all;
% Input
R = typecast(uint32(hex2dec('7F7FFFFF')),'single')
% Initial estimate
OneOverRoot2 = single(1/sqrt(2));
Root2 = single(sqrt(2));
% Get low and high bits of input R
hexdata_high = bitand(bitshift(hex2dec(num2hex(single(R))),-16),hex2dec('ffff'));
hexdata_low = bitand(hex2dec(num2hex(single(R))),hex2dec('ffff'));
% Change exponent of input to -1 to get Mantissa
temp = bitand(hexdata_high,hex2dec('807F'));
Expo = bitshift(bitand(hexdata_high,hex2dec('7F80')),-7);
hexdata_high = bitor(temp,hex2dec('3F00'));
b = typecast(uint32(bitshift(hexdata_high,16) + hexdata_low),'single');
% If exponent is odd ...
if (bitand(Expo,1))
    % Pretend the mantissa [0.5 ... 1.0) is multiplied by 2 as Expo is odd,
    % so it now has the value [1.0 ... 2.0)
    % Estimate the sqrt(mantissa) as [1.0 ... sqrt(2))
    % IOW: linearly map (0.5 ... 1.0) to (1.0 ... sqrt(2))
    Mantissa = (Root2 - 1.0)/(1.0 - 0.5)*(b - 0.5) + 1.0;
else
    % The mantissa is in range [0.5 ... 1.0)
    % Estimate the sqrt(mantissa) as [1/sqrt(2) ... 1.0)
    % IOW: linearly map (0.5 ... 1.0) to (1/sqrt(2) ... 1.0)
    Mantissa = (1.0 - OneOverRoot2)/(1.0 - 0.5)*(b - 0.5) + OneOverRoot2;
end
newS = Mantissa*2^(bitshift(Expo-127,-1));
S=newS
% S = (S + R/S)/2 method
for j = 1:6
    fprintf('S %u %f %f\n', j, S, (S - sqrt(R)));
    S = single((S + single(R)/S)/2);
end
goodaccuracy = (abs((single(S)-single(sqrt(single(R)))))) < 2^-23
difference = (abs((single(S)-single(sqrt(single(R))))))
% Get hexadecimal output
hexdata_high = (bitand(bitshift(hex2dec(num2hex(single(S))),-16),hex2dec('ffff')));
hexdata_low = (bitand(hex2dec(num2hex(single(S))),hex2dec('ffff')));
fprintf('FLOAT: T Input: %e\t\tCorrect: %e\t\tMy answer: %e\n', R, sqrt(R), S);
fprintf('output hex = 0x%04X%04X\n',hexdata_high,hexdata_low);
out = hex2dec(num2hex(single(S)));
I took a whack at this. Here's what I came up with:
#include <math.h>

float mysqrtf(float f) {
    if (f < 0) return 0.0f/0.0f;
    if (f == 1.0f / 0.0f) return f;
    if (f != f) return f;
    // half-ass an initial guess of 1.0.
    int expo;
    float foo = frexpf(f, &expo);
    float s = 1.0;
    if (expo & 1) foo *= 2, expo--;
    // this is the only case for which what's below fails.
    if (foo == 0x0.ffffffp+0) return ldexpf(0x0.ffffffp+0, expo/2);
    // do four newton iterations.
    for (int i = 0; i < 4; i++) {
        float diff = s*s-foo;
        diff /= s;
        s -= diff/2;
    }
    // do one last newton iteration, computing s*s-foo exactly.
    float scal = s >= 1 ? 4096 : 2048;
    float shi = (s + scal) - scal; // high 12 bits of significand
    float slo = s - shi; // rest of significand
    float diff = shi * shi - foo; // subtraction exact by sterbenz's theorem
    diff += 2 * shi * slo; // opposite signs; exact by sterbenz's theorem
    diff += slo * slo;
    diff /= s; // diff == fma(s, s, -foo) / s.
    s -= diff/2;
    return ldexpf(s, expo/2);
}
The first thing to analyse is the formula (s*s-foo)/s in floating-point arithmetic. If s is a sufficiently good approximation to sqrt(foo), Sterbenz's theorem tells us that the numerator is within an ulp(foo) of the right answer --- all of that error is approximation error from computing s*s. Then we divide by s; this gives us at worst another half-ulp of approximation error. So, even without a fused multiply-add, diff is within 1.5 ulp of what it should be. And we divide it by two.
Notice that the initial guess doesn't in and of itself matter as long as you follow it up with enough Newton iterations.
Measure the error of an approximation s to sqrt(foo) by abs(s - foo/s). The error of my initial guess of 1 is at most 1. A Newton iteration in exact arithmetic squares the error and divides it by 4. A Newton iteration in floating-point arithmetic --- the kind I do four times --- squares the error, divides it by 4, and kicks in another 0.75 ulp of error. You do this four times and you find you have a relative error at most 0x0.000000C4018384, which is about 0.77 ulp. This means that four Newton iterations yield a faithfully-rounded result.
I do a fifth Newton step to get a correctly-rounded square root. The reason why it works is a little more intricate.
shi holds the "top half" of s while slo holds the "bottom half." The last 12 bits in each significand will be zero. This means, in particular, that shi * shi and shi * slo and slo * slo are exactly representable as floats.
s*s is within two ulps of foo. shi*shi is within 2047 ulps of s*s. Thus shi * shi - foo is within 2049 ulps of zero; in particular, it's exactly representable and less than 2-10.
You can check that you can add 2 * shi * slo and get an exactly-representable result that's within 2-22 of zero and then add slo*slo and get an exactly representable result --- s*s-foo computed exactly.
When you divide by s, you kick in an additional half-ulp of error, which is at most 2-48 here since our error was already so small.
Now we do a Newton step. We've computed the current error correctly to within 2-46. Adding half of it to s gives us the square root to within 3*2-48.
To turn this into a guarantee of correct rounding, we need to prove that there are no floats between 1/2 and 2, other than the one I special-cased, whose square roots are within 3*2-48 of a midpoint between two consecutive floats. You can do some error analysis, get a Diophantine equation, find all of the solutions of that Diophantine equation, find which inputs they correspond to, and work out what the algorithm does on those. (If you do this, there is one "physical" solution and a bunch of "unphysical" solutions. The one real solution is the only thing I special-cased.) There may be a cleaner way, however.