I have a handful of functions that are given some variables x, y, z, etc., evaluate their respective multinomials with constant coefficients at that point, and return the values. My variables are then set equal to these values, and the process is repeated. In pseudo-code:
repeat N times:
    x_new = f1(x, y, z)
    y_new = f2(x, y, z)
    z_new = f3(x, y, z)
    x = x_new
    y = y_new
    z = z_new
and the functions are something like
f1(x, y, z)
    return c0 + c1*x + c2*y + c3*z + c4*x*x + c5*x*y + c6*x*z ...
Each iteration, I save the values of my variables and later graph the result. To have enough points for a graph of sufficient quality, I need around 10 million data points. I have code that does this fairly fast for 3 variables and multinomials of degree 3, but I'm seeking to expand this to support an arbitrary number of variables and multinomials of any degree. This is where things get slow, and I think I need a different approach.
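Concretely, the driver loop looks like this (a minimal Python sketch for illustration; step stands in for evaluating every f at the current point and is not part of my actual code):

def iterate(step, state, n_points=10_000_000):
    # Save every intermediate state so the trajectory can be graphed later.
    history = [tuple(state)]
    for _ in range(n_points):
        state = step(state)  # (x, y, z, ...) -> (f1(...), f2(...), ...)
        history.append(tuple(state))
    return history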
My original (hard-coded) C evaluator looked like:
double evaluateStep(double x, double y, double z, double *coeffs) {
    double sum = coeffs[0];          /* degree 0 */
    sum += coeffs[1] * x;            /* degree 1 */
    sum += coeffs[2] * y;
    sum += coeffs[3] * z;
    sum += coeffs[4] * x * y;        /* degree 2 */
    sum += coeffs[5] * x * z;
    sum += coeffs[6] * y * z;
    sum += coeffs[7] * x * x;
    sum += coeffs[8] * y * y;
    sum += coeffs[9] * z * z;
    sum += coeffs[10] * x * y * z;   /* degree 3 */
    sum += coeffs[11] * x * x * y;
    sum += coeffs[12] * x * x * z;
    sum += coeffs[13] * y * y * x;
    sum += coeffs[14] * y * y * z;
    sum += coeffs[15] * z * z * x;
    sum += coeffs[16] * z * z * y;
    sum += coeffs[17] * x * x * x;
    sum += coeffs[18] * y * y * y;
    sum += coeffs[19] * z * z * z;
    return sum;
}
The generalized code:
/* Enumerates every monomial of total degree <= order exactly once by
   choosing factors with non-decreasing index. For this to cover the
   lower-degree terms, position[0] must be 1.0, with the actual
   variables in position[1..NUM_DIMENSIONS]. */
double recursiveEval(double factor, double *position, int ndim, double **coeffs, int order) {
    if (!order)
        return factor * *((*coeffs)++);  /* consume the next coefficient */
    double sum = 0;
    for (int i = 0; i < ndim; i++)
        sum += recursiveEval(factor * position[i], position + i, ndim - i, coeffs, order - 1);
    return sum;
}

double evaluateNDStep(double *position, double *coeffs) {
    double *coefficients = coeffs;  /* cursor that recursiveEval advances */
    return recursiveEval(1, position, NUM_DIMENSIONS + 1, &coefficients, ORDER);
}
Of course, I'm sure this has destroyed my compiler (gcc)'s ability to do common subexpression elimination and similar optimizations. I'm wondering if there's a better approach I could take. There is one bit of information I haven't been able to take advantage of yet: all the coefficients are drawn at the start of the program from a pool of 25 (with repetition). They span [-1.2, 1.2] inclusive, if that makes any difference.
I've considered computing a single term of the multinomial and then transforming it into each of the other terms by simple variableA/variableB multiplications, but the amount of error that introduces may be too much. I've also considered computing all terms up to degree n and using those to construct the degree n+1 through 2n terms. One other option, which I'm not sure how to implement, would be to factor the expression, taking advantage of the fact that many terms will share the same coefficient (pigeonhole principle); it may then be possible to get by with fewer multiplications and additions than there are terms in the expanded expression. I have no clue how to go about doing this, though, since it would have to happen at run time, and that crosses into the territory of JIT-compiling the expression.
As it stands, 3 variables at degree 3 (20 terms total) takes almost 6 seconds, and 4 variables at degree 3 (35 terms total) takes a bit over 13 seconds. These will not scale well to larger degrees or more variables (the number of distinct terms is nck(degree + variables, degree), where nck is "n choose k"). I'm looking for a way to improve the performance of the generalized code, ideally asymptotically (I suspect via partial factorization). I don't really care about the language; I'm writing this code in C, but if you prefer to present an algorithm in some other language, that will not be a problem for me.
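One way to act on the idea of building higher-degree terms from lower ones, sketched in Python (the names make_terms/evaluate and the coefficient ordering are illustrative, not from the code above): precompute a table mapping each monomial to a (lower-degree parent, extending variable) pair, then evaluate with one multiply to extend each monomial plus one multiply-add per coefficient, independent of degree:

from itertools import combinations_with_replacement

def make_terms(nvars, degree):
    # Monomials are identified by sorted tuples of variable indices;
    # monomial 0 is the constant 1. terms[i-1] = (parent, var) means
    # monomial i = monomial[parent] * x[var].
    index = {(): 0}
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(nvars), d):
            terms.append((index[combo[1:]], combo[0]))
            index[combo] = len(index)
    return terms

def evaluate(terms, coeffs, x):
    # coeffs[i] pairs with monomial i; len(coeffs) == len(terms) + 1.
    monos = [1.0]
    total = coeffs[0]
    for i, (parent, var) in enumerate(terms, start=1):
        m = monos[parent] * x[var]
        monos.append(m)
        total += coeffs[i] * m
    return total

The table depends only on the variable count and degree, so it is built once at startup. For 3 variables at degree 3, make_terms(3, 3) yields 19 entries, matching the 20 coefficients of the hard-coded version, and each monomial is built from at most `degree` multiplications of previously computed values, which keeps rounding-error growth modest.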
Related
I'm looking for a very fast way to compute (x-sin(x))/(x^3) for all x using IEEE floating point arithmetic and standard trigonometric functions. At 0, it should return 1/6.
For sin(x)/x, it's sufficient to check if x=0 and return 1, and otherwise just compute it using standard floating-point sin and division. For (1-cos(x))/x^2: if cos(x) <= 0, the expression is fine as is; otherwise, express it as (sin(x)/x)^2/(1+cos(x)).
But I can't figure out how to express (x-sin(x))/x^3.
So far, the best I have is to compute the infinite sum until it converges: $\sum_{n=1}^{\infty} \frac{1}{4^{n}} \cdot \frac{\sin(x/2^{n})}{x/2^{n}} \cdot \frac{1-\cos(x/2^{n})}{(x/2^{n})^{2}}$
but I'd prefer a closed form
(1 - cos x) / x^2 is fundamentally different from (x - sin x) / x^3, in that unity can be constructed from trigonometric functions as sin^2 x + cos^2 x = 1, while the same is not true of x. This means we cannot transform the latter function into a numerically advantageous closed-form trigonometric formula. I thought long about this and also tried manipulating the formula with all the trigonometric identities I am aware of. I would love to be proven wrong; that seems like a question for Math Stack Exchange. The easiest and most accurate way to implement the former function is
// (1-cos(x))/x**2
double cosm1_over_xsquared (double x)
{
    if (fabs (x) < sqrt (DBL_EPSILON)) {
        return 0.5;
    } else {
        double s = sin (x * 0.5) / x;
        return 2.0 * s * s;
    }
}
If the standard math library computes sin() with an error just slightly over half an ulp, this implementation computes (1 - cos x) / x^2 with an error no larger than 4 ulp. As a side note, this function also lends itself to the use of Kahan's self-correction technique, which he first demonstrated for the computation of (e^x - 1) / x in
William M. Kahan, "Interval arithmetic in the proposed IEEE floating point arithmetic standard." In Karl L. E. Nickel (ed.), Interval Arithmetic 1980, Academic Press 1980, pp. 99-128.
// (1-cos(x))/x**2 on [-3, 3] using Kahan's self-compensation technique
double cosm1_over_xsquared_kahan (double x)
{
    double u = cos (x);
    double n = 1.0 - u;
    if (n == 0.0) {
        return 0.5;
    }
    double d = acos (u);
    return n / (d * d);
}
If both cos() and acos() have a maximum error just slightly over half an ulp, this function returns results with an error of less than 5 ulps. Because cos, unlike e^x, is a periodic function, this approach works only on the restricted interval noted in the code comment.
The above suggests that we should shoot for an implementation of (x - sin x) / x^3 with a maximum error of about 4 ulp. Characterizing the naive computation, we find that it is adequate for |x| > 1 under this provision. Despite the narrow input domain for an alternate computation, Kahan's self-compensation technique does not work for this function. The old standby of math function implementers, a polynomial minimax approximation, works just fine, however. This results in the following code:
// (x-sin(x))/x**3
double sinmx_over_xcubed (double x)
{
    if (fabs(x) < 1.0) { // minimax approximation
        double x2 = x * x;
        double p = 7.5475867852548673E-13;
        p = p * x2 - 1.6057658525730946E-10;
        p = p * x2 + 2.5052098906959416E-8;
        p = p * x2 - 2.7557319191306421E-6;
        p = p * x2 + 1.9841269841218293E-4;
        p = p * x2 - 8.3333333333333055E-3;
        p = p * x2 + 1.6666666666666666E-1;
        return p;
    } else {
        return (x - sin (x)) / x / x / x;
    }
}
Well, the Taylor series for this expression is:
1/6 - x^2/120 + x^4/5040 + O(x^6)
and the series converges for all x, which should be pretty good for most applications.
Addendum
If you are trying to find the limit at 0, apply L'Hôpital's rule, since the expression has the indeterminate form 0/0 as x -> 0:
lim(x->0) (x - sin(x))/x^3 = lim(x->0) (1 - cos(x))/(3x^2) = lim(x->0) sin(x)/(6x) = 1/6
In other words, it's probably best to use an if statement for the case x = 0, and otherwise use the Taylor series, which will be a LOT faster than evaluating floating-point sin and cos unless you have purpose-built hardware or are using GPUs.
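A minimal sketch of that suggestion (Python here; the 0.1 cutoff is an assumed, untuned threshold):

import math

def sinmx_over_xcubed_taylor(x):
    # Near zero the direct formula suffers catastrophic cancellation,
    # so use the truncated Taylor series in Horner form.
    if abs(x) < 0.1:
        x2 = x * x
        return 1.0/6.0 + x2 * (-1.0/120.0 + x2 * (1.0/5040.0))
    return (x - math.sin(x)) / (x * x * x)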
Suppose I have a function phi(x1,x2) = k1*x1 + k2*x2 which I have evaluated over a grid, where the grid is a square with boundaries at -100 and 100 on both the x1 and x2 axes, with some step size, say h = 0.1. Now I want to calculate this sum over the grid, which is what I'm struggling with: $M(x_1,x_2) = \frac{1}{\pi D} \sum_{m \in \mathbb{Z}^2} \phi(hm)\, e^{-|x - hm|^2/(h^2 D)}$
What I was trying:
clear all
close all
clc
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = D1 : h : D2;
Y = D1 : h : D2;
[x1, x2] = meshgrid(X, Y);
k1=2;k2=2;
phi = k1.*x1 + k2.*x2;
figure(1)
surf(X,Y,phi)
m1=-500:500;
m2=-500:500;
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
sys=@(m1,m2,X,Y) (k1*h*m1+k2*h*m2).*exp((-([X Y]-h*[m1 m2]).^2)./(h^2*D))
sum1=sum(sys(M1,M2,X1,X2))
MATLAB gives an error in ndgrid; any idea how I should code this?
MATLAB shows:
Error using repmat
Requested 10001x1001x2001x2001 (298649.5GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
Error in ndgrid (line 72)
varargout{i} = repmat(x,s);
Error in new_try1 (line 16)
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
Judging by your comments and your code, it appears as though you don't fully understand what the equation is asking you to compute.
To obtain the value M(x1,x2) at some given (x1,x2), you have to compute that sum over Z^2. Of course, using a numerical toolbox such as MATLAB, you could only ever hope to compute it over some finite range of Z^2. In this case, since (x1,x2) covers the range [-100,100] x [-100,100] and h=0.1, it follows that m should cover the range [-1000,1000] x [-1000,1000]. Example: m = (-1000,-1000) gives you hm = (-100,-100), which is the bottom-left corner of your domain. So really, phi(hm) is just phi(x1,x2) evaluated on all of your discretised points.
As an aside, since you need to compute |x-hm|^2, you can treat x = x1 + i*x2 as a complex number to make use of MATLAB's abs function. If you were strictly working with vectors, you would have to use norm, which is OK too, but a bit more verbose. Thus, for some given x = (x1_0, x2_0), you would compute x - hm over the entire discretised plane as (x1_0 - x1) + i*(x2_0 - x2).
Finally, you can compute one entry of M at a time:
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = (D1 : h : D2); % X is in rows (dim 2)
Y = (D1 : h : D2)'; % Y is in columns (dim 1)
k1=2;k2=2;
phi = k1*X + k2*Y;
M = zeros(length(Y), length(X));
for j = 1:length(X)
    for i = 1:length(Y)
        % treat (x - hm) as a complex number
        x_hm = (X(j)-X) + 1i*(Y(i)-Y); % this computes x-hm for all m
        M(i,j) = 1/(pi*D) * sum(sum(phi .* exp(-abs(x_hm).^2/(h^2*D)), 1), 2);
    end
end
By the way, this computation takes quite a long time. You can consider either increasing h, reducing D1 and D2, or changing all three of them.
I have an interesting math/CS problem. I need to sample from a possibly infinite random sequence of increasing values, X, with X(i) > X(i-1), with some distribution between them. You could think of this as the sum of a different sequence D of uniform random numbers in [0,d). This is easy to do if you start from the first one and go from there; you just add a random amount to the sum each time. But the catch is, I want to be able to get any element of the sequence in faster than O(n) time, ideally O(1), without storing the whole list. To be concrete, let's say I pick d=1, so one possibility for D (given a particular seed) and its associated X is:
D={.1, .5, .2, .9, .3, .3, .6 ...} // standard random sequence, elements in [0,1)
X={.1, .6, .8, 1.7, 2.0, 2.3, 2.9, ...} // increasing random values; partial sum of D
(I don't really care about D, I'm just showing one conceptual way to construct X, my sequence of interest.) Now I want to be able to compute the value of X[1] or X[1000] or X[1000000] equally fast, without storing all the values of X or D. Can anyone point me to some clever algorithm or a way to think about this?
(Yes, what I'm looking for is random access into a random sequence -- with two different meanings of random. Makes it hard to google for!)
Since D is pseudorandom, there's a space-time tradeoff possible: O(sqrt(n))-time retrievals using O(sqrt(n)) storage locations (or, in general, O(n**alpha)-time retrievals using O(n**(1-alpha)) storage locations). Assume zero-based indexing and that X[n] = D[0] + D[1] + ... + D[n-1]. Compute and store Y[s] = X[s**2] for all s**2 <= n in the range of interest. To look up X[n], let s = floor(sqrt(n)) and return Y[s] + D[s**2] + D[s**2+1] + ... + D[n-1].
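A minimal sketch of this scheme in Python; the hash-based D here is an illustrative stand-in for whatever seeded generator actually produces the sequence:

import hashlib
import math

def D(i, seed=0):
    # Deterministic pseudorandom uniform in [0, 1), computable for any index on demand.
    h = hashlib.sha256(f"{seed}:{i}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def build_checkpoints(n_max):
    # Y[s] = X[s**2]; only O(sqrt(n_max)) values are stored.
    Y, running, s = [0.0], 0.0, 1
    for i in range(n_max):
        running += D(i)
        if i + 1 == s * s:
            Y.append(running)
            s += 1
    return Y

def X(n, Y):
    # Resume from the checkpoint at floor(sqrt(n))**2 and sum the tail.
    s = math.isqrt(n)
    return Y[s] + sum(D(i) for i in range(s * s, n))

A lookup then touches at most about 2*sqrt(n) terms past the nearest checkpoint.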
EDIT: here's the start of an approach based on the following idea.
Let Dist(1) be the uniform distribution on [0, d) and let Dist(k) for k > 1 be the distribution of the sum of k independent samples from Dist(1). We need fast, deterministic methods to (i) pseudorandomly sample Dist(2**p) and (ii) given that X and Y are distributed as Dist(2**p), pseudorandomly sample X conditioned on the outcome of X + Y.
Now imagine that the D array constitutes the leaves of a complete binary tree of size 2**q. The values at interior nodes are the sums of the values at their two children. The naive way is to fill the D array directly, but then it takes a long time to compute the root entry. The way I'm proposing is to sample the root from Dist(2**q). Then, sample one child according to Dist(2**(q-1)) given the root's value. This determines the value of the other, since the sum is fixed. Work recursively down the tree. In this way, we look up tree values in time O(q).
Here's an implementation for Gaussian D.
import hashlib
import math

def random_oracle(seed):
    # Deterministically map a seed to a float in [0, 1).
    h = hashlib.sha512()
    h.update(str(seed).encode())
    x = 0.0
    for b in h.digest():
        x = (x + b) / 256.0
    return x

def sample_gaussian(variance, seed):
    # Box-Muller transform driven by the random oracle.
    u0 = random_oracle(2 * seed)
    u1 = random_oracle(2 * seed + 1)
    return math.sqrt(-2.0 * variance * math.log(1.0 - u0)) * math.cos(2.0 * math.pi * u1)

def sample_children(sum_outcome, sum_variance, seed):
    # For i.i.d. Gaussian children, the difference X - Y is independent
    # of the sum X + Y and has the same variance as the sum.
    difference_outcome = sample_gaussian(sum_variance, seed)
    return ((sum_outcome + difference_outcome) / 2.0,
            (sum_outcome - difference_outcome) / 2.0)

def sample_X(height, i):
    # X[i] = D[0] + ... + D[i-1] over 2**height unit-variance leaves:
    # walk down the tree, accumulating the left subtrees we skip over.
    assert 0 <= i <= 2 ** height
    total = 0.0
    z = sample_gaussian(2 ** height, 0)  # root: sum of all leaves
    seed = 1
    for j in range(height, 0, -1):
        x, y = sample_children(z, 2 ** j, seed)
        assert abs((x + y) - z) <= 1e-09
        seed *= 2
        if i >= 2 ** (j - 1):
            # Descend right; the left subtree lies wholly inside the prefix.
            i -= 2 ** (j - 1)
            total += x
            z = y
            seed += 1
        else:
            z = x
    return total + i * z  # i is 0 or 1 here; include the final leaf if needed

def test(height):
    X = [sample_X(height, i) for i in range(2 ** height + 1)]
    D = [X[i + 1] - X[i] for i in range(2 ** height)]
    mean = sum(D) / len(D)
    variance = sum((d - mean) ** 2 for d in D) / (len(D) - 1)
    print(mean, math.sqrt(variance))
    D.sort()
    with open('data', 'w') as f:
        for d in D:
            print(d, file=f)

if __name__ == '__main__':
    test(10)
If you do not record the values of X you have previously generated, there is no way to guarantee that the elements of X you generate on the fly will be in increasing order. Furthermore, it seems there is no way to avoid O(n) worst-case time per query if you don't know how to quickly generate the CDF of the sum of the first m random variables in D for any choice of m.
If you want the ith value X(i) from a particular realization, I can't see how you could do this without generating the sequence up to i. Perhaps somebody else can come up with something clever.
Would you be willing to accept a value which is plausible in the sense that it has the same distribution as the X(i)'s you would observe across multiple realizations of the X process? If so, it should be pretty easy. X(i) will be asymptotically normally distributed with mean i/2 (since it's the sum of the D_k's for k=1,...,i, the D's are Uniform(0,1), and the expected value of a D is 1/2) and variance i/12 (since the variance of a D is 1/12 and the variance of a sum of independent random variables is the sum of their variances).
Because of the asymptotic aspect, I'd pick some threshold value of i at which to switch over from direct summing to using the normal. For example, if you use i = 12 as your threshold, you would use actual summing of uniforms for values of i from 1 to 11, and generate a Normal(i/2, sqrt(i/12)) value for i >= 12. That's an O(1) algorithm, since the total work is bounded by your threshold, and the results produced will be distributionally representative of what you would see if you actually went through the summing.
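A sketch of that threshold scheme in Python (the per-index string seeding is an illustrative choice to make repeated queries for the same i reproducible; it is not part of the answer above):

import math
import random

def X(i, threshold=12, seed=42):
    rng = random.Random(f"{seed}:{i}")  # reproducible stream per index
    if i < threshold:
        return sum(rng.random() for _ in range(i))  # exact: sum of i uniforms
    return rng.normalvariate(i / 2, math.sqrt(i / 12))  # CLT approximation

As noted, values for different i are drawn independently, so they are distributionally representative rather than a single consistent realization.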
This generates a random float with a certain precision level in Ruby:
def generate(min, max, precision)
  number = rand * (max - min) + min
  factor = 10.0 ** precision
  return (number * factor).to_i / factor
end
Someone recently suggested to me that it may be simpler to do this:
var = rand(100) * 1.0
var /= 10
Which generates a random float from 0.0 up to, but not including, 10.0, in steps of 0.1.
This sounds good, but I'm not sure how to control the precision level with that method. How can I achieve the equivalent of my first method using this suggestion?
If you have a range x1 to x2, and you want a minimum increment ("precision") of delta, then you need n = (x2 - x1)/delta + 1 different integers, which you scale with rand(n) * delta + x1 to give random numbers between x1 and x2 inclusive, with an increment of delta. How do you define "precision"? The relationship between delta and precision is given (according to your formula) by delta = 10**(-precision), or equivalently delta = 1.0 / 10**precision.
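Expressed as code (Python rather than Ruby, for consistency with the other sketches here; generate mirrors the method in the question):

import random

def generate(x1, x2, precision):
    delta = 10.0 ** -precision          # minimum increment
    n = round((x2 - x1) / delta) + 1    # number of representable values
    return random.randrange(n) * delta + x1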
I'm about to optimize a problem that is defined by n (n >= 1, typically n = 4) non-negative variables. This is not an n-dimensional problem, since the sum of all the variables needs to be 1.
The most straightforward approach would be to scan the entire range 0 <= x_i < 1 for each x_i, and then normalize all the values by the sum of all the x's. However, this approach introduces redundancy, which is a problem for many optimization algorithms that rely on stochastic sampling of the solution space (genetic algorithms, tabu search, and others). Is there any alternative algorithm that can perform this task?
What do I mean by redundancy?
Take the two-dimensional case as an example. Without the constraints, this would be a two-dimensional problem requiring the optimization of two variables. However, due to the requirement that X1 + X2 == 1, one only needs to optimize one variable, since X2 is determined by X1 and vice versa. Had one decided to scan X1 and X2 independently and normalize them to a sum of 1, then many solution candidates would have been identical vis-a-vis the problem. For example, (X1==0.1, X2==0.1) is identical to (X1==0.5, X2==0.5).
If you are dealing with real-valued variables, then arriving at two samples that are identical is quite unlikely. However, you do have the problem that your samples will not be uniform: you are much more likely to choose (0.5, 0.5) than (1.0, 0). One way of fixing this is subsampling: when a larger stretch of the space is collapsed onto a certain point, you shrink the probability of choosing that point accordingly.
So basically what you are doing is mapping all the points inside the unit cube that lie in the same direction onto a single point. The points in a given direction form a line; the longer the line, the larger the probability that you will choose the projected point. Hence you want to bias the probability of choosing a point by the inverse of the length of the line that maps onto it.
Here is the code that can do it (assuming you are looking for the x_i's to sum up to 1):
while (true) {
    maximum = 0;
    norm = 0;
    sum = 0;
    for (i = 0; i < N; i++) {
        x[i] = random(0, 1);
        maximum = max(x[i], maximum);
        sum += x[i];
        norm += x[i] * x[i];
    }
    norm = sqrt(norm);
    /* length of the segment inside the unit cube along x's direction */
    length_of_line = norm / maximum;
    sample_probability = 1 / length_of_line;
    if (sum == 0 || random(0, 1) > sample_probability) {
        continue;  /* reject, then resample */
    } else {
        for (i = 0; i < N; i++) {
            x[i] = x[i] / sum;  /* project onto the simplex */
        }
        return x;
    }
}
Here is the same function provided earlier by Amit Prakash, translated to Python:
import numpy as np

def f(N):
    while True:
        x = np.random.rand(N)
        mxm = np.max(x)
        theSum = np.sum(x)
        nrm = np.sqrt(np.sum(x * x))
        length_of_line = nrm / mxm
        sample_probability = 1 / length_of_line
        if theSum == 0 or np.random.rand() > sample_probability:
            continue
        else:
            return x / theSum