I'm taking a course in Numerical Methods and I've been requested to implement the famous Monte Carlo algorithm to find pi that you can find here.
I had no difficulties in writing the code with an arbitrary number of trials:
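REAL(8) FUNCTION distance(xvalue, yvalue) RESULT(dist)
  IMPLICIT NONE
  REAL(8), INTENT(in) :: xvalue, yvalue
  dist = SQRT(xvalue**2 + yvalue**2)
END FUNCTION distance

PROGRAM montecarlo
  IMPLICIT NONE
  REAL(8) :: p, x, y
  REAL(8), EXTERNAL :: distance
  REAL(8) :: pi_last, pi
  INTEGER, ALLOCATABLE :: SEED(:)
  INTEGER :: npc, npt, i, nseed
  npc = 0
  npt = 0
  pi = 1.0
  CALL RANDOM_SEED(SIZE=nseed)   ! seed length is compiler dependent
  ALLOCATE(SEED(nseed))
  SEED = 12345
  CALL RANDOM_SEED(PUT=SEED)
  DO i=1, 1000000000
    CALL RANDOM_NUMBER(p)        ! uniform sample in [0, 1)
    x = p
    CALL RANDOM_NUMBER(p)
    y = p
    npt = npt + 1
    IF (distance(x, y) < 1.0) THEN
      npc = npc + 1              ! point lies inside the quarter circle
    END IF
    pi_last = pi
    pi = 4.0*(npc*1.0)/(npt*1.0)
  END DO
  PRINT*, 'Pi:', pi
END PROGRAM montecarlo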
I noticed that the error shrinks approximately as 1/sqrt(N), where N is the number of steps. Now I have to stop the algorithm at a certain precision, so I created an endless DO loop with an EXIT inside an IF statement:
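REAL(8) FUNCTION distance(xvalue, yvalue) RESULT(dist)
  IMPLICIT NONE
  REAL(8), INTENT(in) :: xvalue, yvalue
  dist = SQRT(xvalue**2 + yvalue**2)
END FUNCTION distance

PROGRAM montecarlo
  IMPLICIT NONE
  REAL(8) :: p, x, y
  REAL(8), EXTERNAL :: distance
  REAL(8) :: pi_last, pi
  INTEGER, ALLOCATABLE :: SEED(:)
  INTEGER :: npc, npt, i, nseed
  npc = 0
  npt = 0
  pi = 1.0
  CALL RANDOM_SEED(SIZE=nseed)   ! seed length is compiler dependent
  ALLOCATE(SEED(nseed))
  SEED = 12345
  CALL RANDOM_SEED(PUT=SEED)
  DO
    CALL RANDOM_NUMBER(p)        ! uniform sample in [0, 1)
    x = p
    CALL RANDOM_NUMBER(p)
    y = p
    npt = npt + 1
    IF (distance(x, y) < 1.0) THEN
      npc = npc + 1
    END IF
    pi_last = pi
    pi = 4.0*(npc*1.0)/(npt*1.0)
    ! Stop when two consecutive estimates are close to each other
    IF ( ABS(pi - pi_last) < 0.000001 .AND. pi - pi_last /= 0) THEN
      EXIT
    END IF
  END DO
  PRINT*, 'Pi:', pi
END PROGRAM montecarlo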
The problem is that this returns a value of pi that doesn't have the precision I asked for. I get the logic behind it: if I get two consecutive values far from pi but close to each other, the condition will be satisfied and the program will exit the DO statement. The problem is that I don't know how to modify it in order to get a precision decided by me. So the question is:
How do I implement this algorithm in a manner such that I can decide the precision of pi in output?
EDIT: Ok, I implemented both of your solutions and they work, but only for 10^(-1), 10^(-3) and 10^(-5). I think it's a problem of the pseudorandom sequence if for 10^(-2) and 10^(-4) it returns an incorrect value of pi.
It's not enough to specify a desired precision; you also need to allow some chance that the precision target is not met. Then you can solve for the number of trials in (e.g.) Hoeffding's inequality to meet your desired precision with the desired probability (consistent with the convergence rate you've observed, n needs to be about one over the square of the precision to succeed with constant probability).
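As a rough illustration of solving Hoeffding's inequality for the number of trials, here is a minimal Python sketch. It assumes the estimate is 4 times the mean of n independent 0/1 indicators, so an error of d in the estimate corresponds to an error of d/4 in the mean; the function name `hoeffding_trials` and the parameter `delta` (the allowed failure probability) are illustrative choices, not from the original answer.

```python
import math

def hoeffding_trials(d, delta):
    """Trials n such that P(|pi_hat - pi| >= d) <= delta.

    Hoeffding for a mean of n values in [0, 1]:
        P(|mean - E[mean]| >= t) <= 2 * exp(-2 * n * t**2)
    With pi_hat = 4 * mean, a deviation d in pi_hat is t = d / 4 in the mean.
    """
    t = d / 4.0
    return math.ceil(math.log(2.0 / delta) / (2.0 * t * t))

# Required trials grow like 1 / d**2: ten times the precision, 100x the work.
for d in (1e-1, 1e-2, 1e-3):
    print(d, hoeffding_trials(d, delta=0.01))
```

The bound is conservative, since it holds for any distribution on [0, 1] rather than using the specific variance of this problem.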
In an ideal setting (perfect random number generator generating real numbers in the mathematical sense) your variable npc is a random variable with binomial distribution B(n,π/4) where n is npt from your program. Its expected value is n*π/4, so you correctly compute the approximation of π as pi=4*npc/npt. Now this approximation can take all values from 0 to 4 no matter how many loop iterations you calculate, because npc can take all values from 0 to npt. For a range around π you can only give a probability (using c as shorthand for npc; P denotes the probability of an event):
$$
\begin{aligned}
P(|\mathrm{pi} - \pi| < d) &= P(-d < \mathrm{pi} - \pi < d) = P(-d < 4c/n - \pi < d) \\
&= P\big(n(\pi-d)/4 < c < n(\pi+d)/4\big) \\
&= P\big(c < n(\pi+d)/4\big) - P\big(c < n(\pi-d)/4\big) \\
&\approx F_N\big(n(\pi+d)/4\big) - F_N\big(n(\pi-d)/4\big) = 2F\left(d\sqrt{\frac{n}{\pi(4-\pi)}}\right) - 1
\end{aligned}
$$

where $F_N$ is the cumulative distribution function of the normal distribution $N(n\pi/4;\, n\pi/4\,(1-\pi/4))$, which approximates the above binomial distribution, and $F$ is the cumulative distribution function of the standard normal distribution. Now given a deviation d and a probability p you can compute n such that the last term does not go below p:

$$n = \left\lceil \pi(4-\pi)\left(\frac{F^{-1}((p+1)/2)}{d}\right)^2 \right\rceil$$

Then with n loop iterations you can compute the approximation pi of π to a desired precision with given probability. If we want to achieve a probability of p=99%, then the above formula simplifies to

$$n \approx 17.89/d^2,$$

so for precision d=0.0001 roughly n=1.789E9 iterations are necessary!
Note: since a computer cannot meet the above ideal setting there is also a (theoretical) limit on the precision you can reach with this algorithm. There are only finitely many floating point numbers representable in a computer, so your points (x,y) lie on a kind of grid. The best approximation of π that can be computed with this algorithm boils down to performing your loop over all grid points in [0,1]x[0,1]. The good old C function rand() has a resolution of 31 bits in implementations where RAND_MAX is 2^31 - 1 (glibc, for example). So it does not make sense to compute more than n = (2^31)^2 = 2^62 points, which gives a maximal precision of √(17.89/n) = 1.97E-9 when asking for 99% correctness.
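The formula for n under the normal approximation can be evaluated directly. The following is a small Python sketch using the standard library's `statistics.NormalDist` for the inverse of F; the function name `trials_for_precision` is an illustrative choice.

```python
import math
from statistics import NormalDist

def trials_for_precision(d, p):
    """Smallest n with 2*F(d*sqrt(n/(pi*(4-pi)))) - 1 >= p (normal approximation)."""
    z = NormalDist().inv_cdf((p + 1.0) / 2.0)   # F^{-1}((p+1)/2)
    return math.ceil(math.pi * (4.0 - math.pi) * (z / d) ** 2)

# For p = 99% the prefactor is about 17.89, so n is about 17.89 / d**2.
print(trials_for_precision(1e-4, 0.99))
```

Because n scales as 1/d², each extra decimal digit of precision costs a factor of 100 in iterations.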
I'm using a 64-bit LCG (MMIX, by Knuth). It generates blocks of random numbers inside my code, which uses them to perform some operations. My code runs on a single core and I would like to parallelize the work to reduce the execution time.
Before thinking about more advanced methods, I'd like to simply run several identical copies of the code in parallel (the code repeats the same task over a certain number of independent simulations, so I can split the simulations between identical copies and run them in parallel).
My only problem now is to find a seed for each copy. To avoid unwanted nontrivial correlations between data generated in different copies, I have to be sure that the random numbers generated in the various copies don't overlap. So, starting from a certain seed in the first copy, I need a way to find a value (the next seed) that is very distant not in absolute value but in the pseudo-random sequence (that is, going from the first seed to the second takes a huge number of LCG steps).
My first attempt was this:
starting from the LCG relation between 2 consecutive numbers generated in the sequence
So, in principle, I could calculate the above relation with, say, n = 2^40 and I_0 equal to the value of the first seed, and obtain a new seed 2^40 steps away from the first one in the LCG sequence.
The problem is that, doing so, I necessarily overflow when calculating a^n. For MMIX (by Knuth) a ~ 2^62 and I use unsigned long long int (64 bit). Note that the only problem here is the fraction in the above relation. If there were only sums and multiplications I could ignore the overflow problem thanks to the following modular properties (I'm using 2^64 as c, the modulus of a 64-bit generator):
So, starting from a certain value (the first seed), how can I find a second one that is a huge number of steps away in the LCG pseudo-random sequence?
r3mainer's solution is perfectly suited to Python code. I'm now trying to implement it in C using unsigned __int128 variables. I have only one problem: in principle I should compute:
Say, for simplicity, I want to compute:
with n = 2^40 and c(a-1) ~ 2^126. I proceed with a loop. Starting with temp = a, in each iteration I compute temp = temp*temp, then I compute temp % c(a-1). The problem is in the first of these steps (temp = temp*temp). In principle temp could be any number < c(a-1) ~ 2^126. If temp is a big number, say > 2^64, the product overflows past 2^128 - 1 before the next modulo operation. So is there a way to avoid it? For now the only solution I see is to perform each multiplication with a loop over bits, as suggested here: c code: prevent overflow in modular operation with huge modules (modules near the overflow treshold)
Is there another way to perform the modulo operation during the multiplication?
(Note that, since c = 2^64, the mod(c) operation doesn't have the same problem, because the overflow point for unsigned long long variables coincides with the modulus.)
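The "loop over bits" idea mentioned above can be sketched as a double-and-add modular multiplication. This is written in Python only to show the control flow (Python integers never overflow); the names `mulmod` and `powmod` are illustrative. The point is that every intermediate value stays below 2*mod, so in C with unsigned __int128 the same steps would fit as long as mod < 2^127, which holds for c(a-1) ~ 2^126.

```python
def mulmod(x, y, mod):
    """(x * y) % mod using only additions of values already reduced mod `mod`."""
    x %= mod
    y %= mod
    result = 0
    while y:
        if y & 1:
            result = (result + x) % mod   # result + x < 2 * mod
        x = (x + x) % mod                 # x + x < 2 * mod
        y >>= 1
    return result

def powmod(base, exp, mod):
    """Square-and-multiply exponentiation built on mulmod."""
    result = 1 % mod
    base %= mod
    while exp:
        if exp & 1:
            result = mulmod(result, base, mod)
        base = mulmod(base, base, mod)
        exp >>= 1
    return result

a = 6364136223846793005
mod = 2**64 * (a - 1)                     # the c(a-1) modulus from the question
print(powmod(a, 2**40, mod) == pow(a, 2**40, mod))
```

The cost is one loop pass per bit of the multiplier, so each modular multiplication takes on the order of 128 additions instead of one machine multiply.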
Any LCG of the form x[n+1] = (x[n] * a + c) % m can be skipped to an arbitrary position very quickly.
Starting with a seed value of zero, the first few iterations of the LCG will give you this sequence:
x₀ = 0
x₁ = c % m
x₂ = (c(a + 1)) % m
x₃ = (c(a² + a + 1)) % m
x₄ = (c(a³ + a² + a + 1)) % m
It's pretty easy to see that each term is actually the sum of a geometric series, which can be calculated with a simple formula:
x_n = (c(a^{n-1} + a^{n-2} + ... + a + 1)) % m
= (c * (a^n - 1) / (a - 1)) % m
The (a^n - 1) term can be calculated quickly by modular exponentiation, but dividing by (a-1) is a bit tricky because (a-1) and m are both even (i.e., not coprime), so we can't calculate the modular multiplicative inverse of (a-1) mod m directly.
Instead, calculate (a^n-1) mod m*(a-1), then perform a straightforward (non-modular) division of the result by a-1. In Python, the calculation would go something like this:
def lcg_skip(m, a, c, n):
    # Calculate nth term of LCG sequence with parameters m (modulus),
    # a (multiplier) and c (increment), assuming an initial seed of zero
    a1 = a - 1
    t = pow(a, n, m * a1) - 1
    t = (t * c // a1) % m
    return t

def test(nsteps):
    m = 2**64
    a = 6364136223846793005
    c = 1442695040888963407

    print("Calculating by brute force:")
    seed = 0
    for i in range(nsteps):
        seed = (seed * a + c) % m
    print(seed)

    print("Calculating by fast method:")
    # Calculate nth term by modular exponentiation
    print(lcg_skip(m, a, c, nsteps))
So to create LCGs with non-overlapping output sequences, all you would need to do is use initial seed values generated by lcg_skip() with values of n that are far enough apart.
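To turn this into per-worker seeds, the same trick can be extended to a nonzero starting seed, using x_n = (a^n * x_0 + c * (a^n - 1) / (a - 1)) % m. The sketch below is one possible way to do it; the names `lcg_jump` and `worker_seeds` and the choice of stride are assumptions for illustration, and the constants are the MMIX parameters from the test code above.

```python
M = 2**64
A = 6364136223846793005
C = 1442695040888963407

def lcg_jump(seed, n, m=M, a=A, c=C):
    """State reached after n LCG steps from `seed` (requires a != 1)."""
    a1 = a - 1
    an = pow(a, n, m * a1)        # a^n mod m*(a-1)
    geom = (an - 1) // a1         # exact: (a^n - 1) is a multiple of a - 1
    return (an * seed + geom * c) % m

def worker_seeds(base_seed, n_workers, stride, m=M, a=A, c=C):
    """Seeds spaced `stride` steps apart along the same LCG sequence."""
    return [lcg_jump(base_seed, k * stride, m, a, c) for k in range(n_workers)]

# Four workers, each with room for 2**40 draws before reaching the next one.
print(worker_seeds(12345, 4, 2**40))
```

The stride should be at least the number of values each worker will draw, so that no worker runs into the next worker's block.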
For an LCG, jumping forward or backward in O(log2(N)) time, where N is the distance between jump points, is a known property; see the paper by F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994).
It means that if your LCG parameters (a, c) satisfy the Hull–Dobell theorem, the whole period is 2^64 numbers before the sequence repeats. For, say, Nt threads you choose a jump distance of 2^64 / Nt: all threads start with the same seed and, after initializing the LCG, jump ahead by (2^64 / Nt)*threadId. You are then safe from RNG correlations caused by overlapping sequences.
For the simplest case of common 64-bit unsigned modulo math, as implemented in NumPy, the code below should work fine
import numpy as np

class LCG(object):

    UZERO: np.uint64 = np.uint64(0)
    UONE : np.uint64 = np.uint64(1)

    def __init__(self, seed: np.uint64, a: np.uint64, c: np.uint64) -> None:
        self._seed: np.uint64 = np.uint64(seed)
        self._a   : np.uint64 = np.uint64(a)
        self._c   : np.uint64 = np.uint64(c)

    def next(self) -> np.uint64:
        # uint64 arithmetic wraps around, which performs the mod 2^64 step
        self._seed = self._a * self._seed + self._c
        return self._seed

    def seed(self) -> np.uint64:
        return self._seed

    def set_seed(self, seed: np.uint64) -> None:
        self._seed = seed

    def skip(self, ns: np.int64) -> None:
        """
        Signed argument - skip forward as well as backward

        The algorithm here to determine the parameters used to skip ahead is
        described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
        Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
        O(log2(N)) operations instead of O(N). It computes parameters
        A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
        """
        # A negative skip of k steps is a forward skip of 2^64 - k steps
        nskip: np.uint64 = np.uint64(int(ns) % 2**64)

        a: np.uint64 = self._a
        c: np.uint64 = self._c

        a_next: np.uint64 = LCG.UONE
        c_next: np.uint64 = LCG.UZERO

        while nskip > LCG.UZERO:
            if (nskip & LCG.UONE) != LCG.UZERO:
                a_next = a_next * a
                c_next = c_next * a + c
            c = (a + LCG.UONE) * c
            a = a * a
            nskip = nskip >> LCG.UONE

        self._seed = a_next * self._seed + c_next

seed = np.uint64(1)
rng64 = LCG(seed, np.uint64(6364136223846793005), np.uint64(1))

rng64.skip(-3) # back by 3
rng64.skip(-3) # back by 3
rng64.skip(2) # forward by 2
Tested in Python 3.9.1, x64 Win 10
I am writing a function that checks if a matrix X is positive semidefinite with a given rank k. To do this, I compute the eigenvalues of X, and I check that exactly k of them are positive and the rest are 0. Here's what I have so far:
using LinearAlgebra
function ispossemdef(X::AbstractMatrix, k::Int, ϵ::Real = 1e-10)
    n = size(X, 1)                      # dim of X
    !issymmetric(X) && return false     # short-circuit if X is asymmetric
    k > n && error("k > n")             # throw error if k > n
    eigs = eigvals(X)                   # eigenvalues of X in ascending order
    z = eigs[1:(n - k)]                 # the values that should be zero
    p = eigs[(n - k + 1):end]           # the values that should be positive
    n_minus_k_zero_eigenvalues = norm(z) < ϵ
    k_positive_eigenvalues = all(p .> ϵ)
    return n_minus_k_zero_eigenvalues & k_positive_eigenvalues
end
Is there a better algorithm for doing this? Better might mean faster (avoids computing the eigenvalues), or more numerically stable (lets me get away with a stricter error tolerance).
For example, the isposdef function (which is the k = n special case of what I'm doing) works by attempting to compute the Cholesky factor of X, and reporting back with whether or not it could. Can I generalize this procedure to semidefinite matrices? If so, is it better than checking the eigenvalues?
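One way the Cholesky idea is commonly generalized to semidefinite matrices is Cholesky with diagonal pivoting (LAPACK offers this as the xPSTRF family): always eliminate on the largest remaining diagonal entry and stop when it drops below a tolerance; the number of pivots taken is the numerical rank. The following is a dependency-free Python sketch of that idea, not a tuned implementation. The function names and the relative tolerance `tol * scale` are assumptions for illustration.

```python
def psd_rank(A, tol=1e-10):
    """Numerical rank of A if A is symmetric PSD (to tolerance), else None."""
    n = len(A)
    if any(len(row) != n for row in A):
        return None
    if any(A[i][j] != A[j][i] for i in range(n) for j in range(i)):
        return None
    M = [list(map(float, row)) for row in A]      # working Schur complement
    scale = max((abs(v) for row in M for v in row), default=0.0)
    if scale == 0.0:
        return 0
    rank = 0
    for k in range(n):
        # Pivot on the largest remaining diagonal entry
        p = max(range(k, n), key=lambda i: M[i][i])
        if M[p][p] <= tol * scale:
            # For a PSD matrix the whole remaining block must vanish
            rest_is_zero = all(abs(M[i][j]) <= tol * scale
                               for i in range(k, n) for j in range(k, n))
            return rank if rest_is_zero else None
        M[k], M[p] = M[p], M[k]                   # symmetric row swap
        for row in M:
            row[k], row[p] = row[p], row[k]       # matching column swap
        pivot = M[k][k]
        for i in range(k + 1, n):
            factor = M[i][k] / pivot
            for j in range(k + 1, n):
                M[i][j] -= factor * M[k][j]
        rank += 1
    return rank

def is_psd_with_rank(A, k, tol=1e-10):
    return psd_rank(A, tol) == k

print(is_psd_with_rank([[2, 1, 1], [1, 1, 0], [1, 0, 1]], 2))
```

This costs O(n²k) operations for rank k, so it can be cheaper than a full eigendecomposition when k is small; whether it is also more robust for a given tolerance would need testing on your matrices.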
It will not work on all matrices, but have you looked at the isposdef() function?
using LinearAlgebra # for julia 1+
help> isposdef
I am trying to compute the IEEE-754 32-bit floating-point square root of various inputs, but for one particular input the algorithm below, based on the Newton-Raphson method, won't converge. What can I do to fix the problem? The platform I am designing for has a 32-bit floating-point adder/subtracter, multiplier, and divider.
For input 0x7F7FFFFF (3.4028234663852886E38), the algorithm won't converge to the correct answer of 18446743523953729536.000000. Instead it gives 18446743523953737728.000000.
I am using MATLAB to prototype my code before I implement it in hardware. I can only use single-precision floating-point values (so no doubles).
clc; clear; close all;
% Input
R = typecast(uint32(hex2dec(num2str(dec2hex(((hex2dec('7F7FFFFF'))))))),'single')
% Initial estimate
OneOverRoot2 = single(1/sqrt(2));
Root2 = single(sqrt(2));
% Get low and high bits of input R
hexdata_high = bitand(bitshift(hex2dec(num2hex(single(R))),-16),hex2dec('ffff'));
hexdata_low = bitand(hex2dec(num2hex(single(R))),hex2dec('ffff'));
% Change exponent of input to -1 to get Mantissa
temp = bitand(hexdata_high,hex2dec('807F'));
Expo = bitshift(bitand(hexdata_high,hex2dec('7F80')),-7);
hexdata_high = bitor(temp,hex2dec('3F00'));
b = typecast(uint32(hex2dec(num2str(dec2hex(((bitshift(hexdata_high,16)+ hexdata_low)))))),'single');
% If exponent is odd ...
if (bitand(Expo,1))
    % Pretend the mantissa [0.5 ... 1.0) is multiplied by 2 as Expo is odd,
    % so it now has the value [1.0 ... 2.0)
    % Estimate the sqrt(mantissa) as [1.0 ... sqrt(2))
    % IOW: linearly map (0.5 ... 1.0) to (1.0 ... sqrt(2))
    Mantissa = (Root2 - 1.0)/(1.0 - 0.5)*(b - 0.5) + 1.0;
else
    % The mantissa is in range [0.5 ... 1.0)
    % Estimate the sqrt(mantissa) as [1/sqrt(2) ... 1.0)
    % IOW: linearly map (0.5 ... 1.0) to (1/sqrt(2) ... 1.0)
    Mantissa = (1.0 - OneOverRoot2)/(1.0 - 0.5)*(b - 0.5) + OneOverRoot2;
end
newS = Mantissa*2^(bitshift(Expo-127,-1));
S = newS;
% S = (S + R/S)/2 method
for j = 1:6
    fprintf('S %u %f %f\n', j, S, (S-sqrt(R)));
    S = single((single(S) + single(single(R)/single(S))))/2;
    S = single(S);
end
goodaccuracy = (abs((single(S)-single(sqrt(single(R)))))) < 2^-23
difference = (abs((single(S)-single(sqrt(single(R))))))
% Get hexadecimal output
hexdata_high = (bitand(bitshift(hex2dec(num2hex(single(S))),-16),hex2dec('ffff')));
hexdata_low = (bitand(hex2dec(num2hex(single(S))),hex2dec('ffff')));
fprintf('FLOAT: T Input: %e\t\tCorrect: %e\t\tMy answer: %e\n', R, sqrt(R), S);
fprintf('output hex = 0x%04X%04X\n',hexdata_high,hexdata_low);
out = hex2dec(num2hex(single(S)));
I took a whack at this. Here's what I came up with:
#include <math.h>

float mysqrtf(float f) {
    if (f < 0) return 0.0f/0.0f;
    if (f == 1.0f / 0.0f) return f;
    if (f != f) return f;

    // half-ass an initial guess of 1.0.
    int expo;
    float foo = frexpf(f, &expo);
    float s = 1.0;
    if (expo & 1) foo *= 2, expo--;

    // this is the only case for which what's below fails.
    if (foo == 0x0.ffffffp+0) return ldexpf(0x0.ffffffp+0, expo/2);

    // do four newton iterations.
    for (int i = 0; i < 4; i++) {
        float diff = s*s-foo;
        diff /= s;
        s -= diff/2;
    }

    // do one last newton iteration, computing s*s-foo exactly.
    float scal = s >= 1 ? 4096 : 2048;
    float shi = (s + scal) - scal; // high 12 bits of significand
    float slo = s - shi; // rest of significand
    float diff = shi * shi - foo; // subtraction exact by sterbenz's theorem
    diff += 2 * shi * slo; // opposite signs; exact by sterbenz's theorem
    diff += slo * slo;
    diff /= s; // diff == fma(s, s, -foo) / s.
    s -= diff/2;

    return ldexpf(s, expo/2);
}
The first thing to analyse is the formula (s*s-foo)/s in floating-point arithmetic. If s is a sufficiently good approximation to sqrt(foo), Sterbenz's theorem tells us that the numerator is within an ulp(foo) of the right answer --- all of that error is approximation error from computing s*s. Then we divide by s; this gives us at worst another half-ulp of approximation error. So, even without a fused multiply-add, diff is within 1.5 ulp of what it should be. And we divide it by two.
Notice that the initial guess doesn't in and of itself matter as long as you follow it up with enough Newton iterations.
Measure the error of an approximation s to sqrt(foo) by abs(s - foo/s). The error of my initial guess of 1 is at most 1. A Newton iteration in exact arithmetic squares the error and divides it by 4. A Newton iteration in floating-point arithmetic --- the kind I do four times --- squares the error, divides it by 4, and kicks in another 0.75 ulp of error. You do this four times and you find you have a relative error at most 0x0.000000C4018384, which is about 0.77 ulp. This means that four Newton iterations yield a faithfully-rounded result.
I do a fifth Newton step to get a correctly-rounded square root. The reason why it works is a little more intricate.
shi holds the "top half" of s while slo holds the "bottom half." The last 12 bits in each significand will be zero. This means, in particular, that shi * shi and shi * slo and slo * slo are exactly representable as floats.
s*s is within two ulps of foo. shi*shi is within 2047 ulps of s*s. Thus shi * shi - foo is within 2049 ulps of zero; in particular, it's exactly representable and less than 2^-10.
You can check that you can add 2 * shi * slo and get an exactly-representable result that's within 2^-22 of zero and then add slo*slo and get an exactly representable result --- s*s-foo computed exactly.
When you divide by s, you kick in an additional half-ulp of error, which is at most 2^-48 here since our error was already so small.
Now we do a Newton step. We've computed the current error correctly to within 2^-46. Adding half of it to s gives us the square root to within 3*2^-48.
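The splitting trick can be checked numerically. The sketch below transplants it from 32-bit floats to Python's 64-bit floats, which is an assumption of this example rather than part of the answer: the constant 2^27 plays the role of 4096, so shi keeps at most 26 significant bits of a value in [1, 2). Exact rational arithmetic from `fractions` verifies that the three partial products and the final residual carry no rounding error.

```python
import math
import random
from fractions import Fraction

SCAL = 2.0 ** 27   # double-precision analogue of 4096 for s in [1, 2)

def split(s):
    """Split s in [1, 2) into shi + slo, where shi keeps the leading bits."""
    shi = (s + SCAL) - SCAL
    slo = s - shi
    return shi, slo

def residual(s, foo):
    """s*s - foo accumulated from the split parts, as in the last Newton step."""
    shi, slo = split(s)
    diff = shi * shi - foo
    diff += 2 * shi * slo
    diff += slo * slo
    return diff

random.seed(0)
foo = random.uniform(1.0, 4.0)
s = math.sqrt(foo)
print(Fraction(residual(s, foo)) == Fraction(s) ** 2 - Fraction(foo))
```

A plain `s * s - foo` would round the product first and generally lose the low-order bits that this computation preserves.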
To turn this into a guarantee of correct rounding, we need to prove that there are no floats between 1/2 and 2, other than the one I special-cased, whose square roots are within 3*2-48 of a midpoint between two consecutive floats. You can do some error analysis, get a Diophantine equation, find all of the solutions of that Diophantine equation, find which inputs they correspond to, and work out what the algorithm does on those. (If you do this, there is one "physical" solution and a bunch of "unphysical" solutions. The one real solution is the only thing I special-cased.) There may be a cleaner way, however.
I have a black box function, f(x) and a range of values for x.
I need to find the lowest value of x for which f(x) = 0.
I know that for the start of the range of x, f(x) > 0, and if I had a value for which f(x) < 0 I could use regula falsi, or similar root finding methods, to try determine f(x)=0.
I know f(x) is continuous, and should only have 0,1 or 2 roots for the range in question, but it might have a local minimum.
f(x) is somewhat computationally expensive, and I'll have to find this first root a lot.
I was thinking of some kind of hill climbing with a degree of randomness to avoid any local minima, but then how do you know whether there was no minimum below zero or you just haven't found it yet? I think the function shouldn't have more than two minima, but I can't be certain enough of that to rely on it.
If it helps, x in this case represents a time, and f(x) represents the distance between a ship and a body in orbit (moon/planet) at that time. I need the first point where they are a certain distance from each other.
My method will sound pretty complicated, but in the end the computation time of the method will be far smaller than the distance calculations (evaluation of your f(x)). Also, there are quite many implementations of it already written up in existing libraries.
So what I would do:
approximate f(x) with a Chebychev polynomial
find the real roots of that polynomial
If any are found, use those roots as initial estimates in a more precise rootfinder (if needed)
Given the nature of your function (smooth, continuous, otherwise well-behaved) and the information that there's 0,1 or 2 roots, a good Chebychev polynomial can already be found with 3 evaluations of f(x).
Then find the eigenvalues of the companion matrix of the Chebychev coefficients; these correspond to the roots of the Chebychev polynomial.
If all are complex (none is purely real), there are no roots.
If there are some real roots, check if two are equal (that "rare" case you spoke of).
Otherwise, all real eigenvalues are roots; the lowest one of which is the root you seek.
Then use Newton-Raphson to refine (if necessary, or use a better Chebychev polynomial). Derivatives of f can be approximated using central differences
f'(x) = ( f(x+h)-f(x-h) ) /2/h (for small h)
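The refinement step can be sketched in a few lines of Python. This is only an illustration of Newton-Raphson with a central-difference derivative; the function names, the step size h and the stopping tolerance are assumptions, and each iteration costs three evaluations of f.

```python
def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def newton_refine(f, x0, tol=1e-10, max_iter=50):
    """Polish a root estimate x0 with Newton steps using the numeric derivative."""
    x = x0
    for _ in range(max_iter):
        slope = central_difference(f, x)
        if slope == 0.0:
            break                      # flat spot: Newton step undefined
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            break
    return x

print(newton_refine(lambda x: x * x - 2.0, 1.5))
```

Since f is expensive here, a good initial estimate from the polynomial roots matters: it keeps the number of Newton iterations, and so the number of evaluations, small.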
I have an implementation of the Chebychev routines in Matlab/Octave (given below). Use like this:
R = FindRealRoots(@f, x_min, x_max, 5, true,true);
with [x_min,x_max] your range in x, 5 the number of points to use for finding the polynomial (the higher, the more accurate. Equals the amount of function evaluations needed), and the last true will make a plot of the actual function and the Chebychev approximation to it (mainly for testing purposes).
Now, the implementation:
% FINDREALROOTS Find approximations to all real roots of any function
% on an interval [a, b].
% Roots = FindRealRoots(funfcn, a, b, n, vectorized, make_plot)
% FINDREALROOTS() approximates all the real roots of the function 'funfcn'
% in the interval [a,b]. It does so by finding the roots of an [n]-th degree
% Chebyshev polynomial approximation, via the eigenvalues of the associated
% companion matrix.
% When the argument [vectorized] is [true], FINDREALROOTS() will evaluate
% the function 'funfcn' at all [n] required points in the interval
% simultaneously. Otherwise, it will use ARRAYFUN() to calculate the [n]
% function values one-by-one. [vectorized] defaults to [false].
% When the argument [make_plot] is true, FINDREALROOTS() plots the
% original function and the Chebyshev approximation, and shows any roots on
% the given interval. Also [make_plot] defaults to [false].
% All [Roots] (if any) will be sorted.
% First version 26th May 2007 by Stephen Morris,
% Nightingale-EOS Ltd., St. Asaph, Wales.
% Modified 14/Nov (Rody Oldenhuis)
% See also roots, eig.
function Roots = FindRealRoots(funfcn, a, b, n, vectorized, make_plot)
% parse input and initialize.
inarg = nargin;
if n <= 2, n = 3; end % Minimum [n] is 3:
if (inarg < 5), vectorized = false; end % default: function isn't vectorized
if (inarg < 6), make_plot = false; end % default: don't make plot
% some convenient variables
bma = (b-a)/2; bpa = (b+a)/2; Roots = [];
% Obtain the Chebyshev coefficients for the function
% Based on the routine given in Numerical Recipes (3rd) section 5.8;
% calculates the Chebyshev coefficients necessary to approximate some
% function over the interval [a,b]
% initialize
c = zeros(1,n); k=(1:n)'; y = cos(pi*((1:n)-1/2)/n);
% evaluate function on Chebychev nodes
if vectorized
    f = feval(funfcn,(y*bma)+bpa);
else
    f = arrayfun(@(x) feval(funfcn,x),(y*bma)+bpa);
end
% compute the coefficients
for j=1:n, c(j)=(f(:).'*(cos((pi*(j-1))*((k-0.5)/n))))*(2-(j==1))/n; end
% coefficients may be [NaN] if [inf]
% ??? TODO - it is of course possible for c(n) to be zero...
if any(~isfinite(c(:))) || (c(n) == 0), return; end
% Define [A] as the Frobenius-Chebyshev companion matrix. This is based
% on the form given by J.P. Boyd, Appl. Num. Math. 56 pp.1077-1091 (2006).
one = ones(n-3,1);
A = diag([one/2; 0],-1) + diag([1; one/2],+1);
A(end, :) = -c(1:n-1)/2/c(n);
A(end,end-1) = A(end,end-1) + 0.5;
% Now we have the companion matrix, we can find its eigenvalues using the
% MATLAB built-in function. We're only interested in the real elements of
% the matrix:
eigvals = eig(A); realvals = eigvals(imag(eigvals)==0);
% if there aren't any real roots, return
if isempty(realvals), return; end
% Of course these are the roots scaled to the canonical interval [-1,1]. We
% need to map them back onto the interval [a, b]; we widen the interval just
% a tiny bit to make sure that we don't miss any that are right on the
% boundaries.
rangevals = nonzeros(realvals(abs(realvals) <= 1+1e-5));
% also sort the roots
Roots = sort(rangevals*bma + bpa);
% As a sanity check we'll plot out the original function and its Chebyshev
% approximation: if they don't match then we know to call the routine again
% with a larger 'n'.
if make_plot
% simple grid
grid = linspace(a,b, max(25,n));
% evaluate function
if vectorized
    fungrid = feval(funfcn, grid);
else
    fungrid = arrayfun(@(x) feval(funfcn,x), grid);
end
% corresponding Chebychev-grid (more complicated but essentially the same)
y = (2.*grid-a-b)./(b-a); d = zeros(1,length(grid)); dd = d;
for j = length(c):-1:2, sv=d; d=(2*y.*d)-dd+c(j); dd=sv; end, chebgrid=(y.*d)-dd+c(1);
% Now make plot
figure(1), clf, hold on
plot(grid, fungrid ,'color' , 'r');
line(grid, chebgrid,'color' , 'b');
line(grid, zeros(1,length(grid)), 'linestyle','--')
legend('function', 'interpolation')
end % make plot
end % FindRealRoots
You could use the secant method which is a discrete version of Newton's method.
The root is estimated by calculating the line between two points (= the secant) and its crossing of the X axis.
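A minimal Python sketch of the secant iteration follows. The function name, tolerance and iteration cap are illustrative choices. Note that the method converges to some root near the starting points, not necessarily the first one in the range.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Find a root of f starting from the two points x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break                                  # secant is horizontal
        # Where the line through (x0, f0) and (x1, f1) crosses the x axis
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
        if abs(x1 - x0) < tol:
            break
    return x1

print(secant(lambda x: x * x - 2.0, 1.0, 2.0))
```

Each iteration needs only one new evaluation of f, which suits a function that is expensive to compute.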
Your function has only 0, 1 or 2 roots, so you can start from an algorithm that does not guarantee finding the first root and then check whether an earlier one exists.
Find one root using Newton's method or another method. If no root is found, this algorithm gives up as well.
Let r be the root found and x0 the beginning of the range of x. Let d = (r-x0)/2.
While d > 0, calculate f(r-d). If f(r-d) > 0, halve d (d := d / 2) and loop again. If f(r-d) <= 0, exit the loop.
If the loop finished with d = 0, report r as the first root. If d > 0, find a root between x0 and r-d using any other method and report it.
I assumed two prerequisite conditions.
x is a floating-point number (so repeated halving of d eventually reaches 0).
At each root, the graph of f(x) crosses the x-axis. No root merely touches the axis, like x=0 for f(x)=x^2.
Using condition 2, you can prove that if no d gives f(r-d) < 0, then f(x) > 0 for all x with x0 < x < r.
You could make a small change to the uniroot.all function from the R library rootSolve.
uniroot.all <- function (f, interval, lower= min(interval),
                         upper= max(interval), tol= .Machine$double.eps^0.2,
                         maxiter= 1000, n = 100, nroots = -1, ... ) {

  ## error checking as in uniroot...
  if (!missing(interval) && length(interval) != 2)
    stop("'interval' must be a vector of length 2")
  if (!is.numeric(lower) || !is.numeric(upper) || lower >= upper)
    stop("lower < upper is not fulfilled")

  ## subdivide interval in n subintervals and estimate the function values
  xseq <- seq(lower,upper,len=n+1)
  mod  <- f(xseq,...)

  ## some function values may already be 0
  Equi <- xseq[which(mod==0)]

  ss   <- mod[1:n]*mod[2:(n+1)]  # interval where functionvalues change sign
  ii   <- which(ss<0)

  for (i in ii) {
    Equi <- c(Equi, uniroot(f, lower = xseq[i], upper = xseq[i+1] ,...)$root)
    if (length(Equi) == nroots) {
      return(Equi)
    }
  }
  return(Equi)
}
And run it like this:
uniroot.all(f = your_function, interval = c(start, stop), nroots = 1)
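The same subdivide-and-bracket idea can be sketched in Python. This version differs from the R code in one respect that is an assumption of this example: it evaluates f lazily from the left and stops at the first sign change, which avoids evaluating an expensive f over the rest of the interval. It then refines the bracket by plain bisection. The function name and defaults are illustrative.

```python
def first_root(f, lower, upper, n=100, tol=1e-10):
    """Leftmost root of f in [lower, upper] detected on an n-step grid, or None."""
    step = (upper - lower) / n
    a, fa = lower, f(lower)
    if fa == 0.0:
        return a
    for i in range(1, n + 1):
        b = lower + i * step
        fb = f(b)
        if fb == 0.0:
            return b
        if fa * fb < 0.0:
            # Sign change in [a, b]: refine by bisection
            while b - a > tol:
                mid = 0.5 * (a + b)
                fm = f(mid)
                if fm == 0.0:
                    return mid
                if fa * fm < 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            return 0.5 * (a + b)
        a, fa = b, fb
    return None

print(first_root(lambda x: (x - 1.23) * (x - 3.1), 0.0, 5.0))
```

Like the R version, this misses a pair of roots that both fall inside one subinterval, as well as a root where f only touches zero, so n has to be chosen with the expected spacing of the roots in mind.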
I'm using the Lapack routine zgeev to obtain the (complex) eigenvalues and
eigenvectors of a non-symmetric complex matrix in Fortran. The resulting array
of eigenvectors is in some arbitrary order. I would like to reorder both the
array of eigenvalues and the corresponding columns in the matrix of eigenvectors
so that the eigenvalues are in ascending order with respect to the real part of
each eigenvalue. I could of course roll my own sorting routine, but I was
wondering if there was already a Fortran routine somewhere that can do this for
me, maybe even as part of lapack.
You could just look at the end of zsteqr.f (the Hermitian tridiagonal solver) and generalise that. The relevant bit of code is
*        Use Selection Sort to minimize swaps of eigenvectors
*
         DO 180 II = 2, N
            I = II - 1
            K = I
            P = D( I )
            DO 170 J = II, N
               IF( D( J ).LT.P ) THEN
                  K = J
                  P = D( J )
               END IF
  170       CONTINUE
            IF( K.NE.I ) THEN
               D( K ) = D( I )
               D( I ) = P
               CALL ZSWAP( N, Z( 1, I ), 1, Z( 1, K ), 1 )
            END IF
  180    CONTINUE
So I think you just have to change the comparison line (but untested)
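The logic being asked for, independent of language, is: compute the permutation that sorts the eigenvalues by real part, then apply that same permutation to the eigenvalues and to the columns of the eigenvector matrix. A small Python sketch of that step, with hypothetical names and plain lists standing in for the arrays returned by zgeev:

```python
def sort_eigenpairs_by_real_part(eigenvalues, eigenvectors):
    """Sort eigenvalues ascending by real part and permute columns to match.

    eigenvectors is a list of rows; column j belongs to eigenvalues[j].
    """
    order = sorted(range(len(eigenvalues)), key=lambda j: eigenvalues[j].real)
    sorted_values = [eigenvalues[j] for j in order]
    sorted_vectors = [[row[j] for j in order] for row in eigenvectors]
    return sorted_values, sorted_vectors

values = [3 + 1j, 1 - 2j, 2 + 0j]
vectors = [[30, 10, 20],
           [31, 11, 21],
           [32, 12, 22]]
print(sort_eigenpairs_by_real_part(values, vectors))
```

In the Fortran selection sort above, the equivalent change is to compare the real parts of the eigenvalues and swap the complex eigenvalues themselves together with the columns.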
I wrote one a few days ago, but the sorting was done according to the real values. This is an implementation of Quicksort. Make sure you input the function you want to be used as the key for sorting.
complex(8) :: PIVOT