Fractional Exponentiation in Forth

I'm trying to write a function that fits a value to a model.
I have a measurement from a pressure sensor, and using a calibrated model I have to convert that value into the final pressure measurement. Doing so involves raising the measurement to a fractional power, in this case x^2.032.
I'm writing this in Mecrisp Stellaris, a dialect of Forth.
I'm a bit stuck. I understand 2.032 = 254/125, but is there a cleaner way to write things than to simply take a huge power and a huge root?

If your language (or calculator) has square root, then you can use that to compute any power. Of course if the language has a power function, it would be better (simpler, faster, more accurate) to use that.
For example to compute
pow( x, 2.032)
we first expand 2.032 as a binary fraction (for example by looking at its floating-point representation in hex) as
2.032 ≈ 2 + 1/pow(2,5) + 1/pow(2,11) + 1/pow(2,12) + ...
Thus
pow( x, 2.032) = pow(x,2) * pow( x, 1/pow(2,5)) * ...
We can compute
pow( x, 1/pow(2,5))
by starting with x and taking 5 square roots in succession.
The general method is to loop over the binary expansion of 2.032, taking successive square roots of x, and multiplying the current root into the answer whenever the corresponding binary digit is 1.
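Here is a minimal sketch of that loop in Python (plain floating point rather than Mecrisp Stellaris words; the function name and the 24-bit cutoff are my own choices):
import math

def pow_frac(x, exponent, bits=24):
    # Approximate x**exponent using only multiplication and square root.
    # Integer part of the exponent by repeated multiplication:
    n = int(exponent)
    frac = exponent - n
    result = 1.0
    for _ in range(n):
        result *= x
    # Fractional part: r runs through x**(1/2), x**(1/4), x**(1/8), ...
    # and is multiplied in whenever the matching binary digit of frac is 1.
    r = x
    for _ in range(bits):
        r = math.sqrt(r)
        frac *= 2.0
        if frac >= 1.0:
            result *= r
            frac -= 1.0
    return result

print(pow_frac(3.7, 2.032), 3.7 ** 2.032)   # the two values should agree closely
In Forth the structure would be the same: a loop that takes one more square root of a scratch value each pass and conditionally multiplies it into an accumulator.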

Related

Rational approximation of rational exponentiation root with error control

I am looking for an algorithm that would efficiently calculate b^e where b and e are rational numbers, ensuring that the approximation error won't exceed a given err (rational as well). Explicitly, I am looking for a function:
rational exp(rational base, rational exp, rational err)
that satisfies |exp(b, e, err) - b^e| < err
Rational numbers are represented as pairs of big integers. Let's assume that all rationality preserving operations like addition, multiplication etc. are already defined.
I have found several approaches, but they did not allow me to control the error clearly enough. In this problem I don't care about integer overflow. What is the best approach to achieve this?
This one is complicated, so I'm going to outline the approach that I'd take. I do not promise no errors, and you'll have a lot of work left.
I will change variables from your exp(b, e, err) to exp(x, y, err), meaning x^y computed to within error err. If y is not in the range 0 <= y < 1, then we can easily multiply by an appropriate x^k with k an integer to make it so. So we only need to worry about a fractional y.
If all numerators and denominators were small, it would be easy to tackle this by first taking an integer power, and then taking a root using Newton's method. But that naive idea will fall apart painfully when you try to estimate something like (1000001/1000000)^(2000001/1000000). So the challenge is to keep that from blowing up on you.
I would recommend looking at the problem of calculating x^y as x^y = (x0^y0) * (x0^(y-y0)) * (x/x0)^y = (x0^y0) * e^((y-y0) * log(x0)) * e^(y * log(x/x0)). And we will choose x0 and y0 such that the calculations are easier and the errors are bounded.
To bound the errors, we can first come up with a naive upper bound b on x0^y0 - something like "the next integer above x to the power of the next integer above y". We will pick x0 and y0 to be close enough to x and y that the latter two factors are under 2. And then we just need to have the three factors estimated to within err/12, err/(6*b) and err/(6*b). (You might want to make those error targets tighter, say half that, and then round the final answer to a nearby rational.)
Now when we pick x0 and y0 we will be aiming for "close rational with smallish numerator/denominator". For that we start calculating the continued fraction. This gives a sequence of rational numbers that quickly converges to a target real. If we just cut off the sequence fairly soon, we can quickly find a rational number that is within any desired distance of a target real while keeping relatively small numerators and denominators.
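As a sketch of that step, here is one way to enumerate convergents, using Python's fractions.Fraction as the exact rational type (the function and its max_terms cap are my own, not part of the answer):
from fractions import Fraction

def convergents(x, max_terms=20):
    # Yield the continued-fraction convergents of a positive rational x.
    # Each one is a fraction with comparatively small numerator and
    # denominator that rapidly approaches x, so stop at the first one
    # that is within the distance you need.
    h_prev, h = 1, int(x)        # numerators
    k_prev, k = 0, 1             # denominators
    yield Fraction(h, k)
    rest = x - int(x)
    while rest != 0 and max_terms > 0:
        rest = 1 / rest
        a = int(rest)
        rest -= a
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)
        max_terms -= 1

for c in convergents(Fraction(2000001, 1000000) ** 2):
    print(c, float(c))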
Let's work from the third term backwards.
We want y * log(x/x0) < log(2). But from the Taylor series if x/2 < x0 < 2x then log(x/x0) < x/x0 - 1. So we can search the continued fraction for an appropriate x0.
Once we have found it, we can use the Taylor series for log(1+z) to calculate log(x/x0) to within err/(12*y*b). And then the Taylor series for e^z to calculate the term to our desired error.
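A rough sketch of those two series with error-controlled truncation, again over exact rationals (the err/2 cutoffs and the |z| ranges in the comments are my assumptions and should be tightened to fit your actual error budget):
from fractions import Fraction

def log1p_series(z, err):
    # log(1+z) = z - z^2/2 + z^3/3 - ...  For |z| <= 1/2 the dropped tail
    # is at most twice the first dropped term, so stop below err/2.
    total, term, n = Fraction(0), Fraction(z), 1
    while abs(term) / n > err / 2:
        total += term / n
        term *= -z
        n += 1
    return total

def exp_series(z, err):
    # e^z = 1 + z + z^2/2! + ...  For |z| <= 1 the dropped tail is at most
    # twice the first dropped term, so again stop below err/2.
    total, term, n = Fraction(1), Fraction(z), 1
    while abs(term) > err / 2:
        total += term
        n += 1
        term *= z / n
    return total

# Loose demo of x^y = e^(y*log(x)):  1.1**2 = 1.21
print(float(exp_series(2 * log1p_series(Fraction(1, 10), Fraction(1, 10**12)),
                       Fraction(1, 10**9))))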
The second term is more complicated. We need to estimate log(x0). What we do is find an appropriate integer k such that 1.1^k <= x0 < 1.1^(k+1). And then we can estimate both k * log(1.1) and log(x0 / 1.1^k) fairly precisely. Find a naive upper bound to that log and use it to find a close enough y0 for the second term to be within 2. And then use the Taylor series to estimate e^((y-y0) * log(x0)) to our desired precision.
For the first term we use the naive method of raising x0 to an integer and then Newton's method to take a root, to give x0^y0 to our desired precision.
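A naive sketch of that "integer power, then Newton root" step over exact rationals; as warned above, the numerators and denominators blow up quickly, so treat it only as a reference for the small x0, y0 case (the names are mine):
from fractions import Fraction

def nth_root(v, q, err):
    # Newton's method on f(t) = t**q - v, for rational v >= 1 and integer
    # q >= 1.  Starting above the root keeps the iteration monotone, and
    # because the root is >= 1 the residual test (t**q - v)/q < err also
    # guarantees that t is within err of the true root.
    v = Fraction(v)
    t = Fraction(int(v) + 1)           # an integer upper bound on the root
    while (t ** q - v) / q >= err:
        t = ((q - 1) * t + v / t ** (q - 1)) / q
    return t

def rational_pow_naive(x, p, q, err):
    # x**(p/q) for rational x >= 1: integer power first, then a q-th root.
    return nth_root(Fraction(x) ** p, q, err)

print(float(rational_pow_naive(Fraction(3, 2), 3, 2, Fraction(1, 10**9))))
# ~ (3/2)**(3/2) = 1.8371...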
Then multiply them together, and we have an answer. (If you chose the "tighter errors, nicer answer", then now you'd do a continued fraction on that answer to pick a better rational to return.)

Why is x \ y so much slower than (x' * x) \ (x' * y)?

For an NxP matrix x and an Nx1 vector y with N > P, the two expressions
x \ y -- (1)
and
(x' * x) \ (x' * y) -- (2)
both compute the solution b to the matrix equation
x * b = y
in the least squares sense, i.e. so that the quantity
norm(y - x * b)
is minimized. Expression (2) does it using the classic algorithm for the solution of an ordinary least squares regression, where the left-hand argument to the \ operator is square. It is equivalent to writing
inv(x' * x) * (x' * y) -- (3)
but it uses an algorithm which is more numerically stable. It turns out that (3) is moderately faster than (2) even though (2) doesn't have to produce the inverse matrix as a byproduct, but I can accept that given the additional numerical stability.
However, some simple timings (with N=100,000 and P=30) show that expression (2) is more than 5 times faster than expression (1), even though (1) has greater flexibility to choose the algorithm used! For example, any call to (1) could just dispatch on the size of X, and in the case N>P it could reduce to (2), which would add a tiny amount of overhead, but certainly wouldn't take 5 times longer.
What is happening in expression (1) that is causing it to take so much longer?
Edit: Here are my timings
x = randn(1e5, 30);
y = randn(1e5,1);
tic, for i = 1:100; x\y; end; t1=toc;
tic, for i = 1:100; (x'*x)\(x'*y); end; t2=toc;
assert( abs(norm(x\y) - norm((x'*x)\(x'*y))) < 1e-10 );
fprintf('Speedup: %.2f\n', t1/t2)
Speedup: 5.23
You are aware of the fact that in your test
size(x) == [1e5 30] but size(x'*x) == [30 30]
size(y) == [1e5 1] but size(x'*y) == [30 1]
That means that the matrices entering the mldivide function differ in size by 4 orders of magnitude! That makes any overhead of determining which algorithm to use significant relative to the smaller problem (and, of course, even the same algorithm costs far more on the larger one).
In other words, you have a biased test. To make a fair test, use something like
x = randn(1e3);
y = randn(1e3,1);
I find (worst of 5 runs):
Speedup: 1.06 %// R2010a
Speedup: 1.16 %// R2010b
Speedup: 0.97 %// R2013a
...the difference has all but evaporated.
But, this does show very well that if you indeed have a regression problem with low dimensionality compared to the number of observations, it really pays off to do the multiplication first :)
mldivide is a catch-all, and really great at that. But often, having knowledge about the problem may make more specific solutions, like pre-multiplication, pre-conditioning, lu, qr, linsolve, etc. orders of magnitude faster.
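The effect is easy to reproduce outside MATLAB too. Here is a rough NumPy analogue of the original timing, with np.linalg.lstsq standing in for x \ y and the normal equations standing in for (x'*x)\(x'*y); lstsq is SVD-based rather than MATLAB's QR path, so the exact ratio will differ:
import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((100_000, 30))
y = rng.standard_normal((100_000, 1))

t0 = time.perf_counter()
for _ in range(100):
    b1, *_ = np.linalg.lstsq(x, y, rcond=None)      # analogue of x \ y
t1 = time.perf_counter()
for _ in range(100):
    b2 = np.linalg.solve(x.T @ x, x.T @ y)          # analogue of (x'*x) \ (x'*y)
t2 = time.perf_counter()

assert np.allclose(b1, b2, atol=1e-10)
print(f"Speedup: {(t1 - t0) / (t2 - t1):.2f}")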
even though (1) has greater flexibility to choose the algorithm used!
For example, any call to (1) could just dispatch on the size of X, and
in the case N>P it could reduce to (2), which would add a tiny amount
of overhead, but certainly wouldn't take 5 times longer.
This is not the case. It could take a lot of overhead to choose which algorithm to use, particularly when compared to the computation on relatively small inputs such as these. In this case, because MATLAB can see that you have x'*x, it knows that one of the arguments must be both square and symmetric (yes - that knowledge of linear algebra is built in to MATLAB even at a parser level), and can straight away call one of the appropriate code paths within \.
I can't say whether this fully explains the timing differences you're seeing. I would want to investigate further, at least by:
Making sure to put the code within a function, and warming the function up to ensure that the JIT is engaged - and then trying the same thing with feature('accel', 'off') to remove the effect of the JIT
Trying this on a much bigger range of input sizes to check what contribution an 'algorithm choice overhead' made compared to computation time.

Power function in vhdl

I want to make a power function in VHDL where the exponent is a floating-point number and the base is an integer (it will always be 2):
2^(some floating-point number).
I use the IEEE library (fixed_float_types.all, fixed_pkg.all, and float_pkg.all).
I thought of calculating all the possible outputs and saving them in a ROM, but I don't know the range of the exponent.
How can I implement this function, and if there is an existing implementation, where can I find it?
Thanks
For simulation, you will find suitable power functions in the IEEE.math_real library
library IEEE;
use IEEE.math_real.all;
...
X <= 2 ** Y;
or
X <= 2.0 ** Y;
This is probably not synthesisable. If I needed a similar operation for synthesis, I would use a lookup table of values, slopes and second derivatives, and a quadratic interpolator. I have used this approach for reciprocal and square root functions to single precision accuracy; 2**n over a reasonable range of n is smooth enough that the same approach should work.
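To give a feel for that approach, here is a quick Python model (not VHDL, and not bit-accurate; the segment count and the names are arbitrary) of evaluating 2**f on [0, 1) from a small table of values, slopes and second derivatives with a quadratic correction per segment:
import numpy as np

# Float model of a table-plus-quadratic-interpolation evaluation of 2**f
# for f in [0, 1).  A hardware version would store fixed-point entries and
# use a couple of multipliers; this just checks the approximation error.
SEGMENTS = 64
edges = np.arange(SEGMENTS) / SEGMENTS
LN2 = np.log(2.0)
value = 2.0 ** edges                    # table of 2**x0
slope = LN2 * value                     # first derivative at x0
curve = 0.5 * LN2 * LN2 * value         # half the second derivative at x0

def exp2_interp(f):
    i = int(f * SEGMENTS)               # table index = top bits of f
    d = f - edges[i]                    # remainder = bottom bits of f
    return value[i] + slope[i] * d + curve[i] * d * d

worst = max(abs(exp2_interp(f) - 2.0 ** f) for f in np.linspace(0.0, 0.999, 10_000))
print(f"worst-case error with {SEGMENTS} segments: {worst:.2e}")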
If an approximation would do, I would use the integer part of the exponent to determine the integer power of 2: if the fixed-point exponent is 111.011010111, the integer part gives 0b10000000. Then I would do a left-to-right conditional add based on the fractional bits, so for 111.011010111 you add 0b10000000 times (0*(1/2) + 1*(1/4) + 1*(1/8) + 0*(1/16) ... and so on), where 1/2, 1/4, 1/8, et cetera are right shifts of 0b10000000. This implements the integer part of the exponentiation exactly and approximates the fractional part as a scaling of the integer part.
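A sketch of that shift-and-add approximation in Python, using plain integers in place of hardware shifters (the ~6% figure in the comment is the worst case of approximating 2**frac by 1 + frac):
def pow2_approx(exp_int, frac_bits):
    # 2**(exp_int . frac_bits): the integer part is a left shift, then each
    # set fractional bit adds a right-shifted copy of it.  This evaluates
    # 2**exp_int * (1 + frac), i.e. it linearises 2**frac, so the result
    # reads up to about 6% high in the worst case.
    base = 1 << exp_int                 # 2**integer_part
    result = base
    for k, bit in enumerate(frac_bits, start=1):
        if bit:
            result += base >> k         # base * 2**-k
    return result

# 2**7.421875: integer part 7, fraction bits 0b011011
print(pow2_approx(7, [0, 1, 1, 0, 1, 1]), 2 ** 7.421875)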
As simple as any: 0.1 in binary is 0.5 in decimal, and raising to the power 0.5 is just taking a square root.
I've been working with floating-point numbers, and it took me about 4-5 hours to work out the simplest synthesizable way to implement a power function. Just proceed with repeated square roots: for b"0.01" you take a double square root, sqrt(sqrt(x)), and for b"0.11" you take sqrt(x)*sqrt(sqrt(x)), and so on...
This gives a synthesizable implementation of a pow function.

Is there a fast way to invert a matrix in Matlab?

I have lots of large (around 5000 x 5000) matrices that I need to invert in Matlab. I actually need the inverse, so I can't use mldivide instead, which is a lot faster for solving Ax=b for just one b.
My matrices are coming from a problem that means they have some nice properties. First off, their determinant is 1, so they're definitely invertible. They aren't diagonalizable, though, or I would try to diagonalize them, invert them, and then put them back. Their entries are all real numbers (actually rational).
I'm using Matlab to get these matrices and for the things I need to do with their inverses, so I would prefer a way to speed Matlab up. But if there is another language I can use that'll be faster, then please let me know. I don't know a lot of other languages (a little bit of C and a little bit of Java), so if it's really complicated in some other language, then I might not be able to use it. Please go ahead and suggest it, though, in case.
I actually need the inverse, so I can't use mldivide instead,...
That's not true, because you can still use mldivide to get the inverse. Note that A^{-1} = A^{-1} * I. In MATLAB, this is equivalent to
invA = A\speye(size(A));
On my machine, this takes about 10.5 seconds for a 5000x5000 matrix. Note that MATLAB does have an inv function to compute the inverse of a matrix. Although this will take about the same amount of time, it is less efficient in terms of numerical accuracy (more info in the link).
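If you end up doing this in NumPy instead, the same trick carries over directly (a sketch; the random matrix is just a stand-in for your data):
import numpy as np

n = 5000
A = np.random.default_rng(0).standard_normal((n, n))   # stand-in for your matrix

# Analogue of invA = A\speye(size(A)): solve A * X = I instead of calling
# an explicit inverse routine.
invA = np.linalg.solve(A, np.eye(n))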
First off, their determinant is 1 so they're definitely invertible
Rather than det(A) = 1, it is the condition number of your matrix that dictates how accurate or stable the inverse will be. Note that det(A) = ∏_{i=1}^{n} λ_i. So just setting λ_1 = M, λ_n = 1/M and λ_i = 1 for i ≠ 1, n will give you det(A) = 1. However, as M → ∞, cond(A) = M^2 → ∞ and λ_n → 0, meaning your matrix is approaching singularity and there will be large numerical errors in computing the inverse.
My matrices are coming from a problem that means they have some nice properties.
Of course, there are other more efficient algorithms that can be employed if your matrix is sparse or has other favorable properties. But without any additional info on your specific problem, there is nothing more that can be said.
I would prefer a way to speed Matlab up
MATLAB uses Gauss elimination to compute the inverse of a general matrix (full rank, non-sparse, without any special properties) using mldivide and this is Θ(n^3), where n is the size of the matrix. So, in your case, n = 5000 and there are 1.25 x 10^11 floating point operations. So on a reasonable machine with about 10 Gflops of computational power, you're going to require at least 12.5 seconds to compute the inverse and there is no way out of this, unless you exploit the "special properties" (if they're exploitable)
Inverting an arbitrary 5000 x 5000 matrix is not computationally easy no matter what language you are using. I would recommend looking into approximations. If your matrices are low rank, you might want to try a low-rank approximation M = USV'
Here are some more ideas from math-overflow:
https://mathoverflow.net/search?q=matrix+inversion+approximation
First suppose the eigenvalues are all 1. Let A be the Jordan canonical form of your matrix. Then you can compute A^{-1} using only matrix multiplication and addition by
A^{-1} = I + (I-A) + (I-A)^2 + ... + (I-A)^k
where k < dim(A). Why does this work? Because generating functions are awesome. Recall the expansion
(1-x)^{-1} = 1/(1-x) = 1 + x + x^2 + ...
This means that we can invert (1-x) using an infinite sum. You want to invert a matrix A, so you want to take
A = I - X
Solving for X gives X = I-A. Therefore by substitution, we have
A^{-1} = (I - (I-A))^{-1} = I + (I-A) + (I-A)^2 + ...
Here I've just used the identity matrix I in place of the number 1. Now we have the problem of convergence to deal with, but this isn't actually a problem. By the assumption that A is in Jordan form and has all eigenvalues equal to 1, we know that A is upper triangular with all 1s on the diagonal. Therefore I-A is upper triangular with all 0s on the diagonal. Therefore all eigenvalues of I-A are 0, so its characteristic polynomial is x^dim(A) and its minimal polynomial is x^{k+1} for some k < dim(A). Since a matrix satisfies its minimal (and characteristic) polynomial, this means that (I-A)^{k+1} = 0. Therefore the above series is finite, with the largest nonzero term being (I-A)^k. So it converges.
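Before moving to the general case, here is a quick NumPy check of the unipotent situation described so far (a small illustration of the terminating series, not a recipe for 5000x5000 matrices):
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Upper triangular with 1s on the diagonal (every eigenvalue is 1), so
# I - A is strictly upper triangular and therefore nilpotent.
A = np.eye(n) + np.triu(rng.standard_normal((n, n)), k=1)

N = np.eye(n) - A                       # nilpotent: N**n == 0
inv = np.eye(n)
term = np.eye(n)
for _ in range(n - 1):                  # terms k = 1 .. n-1 are all we need
    term = term @ N
    inv += term

print(np.allclose(inv @ A, np.eye(n)))  # True, up to roundoff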
Now, for the general case, put your matrix into Jordan form, so that you have a block diagonal matrix, e.g.:
A 0 0
0 B 0
0 0 C
Where each block has a single value along the diagonal. If that value is a for block A, then use the above trick to invert (1/a) * A, and then scale the result by 1/a to recover the block's inverse. Since the full matrix is block diagonal, the inverse will be
A^{-1} 0 0
0 B^{-1} 0
0 0 C^{-1}
There is nothing special about having three blocks, so this works no matter how many you have.
Note that this trick works whenever you have a matrix in Jordan form. The computation of the inverse in this case will be very fast in Matlab because it only involves matrix multiplication, and you can even use tricks to speed that up since you only need powers of a single matrix. This may not help you, though, if it's really costly to get the matrix into Jordan form.

Fast, Vectorizable method of taking floating point number modulus of special primes?

Is there a fast method for taking the modulus of a floating point number?
With integers, there are tricks for Mersenne primes, so that it's possible to calculate y = x MOD 2^31-1 without needing division. integer trick
Can any similar tricks be applied for floating point numbers?
Preferably, in a way that can be converted into vector/SIMD operations, or moved into GPGPU code. This rules out using integer calculations on the floating point data.
The primes I'm interested in would be 2^7-1 and 2^31-1, although if there are more efficient ones for floating point numbers, those would be welcome.
One intended use of this algorithm would be to calculate a running "checksum" of input floating point numbers as they are being read into an algorithm. To avoid taking up too much of the calculation capability, I'd like to keep this lightweight.
Apparently a similar technique is used for larger numbers, particularly 2^127 - 1. Unfortunately, the math in the paper is beyond me, and I haven't been able to figure out how to convert it to smaller primes.
Example of floating point MOD 2^127 - 1 - HASH127
I looked at djb's paper, and you have it easier, since 31 bits fits comfortably into the 53-bit double significand. Assuming that your checksum consists of some ring operations over Z/(2**31 - 1), it will be easier (and faster) to solve the relaxed problem of computing a small representative of x in Z/(2**31 - 1); at the end, you can use integer arithmetic to find a canonical one, which is slow but shouldn't happen too often.
The basic reduction step is to replace an integer x = y + 2**31 * z with y + z. The trick that djb uses is to compute w = (x + L) - L, where L is a large integer carefully chosen to provoke roundoff in such a way that z = 2**-31 * w. Then compute y = x - w and output y + z, which will have magnitude at most 2**32. (I apologize if this operation isn't quite enough; if so, please post your checksum algorithm.)
The choice of L involves knowing how precise the significand is. For the modulus 2**31 - 1, we want the unit of least precision (ulp) to be 2**31. For doubles in the range [1.0, 2.0), the ulp is 2**-52, so L should be 2**52 * 2**31. If you were doing this with the modulus 2**7 - 1, then you'd take L = 2**52 * 2**7. As djb notes, this trick depends crucially on intermediate results not being computed in higher precision.
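Here is a scalar Python model of that reduction step for the modulus 2**31 - 1 (Python floats are IEEE doubles, so this mirrors what each SIMD lane would do; reduce_mod_m31 is a name I made up):
L = float(2 ** 83)          # 2**52 * 2**31, chosen as described above

def reduce_mod_m31(x):
    w = (x + L) - L         # nearest multiple of 2**31 to x
    z = w * 2.0 ** -31      # the "high" part:  x = y + 2**31 * z
    y = x - w               # the "low" part
    return y + z            # congruent to x mod 2**31 - 1, magnitude <= 2**32

x = 12345678901234.0
small = reduce_mod_m31(x)
print(small, int(x) % (2**31 - 1), int(small) % (2**31 - 1))   # last two agree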
