Can computing tan(x)=sin(x)/cos(x) cause a loss of precision? - performance

After a call to sincos(x, &s, &c) from the Unix math library (libm), it would be natural to compute the tangent as s/c. Is this safe, or are there ill-conditioned cases in which the (supposedly) more expensive tan(x) should be preferred for precision reasons?

If the error in sin(x) is es and the error in cos(x) is ec, then the error in t = sin(x)/cos(x) is
et = abs(t)*sqrt((es/sin(x))^2 + (ec/cos(x))^2)
The errors in sin, cos and tan should each be right around the precision of the number representation times the value, perhaps a bit or two off. Since each error tracks the precision of the number, the relative errors es/abs(sin(x)) and ec/abs(cos(x)) are about the same size, and, reading es, ec and et as relative errors, the above equation reduces to
et = sqrt(es^2 + ec^2)
and es and ec should be close to each other. So, the total error from using sin/cos as opposed to a new calculation of the tangent should be about a factor of sqrt(2) greater than the error from calculating the tangent directly.
That assumes the error in sin, cos and tan are all about the same, but that's a pretty good assumption these days. There is a small wobble in error over the range of the function, but that should be handled using some extended precision bits during the calculation, and should not show up significantly in the values you see.
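As a rough numerical illustration of the propagation formula above, here is a minimal Python sketch. The function name quotient_error and the choice of 2^-53 as the relative error of both sin and cos are assumptions made for the example, not properties of any particular libm.

```python
import math

def quotient_error(s, c, es, ec):
    """First-order error estimate for t = s/c given absolute errors es and ec."""
    t = s / c
    return abs(t) * math.sqrt((es / s) ** 2 + (ec / c) ** 2)

x = 0.7
s, c = math.sin(x), math.cos(x)
eps = 2.0 ** -53                      # assumed relative error of sin and cos
et = quotient_error(s, c, eps * abs(s), eps * abs(c))
ratio = et / (eps * abs(s / c))       # relative error of s/c in units of eps
print(ratio)                          # close to sqrt(2)
```

With equal relative errors in the numerator and denominator, the estimate for s/c comes out at about sqrt(2) times that relative error, independent of x.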
Whether this matters, of course, depends on the situation.
See for a decent quick-and-dirty intro to error propagation, which is a good way to tackle this sort of question.


MSE giving negative results in High-Level Synthesis

I am trying to calculate the mean squared error in Vitis HLS. I am using hls::pow(..., 2) and dividing by n, but all I get is a negative value, for example -0.004. This does not make sense to me. Could anyone point out the problem or offer an explanation?
Besides, calculating the mean squared error using hls::pow does not give the same results as (a - b) * (a - b). For information, I am using ap_fixed<> types and not normal float or double precision.
Thanks in advance!
It sounds like an overflow and/or underflow issue, meaning that the values reach the sign bit and are interpreted as negative when they are really just very large.
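To see how a large positive square can come back negative, here is a small Python model of a signed fixed-point store. It assumes wrap-around on overflow and truncation toward minus infinity (which I believe are the ap_fixed defaults, but check your tool's documentation); wrap_fixed and the 16-bit/8-integer-bit format are hypothetical choices for illustration, not part of the HLS library.

```python
import math

def wrap_fixed(value, total_bits, int_bits):
    """Store value in a signed fixed-point format with wrap-around overflow."""
    frac_bits = total_bits - int_bits
    raw = math.floor(value * (1 << frac_bits))   # truncate toward -infinity
    raw &= (1 << total_bits) - 1                 # keep only total_bits bits
    if raw >= 1 << (total_bits - 1):             # top bit set: read as negative
        raw -= 1 << total_bits
    return raw / (1 << frac_bits)

diff = 12.0
squared = wrap_fixed(diff * diff, 16, 8)   # 144 does not fit in [-128, 128)
print(squared)                             # -112.0
```

The difference itself fits comfortably, but its square needs more integer bits than the format provides, so the stored result wraps into the negative range.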
Have you tried tuning the representation precision or the different saturation/rounding options for the fixed point class? This tuning will depend on the data you're processing.
For example, if you handle data that you know will range between -128.5 and 1023.4, you might need very few fractional bits, say 3 or 4, leaving the rest for the integer part (which might roughly be log2((1023+128)^2)).
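The bit count in that example can be worked out directly. This is only a sketch of the arithmetic; integer_bits_for_squared_diff is a hypothetical helper name, and it counts magnitude bits only.

```python
import math

def integer_bits_for_squared_diff(lo, hi):
    """Magnitude bits needed for the integer part of (a - b)^2, with a, b in [lo, hi]."""
    max_sq = (hi - lo) ** 2
    return math.floor(math.log2(max_sq)) + 1

bits = integer_bits_for_squared_diff(-128.5, 1023.4)
print(bits)   # 21
```

A signed type would need one more bit for the sign, and accumulating n squared differences before dividing adds roughly log2(n) further bits.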
Alternatively, if n is very large, you can try a moving average and calculate the mean in small "chunks" of length m < n.
P.S. Taking the absolute value of a - b and storing it in an ap_ufixed before the multiplication can already give you one extra bit, but it adds an instruction/operation/logic to the algorithm (which might not be a problem if the design is pipelined, but requires space if the ap_ufixed is very wide).

Decrease precision Sympy Equality Class

I am performing some symbolic calculations using Sympy, and the calculations are just too computationally expensive. I was hoping to minimize the number of bytes used per calculation, and thus increase processing speed. I am solving two polynomial equations for two unknowns, but whenever I create the equalities using the Sympy equality class, it introduces precision that did not exist in the variables supplied. It adds extra digits to the ends to reach Sympy's standard 15-digit precision. I was hoping there might be a way to keep this class from doing this, or just to limit the overall precision of Sympy for this problem, as this amount of precision is not necessary for my calculations. I have read through all the documentation I can find on the class, and on precision handling in Sympy, with no luck.
My code looks like this.
x=sym.Symbol('x', real=True)
y=sym.Symbol('y', real=True)
Each value of c5 is originally calculated as a double-precision float, as normal with Python, and since I don't require that precision I just recast it as float16. So the values look like
However, when cast into the equality e, the equation becomes
e=1.5470203040506025*x**2 + 15.43000345000245*y**2....etc
with the standard Sympy 15-digit precision on every coefficient, even though those digits are not representative of the data.
I'm hoping that by lowering this precision I might decrease my run time. I have a lot of these polynomials to solve. I've already tried using Sympy's Float class, the eval function, and many other things. Any help would be appreciated.
Give the number of significant figures to Float as the second argument:
>>> from sympy import Float, Eq
>>> c0,c1,c2,c3,c4,c5 = [Float(i,4) for i in (c0,c1,c2,c3,c4,c5)]
>>> Eq(c0*x**2+c1*y**2+c2*x*y+c3*x+c4*y+c5,0)
Eq(1.547*x**2 + 1.55*x*y + 5.687*x + 15.43*y**2 + 7.345*y + 6.433, 0)

error bound in function approximation algorithm

Suppose we have a set of floating-point numbers with an "m"-bit mantissa and "e" bits for the exponent. Suppose moreover that we want to approximate a function "f".
From the theory we know that a "range-reduced function" is usually used, and the global function value is then derived from it.
For example, let x = (sx, ex, mx) (sign, exponent and mantissa); then...
log2(x) = ex + log2(1.mx), so basically the range-reduced function is "log2(1.mx)".
So far I have implemented reciprocal, square root, log2 and exp2, and I have recently started working on the trigonometric functions. But I was wondering whether, given a global error bound (ulp error especially), it is possible to derive an error bound for the range-reduced function. Is there some study of this kind of problem? Taking log2(x) as an example, I would like to be able to say...
"OK, I want log2(x) with k ulp error; to achieve this, given our floating-point system, we need to approximate log2(1.mx) with p ulp error."
Remember that, as I said, we know we are working with floating-point numbers, but the format is generic, so it could be the classic F32, but also, for example, e = 10, m = 8, and so on.
I can't actually find any reference that presents this kind of study. The references I have (i.e. Muller's book) don't treat the topic in this way, so I was looking for some kind of paper or similar. Do you know of any reference?
I'm also trying to derive such a bound by myself, but it is not easy...
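For concreteness, the log2 decomposition described in the question can be sketched in Python on ordinary doubles. The names split and log2_reduced are made up for this example, and math.log2 merely stands in for whatever approximation of the range-reduced function is being studied.

```python
import math

def split(x):
    """Return (ex, frac) with x = frac * 2**ex and 1 <= frac < 2, for x > 0."""
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    return e - 1, m * 2.0

def log2_reduced(x, core=math.log2):
    """log2(x) = ex + core(frac), where core approximates log2 on [1, 2)."""
    ex, frac = split(x)
    return ex + core(frac)

exact = log2_reduced(10.0)
# Inject a known absolute error into the range-reduced function.
perturbed = log2_reduced(10.0, core=lambda v: math.log2(v) + 1e-6)
print(split(10.0), perturbed - exact)
```

For log2 the reconstruction is a plain addition, so an absolute error in the range-reduced function passes through essentially unchanged; what changes is its size in ulps, because the ulp of the final result depends on the magnitude of ex + log2(1.mx) rather than on the magnitude of log2(1.mx) alone.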
There is a description of current practice, along with a proposed improvement and an error analysis, at The description of current practice appears consistent with the overview at, which is consistent with my memory of the most talked about problem being the mod pi range reduction of trigonometric functions.
I think IEEE floating point was a big step forwards just because it standardized things at a time when there were a variety of computer architectures, so lowering the risks of porting code between them, but the accuracy requirements implied by this may have been overkill: for many problems the constraint on the accuracy of the output is the accuracy of the input data, not the accuracy of the calculation of intermediate values.

MATLAB script to generate reports of rounding errors in algorithms

I am interested in using or creating a script that reports rounding errors in algorithms.
I hope such a script, or something similar, already exists...
I think this would be useful for digital electronic system design, because it is sometimes necessary to study how the accuracy error depends on the number of decimal places considered in the design.
This script would work with three elements: the algorithm code, the input, and the output.
It would show the error line by line for the algorithm code.
It would modify the algorithm code with a command like roundn and compare the error of the output.
I would define the error as
Errorrounding = Output(without rounding) - Output round
For instance, I have the following algorithm:
calculation1 = input*constant1 + constant2 %line 1 of the algorithm
output = exp(calculation1) %line 2 of the algorithm
Here 'input' is the input vector of n elements, 'output' is the output, and 'constant1' and 'constant2' are constants.
n is the number of elements of the input vector.
So, I would put my algorithm into the script and it would automatically generate the following algorithm:
input_round = roundn(input,-1*mdec)
calculation1 = input*constant1+constant2*ones(1,n)
calculation1_round = roundn(calculation1,-1*mdec)
output_round= roundn(output,-1*mdec)
where mdec is the number of decimal places to consider.
Finally, the script would give the following message:
The rounding error at line 1 is #Errorrounding_calculation1
where '#Errorrounding_calculation1' would be the result of the operation Errorrounding_calculation1 = calculation1 - calculation1_round
The rounding error at line 2 is #Errorrounding_output
where '#Errorrounding_output' would be the result of the operation Errorrounding_output = output - output_round
Does anyone know whether something similar already exists, or whether MATLAB provides a way to deal with related issues?
Thank you.
First point: I suggest reading What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. It should illuminate a lot of issues regarding floating-point computations that will help you understand more of the intricacies of the problem you are considering.
Second point: I think the problem you are considering is a lot more complicated than you realize. You are interested in the error introduced into a calculation due to the reduced precision from rounding. What you don't realize is that these errors will propagate through your computations. Consider your example:
output = input*C1 + C2
If each of the three operands is a double-precision floating-point number, they will each have some round-off error in their representation. A bound on this round-off error can be found using the function EPS, which tells you the distance from one double-precision number to the next larger one. For example, a bound on the absolute error of the representation of input will be 0.5*eps(input), or halfway between it and the next larger double-precision number. We can therefore estimate error bounds on the three operands as follows:
err_input = 0.5.*eps(input); %# Maximum round-off error for input
err_C1 = 0.5.*eps(C1); %# Maximum round-off error for C1
err_C2 = 0.5.*eps(C2); %# Maximum round-off error for C2
Note that these errors could be positive or negative, since the true number may have been rounded up or down to represent it as a double-precision value. Now, notice what happens when we estimate the true value of the operands before they were rounded-off by adding these errors to them, then perform the calculation for output:
output = (input+err_input)*(C1+err_C1) + C2+err_C2
%# ...and after reordering terms
output = input*C1 + C2 + err_input*C1 + err_C1*input + err_input*err_C1 + err_C2
%#       ^-----------^   ^-----------------------------------------------------^
%#             |                                    |
%#    rounded computation                      difference
You can see from this that the precision round-off of the three operands before performing the calculation could change the output we get by as much as difference. In addition, there will be another source of round-off error when the value output is rounded off to represent it as a double-precision value.
So, you can see how it's quite a bit more complicated than you thought to adequately estimate the errors introduced by precision round-off.
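The size of that difference term can be checked numerically. The sketch below is in Python rather than MATLAB, with math.ulp (Python 3.9+) playing the role of eps(x); operand_roundoff_bound is a hypothetical helper name, the values 0.1, 0.2 and 0.3 are arbitrary, and the bound covers only the representation error of the operands, not the rounding of the product and the sum.

```python
import math
from fractions import Fraction

def operand_roundoff_bound(x, c1, c2):
    """Worst-case size of the 'difference' term for output = x*c1 + c2."""
    ex, e1, e2 = (0.5 * math.ulp(v) for v in (x, c1, c2))
    return abs(ex * c1) + abs(e1 * x) + ex * e1 + e2

bound = operand_roundoff_bound(0.1, 0.2, 0.3)

# Exact rational arithmetic: the doubles actually stored vs. the decimals intended.
exact_of_doubles = Fraction(0.1) * Fraction(0.2) + Fraction(0.3)
exact_intended = Fraction(1, 10) * Fraction(1, 5) + Fraction(3, 10)
actual = abs(exact_of_doubles - exact_intended)
print(float(actual), bound)
```

The actual discrepancy stays below the bound because the individual representation errors have mixed signs and partly cancel, which is typical: worst-case bounds of this kind tend to be pessimistic.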
This is more of an extended comment than an answer:
I'm voting to close this on the grounds that it isn't a well-formed question. It sort of expresses a hope or wish that there exists some type of program which would be interesting or useful to you. I suggest that you revise the question to, well, to be a question.
You propose to write a Matlab program to analyse the numerical errors in other Matlab programs. I would not use Matlab for this. I'd probably use Mathematica, which offers more sophisticated structural operations on strings (such as program source text), symbolic computation, and arbitrary precision arithmetic. One of the limitations of Matlab for what you propose is that Matlab, like all other computer implementations of real arithmetic, suffers rounding errors. There are other languages which you might choose too.
What you propose is quite difficult, and would probably require a longer answer than most SOers, including this one, would be happy to contemplate writing. Happily for you, other people have written books on the subject, I suggest you start with this one by NJ Higham. You might also want to investigate matters such as interval arithmetic.
Good luck.

How is π calculated within sas?

Just curious! I spotted that the value of π held by SAS is in fact incorrect.
For instance:
data _null_;
x= constant('pi') * 1000000000000000000000000000;
put x= 32.;
run;
gives a π value of (3.)141592653589792961327005696
However, π is of course (3.)1415926535897932384626433832795, to 31 dp.
What gives?
SAS stores PI as a constant to 14 decimal places. The difference you are seeing is an artifact of floating point math when you did the multiplication step.
data _null_;
pi = constant('pi');
put pi= 32.30;
run;
/* On Log */
PI is held as a constant in all programming languages to a set precision. It isn't calculated. Your code just exposes how accurate PI is in SAS.
You got 16 digits of precision. Which means it probably uses an IEEE 754 double-precision floating point representation, which only gives about 16-17 decimal digits of precision. It is impossible for π to be represented in any finite number of digits, so any computer representation is going to be truncated at some number of digits. There are ways of doing arbitrary-precision math (Java has a BigDecimal class), but even then you'd have to truncate π somewhere. And math done that way is several orders of magnitude slower (because it is not handled by direct CPU instructions).
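The same limit can be seen outside SAS. The Python sketch below assumes, as suggested above, that the stored constant is the IEEE 754 double nearest to π (which is what Python's math.pi is); matching_digits is a helper written for this example.

```python
import math
from decimal import Decimal

PI_31DP = "3.1415926535897932384626433832795"   # value quoted in the question

# Exact decimal expansion of the double that stands in for pi.
stored = str(Decimal(math.pi))

def matching_digits(a, b):
    """Count leading digits on which two decimal strings agree."""
    count = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        if ca.isdigit():
            count += 1
    return count

agree = matching_digits(stored, PI_31DP)
print(stored)
print(agree)   # 16
```

Everything printed after the sixteenth digit is the exact value of the binary approximation, not further digits of π.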
As Garry Shutler said, it's held as a constant. Note that small fractional values in the numeric types of computer languages are rarely all that accurate (in fact, their accuracy can be lower than their precision), because they're stored as very good approximations that can be manipulated quickly. If you need excellent precision (as in financial and scientific endeavors), you need to use special types like Java's BigDecimal that handle being completely accurate (at the cost of computational speed). (Sorry, I don't know SAS, so I don't know of an analog for BigDecimal.)
