MATLAB script to generate reports of rounding errors in algorithms - algorithm

I am interested in use or created an script to get error rounding reports in algorithms.
I hope the script or something similar is already done...
I think this would be usefull for digital electronic system design because sometimes it´s neccesary to study how would be the accuracy error depending of the number of decimal places that are considered in the design.
This script would work with 3 elements, the algorithm code, the input, and the output.
This script would show the error line by line of the algorithm code.
It would modify the algorith code with some command like roundn and compare the error of the output.
I would define the error as
Errorrounding = Output(without rounding) - Output round
For instance I have the next algorithm
calculation1 = input*constan1 + constan2 %line 1 of the algorithm
output = exp(calculation1) %line 2 of the algorithm
Where 'input' is the input of n elements vector and 'output' is the output and 'constan1' and 'constan2' are constants.
n is the number of elements of the input vector
So, I would put my algorithm in the script and it generated in a automatic way the next algorithm:
input_round = roundn(input,-1*mdec)
calculation1 = input*constant1+constant2*ones(1,n)
calculation1_round = roundn(calculation1,-1*mdec)
output=exp(calculation1_round)
output_round= roundn(output,-1*mdec)
where mdec is the number of decimal places to consider.
Finally the script give the next message
The rounding error at line 1 is #Errorrounding_calculation1
Where '#Errorrounding' would be the result of the next operation Errorrounding_calculation1 = calculation1 - calculation1_round
The rounding error at line 2 is #Errorrounding_output
Where 'Errorrounding_output' would be the result of the next operation Errorrounding_output = output - output_round
Does anyone know if there is something similar already done, or Matlab provides a solution to deal with some issues related?
Thank you.

First point: I suggest reading What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. It should illuminate a lot of issues regarding floating-point computations that will help you understand more of the intricacies of the problem you are considering.
Second point: I think the problem you are considering is a lot more complicated than you realize. You are interested in the error introduced into a calculation due to the reduced precision from rounding. What you don't realize is that these errors will propagate through your computations. Consider your example:
output = input*C1 + C2
If each of the three operands is a double-precision floating-point number, they will each have some round-off error in their precision. A bound on this round-off error can be found using the function EPS, which tells you the distance from one double-precision number to the next largest one. For example, a bound on the relative error of the representation of input will be 0.5*eps(input), or halfway between it and the next largest double-precision number. We can therefore estimate some errors bounds on the three operands as follows:
err_input = 0.5.*eps(input); %# Maximum round-off error for input
err_C1 = 0.5.*eps(C1); %# Maximum round-off error for C1
err_C2 = 0.5.*eps(C2); %# Maximum round-off error for C2
Note that these errors could be positive or negative, since the true number may have been rounded up or down to represent it as a double-precision value. Now, notice what happens when we estimate the true value of the operands before they were rounded-off by adding these errors to them, then perform the calculation for output:
output = (input+err_input)*(C1+err_C1) + C2+err_C2
%# ...and after reordering terms
output = input*C1 + C2 + err_input*C1 + err_C1*input + err_input*err_C1 + err_C2
%# ^-----------^ ^-----------------------------------------------------^
%# | |
%# rounded computation difference
You can see from this that the precision round-off of the three operands before performing the calculation could change the output we get by as much as difference. In addition, there will be another source of round-off error when the value output is rounded off to represent it as a double-precision value.
So, you can see how it's quite a bit more complicated than you thought to adequately estimate the errors introduced by precision round-off.

This is more of an extended comment than an answer:
I'm voting to close this on the grounds that it isn't a well-formed question. It sort of expresses a hope or wish that there exists some type of program which would be interesting or useful to you. I suggest that you revise the question to, well, to be a question.
You propose to write a Matlab program to analyse the numerical errors in other Matlab programs. I would not use Matlab for this. I'd probably use Mathematica, which offers more sophisticated structural operations on strings (such as program source text), symbolic computation, and arbitrary precision arithmetic. One of the limitations of Matlab for what you propose is that Matlab, like all other computer implementations of real arithmetic, suffers rounding errors. There are other languages which you might choose too.
What you propose is quite difficult, and would probably require a longer answer than most SOers, including this one, would be happy to contemplate writing. Happily for you, other people have written books on the subject, I suggest you start with this one by NJ Higham. You might also want to investigate matters such as interval arithmetic.
Good luck.

Related

MSE giving negative results in High-Level Synthesis

I am trying to calculate the Mean Squared Error in Vitis HLS. I am using hls::pow(...,2) and divide by n, but all I receive is a negative value for example -0.004. This does not make sense to me. Could anyone point the problem out or have a proper explanation for this??
Besides calculating the mean squared error using hls::pow does not give the same results as (a - b) * (a - b) and for information I am using ap_fixed<> types and not normal float or double precision
Thanks in advance!
It sounds like an overflow and/or underflow issue, meaning that the values reach the sign bit and are interpreted as negative while just be very large.
Have you tried tuning the representation precision or the different saturation/rounding options for the fixed point class? This tuning will depend on the data you're processing.
For example, if you handle data that you know will range between -128.5 and 1023.4, you might need very few fractional bits, say 3 or 4, leaving the rest for the integer part (which might roughly be log2((1023+128)^2)).
Alternatively, if n is very large, you can try a moving average and calculate the mean in small "chunks" of length m < n.
p.s. Getting the absolute value of a - b and store it into an ap_ufixed before the multiplication can already give you one extra bit, but adds an instruction/operation/logic to the algorithm (which might not be a problem if the design is pipelined, but require space if the size of ap_ufixed is very large).

Mathematica precision differs from other calculators

If I evaluate the following input in Mathematica 12:
SetPrecision[DecimalForm[123.432654/54.1122356, 130], 130]
The result is:
2.2810488724291406725797060062177479267120361328125000000000000000000000000000000000000000000000000000000000000000000000000000000000
When I run the same calculation in other calculators, the results are equal until the 15th digit of the Mathematica result: 2,281048872429140. However, as of the 16th digit, the other calculators show an equal result whereas Mathematica is showing a different result:
Windows Calculator:
2,281048872429140591633586101550
https://keisan.casio.com/calculator:
2.281048872429140591633586101550[.....]
https://www.mathsisfun.com/calculator-precision.html:
2.281048872429140591633586101550[.....]
Mathematica:
2.281048872429140672579706006217[.....].
Why is (only) Mathematica ending up with a different result?
Can Mathematica somehow end up with the same result as the other calculators (supposing that these unanimous results are the correct ones)?
Mathematica's model of approximate decimal numbers is different from almost everyone else's model of approximate decimal numbers.
Because of the number of digits you supplied for each of 123.432654 and 54.1122356 these are assumed to be and treated as MachinePrecision numbers. That means they have the usual "about 16 digits of precision as supplied by the CPU floating point hardware in your computer, but it is a little more complicated than that.
Because of precedence rules Mathematica first evaluated each of those numbers and converted them to the internal floating point form, with the limited accuracy and all the problems that brings and all the speed of being able to perform calculations in hardware instead of software.
Then it did the division using the internal floating point hardware which resulted in another MachinePrecision number with only about 16 digits of precision.
Then with DecimalForm you asked Mathematica to extrapolate that result with only about 16 good digits into a 130 digit display.
All, or almost all with some very subtle things in dark corners, of the *Form functions are intended to and only used to produce something that can be displayed and not used for further calculations. For example, new users routinely do m=MatrixForm[mymatrix] to see a pretty formatting of the matrix and then proceed to try to do calculations with m, which fails.
Then you asked Mathematica to perform the SetPrecision function on that display to try to turn that into a 130 bit precision number. I can't even guess what that really did internally.
It seems those other calculators assume that the precision of the entered numbers is infinite. WL does not. You can specify what precision the entered numbers have e.g.
123.432654`30/54.1122356`30
2.28104887242914059163358610155

Lua: What is typical approach for using calculated values in a for loop?

What is the typical approach in LUA (before the introduction of integers in 5.3) for dealing with calculated range values in for loops? Mathematical calculations on the start and end values in a numerical for loop put the code at risk of bugs, possibly nasty latent ones as this will only occur on certain values and/or with changes to calculation ordering. Here's a concocted example of a loop not producing the desire output:
a={"a","b","c","d","e"}
maybethree = 3
maybethree = maybethree / 94
maybethree = maybethree * 94
for i = 1,maybethree do print(a[i]) end
This produces the unforuntate output of two items rather than the desired three (tested on 5.1.4 on 64bit x86):
a
b
Programmers unfamiliar with this territory might be further confused by print() output as that prints 3!
The application of a rounding function to the nearest whole number could work here. I understand the approximatation with FP and why this fails, I'm interested in what the typical style/solution is for this in LUA.
Related questions:
Lua for loop does not do all iterations
Lua: converting from float to int
The solution is to avoid this reliance on floating-point math where floating-point precision may become an issue. Or, more realistically, just be aware of when you are using FP and be mindul of the precision issue. This isn’t a Lua problem that requires a Lua-specific solution.
maybethree is a misnomer: it is never three. Your code above is deterministic. It will always print just a and b. Since the maybethree variable is less than three, of course the for loop would not execute 3 times.
The print function is also behaving as defined/expected. Use string.format to show thr FP number in all its glory:
print(string.format("%1.16f", maybethree)) -- 2.9999999999999996
Still need to use calculated values to control your for loop? Then you already mentioned the answer: implement a rounding function.

error bound in function approximation algorithm

Suppose we have the set of floating point number with "m" bit mantissa and "e" bits for exponent. Suppose more over we want to approximate a function "f".
From the theory we know that usually a "range reduced function" is used and then from such function we derive the global function value.
For example let x = (sx,ex,mx) (sign exp and mantissa) then...
log2(x) = ex + log2(1.mx) so basically the range reduced function is "log2(1.mx)".
I have implemented at present reciprocal, square root, log2 and exp2, recently i've started to work with the trigonometric functions. But i was wandering if given a global error bound (ulp error especially) it is possible to derive an error bound for the range reduced function, is there some study about this kind of problem? Speaking of the log2(x) (as example) i would lke to be able to say...
"ok i want log2(x) with k ulp error, to achieve this given our floating point system we need to approximate log2(1.mx) with p ulp error"
Remember that as i said we know we are working with floating point number, but the format is generic, so it could be the classic F32, but even for example e=10, m = 8 end so on.
I can't actually find any reference that shows such kind of study. Reference i have (i.e. muller book) doesn't treat the topic in this way so i was looking for some kind of paper or similar. Do you know any reference?
I'm also trying to derive such bound by myself but it is not easy...
There is a description of current practice, along with a proposed improvement and an error analysis, at https://hal.inria.fr/ensl-00086904/document. The description of current practice appears consistent with the overview at https://docs.oracle.com/cd/E37069_01/html/E39019/z4000ac119729.html, which is consistent with my memory of the most talked about problem being the mod pi range reduction of trigonometric functions.
I think IEEE floating point was a big step forwards just because it standardized things at a time when there were a variety of computer architectures, so lowering the risks of porting code between them, but the accuracy requirements implied by this may have been overkill: for many problems the constraint on the accuracy of the output is the accuracy of the input data, not the accuracy of the calculation of intermediate values.

Algorithm to find a common multiplier to convert decimal numbers to whole numbers

I have an array of numbers that potentially have up to 8 decimal places and I need to find the smallest common number I can multiply them by so that they are all whole numbers. I need this so all the original numbers can all be multiplied out to the same scale and be processed by a sealed system that will only deal with whole numbers, then I can retrieve the results and divide them by the common multiplier to get my relative results.
Currently we do a few checks on the numbers and multiply by 100 or 1,000,000, but the processing done by the *sealed system can get quite expensive when dealing with large numbers so multiplying everything by a million just for the sake of it isn’t really a great option. As an approximation lets say that the sealed algorithm gets 10 times more expensive every time you multiply by a factor of 10.
What is the most efficient algorithm, that will also give the best possible result, to accomplish what I need and is there a mathematical name and/or formula for what I’m need?
*The sealed system isn’t really sealed. I own/maintain the source code for it but its 100,000 odd lines of proprietary magic and it has been thoroughly bug and performance tested, altering it to deal with floats is not an option for many reasons. It is a system that creates a grid of X by Y cells, then rects that are X by Y are dropped into the grid, “proprietary magic” occurs and results are spat out – obviously this is an extremely simplified version of reality, but it’s a good enough approximation.
So far there are quiet a few good answers and I wondered how I should go about choosing the ‘correct’ one. To begin with I figured the only fair way was to create each solution and performance test it, but I later realised that pure speed wasn’t the only relevant factor – an more accurate solution is also very relevant. I wrote the performance tests anyway, but currently the I’m choosing the correct answer based on speed as well accuracy using a ‘gut feel’ formula.
My performance tests process 1000 different sets of 100 randomly generated numbers.
Each algorithm is tested using the same set of random numbers.
Algorithms are written in .Net 3.5 (although thus far would be 2.0 compatible)
I tried pretty hard to make the tests as fair as possible.
Greg – Multiply by large number
and then divide by GCD – 63
milliseconds
Andy – String Parsing
– 199 milliseconds
Eric – Decimal.GetBits – 160 milliseconds
Eric – Binary search – 32
milliseconds
Ima – sorry I couldn’t
figure out a how to implement your
solution easily in .Net (I didn’t
want to spend too long on it)
Bill – I figure your answer was pretty
close to Greg’s so didn’t implement
it. I’m sure it’d be a smidge faster
but potentially less accurate.
So Greg’s Multiply by large number and then divide by GCD” solution was the second fastest algorithm and it gave the most accurate results so for now I’m calling it correct.
I really wanted the Decimal.GetBits solution to be the fastest, but it was very slow, I’m unsure if this is due to the conversion of a Double to a Decimal or the Bit masking and shifting. There should be a
similar usable solution for a straight Double using the BitConverter.GetBytes and some knowledge contained here: http://blogs.msdn.com/bclteam/archive/2007/05/29/bcl-refresher-floating-point-types-the-good-the-bad-and-the-ugly-inbar-gazit-matthew-greig.aspx but my eyes just kept glazing over every time I read that article and I eventually ran out of time to try to implement a solution.
I’m always open to other solutions if anyone can think of something better.
I'd multiply by something sufficiently large (100,000,000 for 8 decimal places), then divide by the GCD of the resulting numbers. You'll end up with a pile of smallest integers that you can feed to the other algorithm. After getting the result, reverse the process to recover your original range.
Multiple all the numbers by 10
until you have integers.
Divide
by 2,3,5,7 while you still have all
integers.
I think that covers all cases.
2.1 * 10/7 -> 3
0.008 * 10^3/2^3 -> 1
That's assuming your multiplier can be a rational fraction.
If you want to find some integer N so that N*x is also an exact integer for a set of floats x in a given set are all integers, then you have a basically unsolvable problem. Suppose x = the smallest positive float your type can represent, say it's 10^-30. If you multiply all your numbers by 10^30, and then try to represent them in binary (otherwise, why are you even trying so hard to make them ints?), then you'll lose basically all the information of the other numbers due to overflow.
So here are two suggestions:
If you have control over all the related code, find another
approach. For example, if you have some function that takes only
int's, but you have floats, and you want to stuff your floats into
the function, just re-write or overload this function to accept
floats as well.
If you don't have control over the part of your system that requires
int's, then choose a precision to which you care about, accept that
you will simply have to lose some information sometimes (but it will
always be "small" in some sense), and then just multiply all your
float's by that constant, and round to the nearest integer.
By the way, if you're dealing with fractions, rather than float's, then it's a different game. If you have a bunch of fractions a/b, c/d, e/f; and you want a least common multiplier N such that N*(each fraction) = an integer, then N = abc / gcd(a,b,c); and gcd(a,b,c) = gcd(a, gcd(b, c)). You can use Euclid's algorithm to find the gcd of any two numbers.
Greg: Nice solution but won't calculating a GCD that's common in an array of 100+ numbers get a bit expensive? And how would you go about that? Its easy to do GCD for two numbers but for 100 it becomes more complex (I think).
Evil Andy: I'm programing in .Net and the solution you pose is pretty much a match for what we do now. I didn't want to include it in my original question cause I was hoping for some outside the box (or my box anyway) thinking and I didn't want to taint peoples answers with a potential solution. While I don't have any solid performance statistics (because I haven't had any other method to compare it against) I know the string parsing would be relatively expensive and I figured a purely mathematical solution could potentially be more efficient.
To be fair the current string parsing solution is in production and there have been no complaints about its performance yet (its even in production in a separate system in a VB6 format and no complaints there either). It's just that it doesn't feel right, I guess it offends my programing sensibilities - but it may well be the best solution.
That said I'm still open to any other solutions, purely mathematical or otherwise.
What language are you programming in? Something like
myNumber.ToString().Substring(myNumber.ToString().IndexOf(".")+1).Length
would give you the number of decimal places for a double in C#. You could run each number through that and find the largest number of decimal places(x), then multiply each number by 10 to the power of x.
Edit: Out of curiosity, what is this sealed system which you can pass only integers to?
In a loop get mantissa and exponent of each number as integers. You can use frexp for exponent, but I think bit mask will be required for mantissa. Find minimal exponent. Find most significant digits in mantissa (loop through bits looking for last "1") - or simply use predefined number of significant digits.
Your multiple is then something like 2^(numberOfDigits-minMantissa). "Something like" because I don't remember biases/offsets/ranges, but I think idea is clear enough.
So basically you want to determine the number of digits after the decimal point for each number.
This would be rather easier if you had the binary representation of the number. Are the numbers being converted from rationals or scientific notation earlier in your program? If so, you could skip the earlier conversion and have a much easier time. Otherwise you might want to pass each number to a function in an external DLL written in C, where you could work with the floating point representation directly. Or you could cast the numbers to decimal and do some work with Decimal.GetBits.
The fastest approach I can think of in-place and following your conditions would be to find the smallest necessary power-of-ten (or 2, or whatever) as suggested before. But instead of doing it in a loop, save some computation by doing binary search on the possible powers. Assuming a maximum of 8, something like:
int NumDecimals( double d )
{
// make d positive for clarity; it won't change the result
if( d<0 ) d=-d;
// now do binary search on the possible numbers of post-decimal digits to
// determine the actual number as quickly as possible:
if( NeedsMore( d, 10e4 ) )
{
// more than 4 decimals
if( NeedsMore( d, 10e6 ) )
{
// > 6 decimal places
if( NeedsMore( d, 10e7 ) ) return 10e8;
return 10e7;
}
else
{
// <= 6 decimal places
if( NeedsMore( d, 10e5 ) ) return 10e6;
return 10e5;
}
}
else
{
// <= 4 decimal places
// etc...
}
}
bool NeedsMore( double d, double e )
{
// check whether the representation of D has more decimal points than the
// power of 10 represented in e.
return (d*e - Math.Floor( d*e )) > 0;
}
PS: you wouldn't be passing security prices to an option pricing engine would you? It has exactly the flavor...

Resources