Decrease precision of the SymPy Equality class

I am performing some symbolic calculations using SymPy, and the calculations are just too computationally expensive. I was hoping to minimize the number of bytes used per calculation, and thus increase processing speed. I am solving two polynomial equations for two unknowns, but whenever I create the equalities using the SymPy Equality class it introduces precision that did not exist in the variables supplied: it pads the coefficients out to SymPy's standard 15-digit precision. I was hoping there might be a way to keep this class from doing this, or to limit the overall precision of SymPy for this problem, as this amount of precision is not necessary for my calculations. I have read through all the documentation I can find on the class, and on precision handling in SymPy, with no luck.
My code looks like this.
import math
import numpy as np
import sympy as sym

# A, a, b, h, k are numeric parameters defined earlier
c0 = np.float16((math.cos(A)**2)/(a**2) + (math.sin(A)**2)/(b**2))
c1 = np.float16((math.cos(A)**2)/(b**2) + (math.sin(A)**2)/(a**2))
c2 = np.float16((math.sin(2*A))/(a**2) - (math.sin(2*A))/(b**2))
c3 = np.float16((k*math.sin(2*A))/(b**2) - (2*h*(math.cos(A))**2)/(a**2) - (k*math.sin(2*A))/(a**2) - (2*h*(math.sin(A))**2)/(b**2))
c4 = np.float16((h*math.sin(2*A))/(b**2) - (2*k*(math.cos(A))**2)/(b**2) - (h*math.sin(2*A))/(a**2) - (2*k*(math.sin(A))**2)/(a**2))
c5 = np.float16((h**2*(math.cos(A))**2)/(a**2) + (k*h*math.sin(2*A))/(a**2) + (k**2*(math.sin(A))**2)/(a**2) + (h**2*(math.sin(A))**2)/(b**2) + (k**2*(math.cos(A))**2)/(b**2) - (k*h*math.sin(2*A))/(b**2) - 1)
x = sym.Symbol('x', real=True)
y = sym.Symbol('y', real=True)
e = sym.Eq(c0*x**2 + c1*y**2 + c2*x*y + c3*x + c4*y + c5, 0)
Each coefficient c0 through c5 originally comes out as a double-precision float, as is normal in Python, and since I don't require that precision I just recast it as float16. So the values look like
c0=1.547
c1=15.43
c2=1.55
c3=5.687
c4=7.345
c5=6.433
However, when cast into the equality e, the equation becomes
e = 1.5470203040506025*x**2 + 15.43000345000245*y**2 + ...
with the standard SymPy 15-digit precision on every coefficient, even though those digits are not representative of the data.
I'm hoping that by lowering this precision I might decrease my run time; I have a lot of these polynomials to solve. I've already tried using SymPy's Float class, the evalf method, and many other things. Any help would be appreciated.

Give the number of significant figures to Float as the second argument:
>>> from sympy import Float, Eq
>>> c0,c1,c2,c3,c4,c5 = [Float(i,4) for i in (c0,c1,c2,c3,c4,c5)]
>>> Eq(c0*x**2+c1*y**2+c2*x*y+c3*x+c4*y+c5,0)
Eq(1.547*x**2 + 1.55*x*y + 5.687*x + 15.43*y**2 + 7.345*y + 6.433, 0)
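If the equality has already been constructed from full-precision floats, it can also be rounded after the fact: evalf takes the number of significant figures as its first argument. A minimal sketch, assuming e is the equality built in the question:
>>> e.evalf(4)
This should display the same 4-digit coefficients as above.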

Related

MSE giving negative results in High-Level Synthesis

I am trying to calculate the mean squared error (MSE) in Vitis HLS. I am using hls::pow(...,2) and dividing by n, but all I receive is a negative value, for example -0.004. This does not make sense to me. Could anyone point out the problem, or offer a proper explanation?
Besides, calculating the mean squared error using hls::pow does not give the same results as (a - b) * (a - b). For information, I am using ap_fixed<> types, not plain float or double precision.
Thanks in advance!
It sounds like an overflow and/or underflow issue, meaning that the values reach the sign bit and are interpreted as negative while actually just being very large.
Have you tried tuning the representation precision or the different saturation/rounding options for the fixed point class? This tuning will depend on the data you're processing.
For example, if you handle data that you know will range between -128.5 and 1023.4, you might need very few fractional bits, say 3 or 4, leaving the rest for the integer part (which might roughly be log2((1023+128)^2)).
Alternatively, if n is very large, you can try a moving average and calculate the mean in small "chunks" of length m < n.
P.S. Taking the absolute value of a - b and storing it into an ap_ufixed before the multiplication can already give you one extra bit, but it adds an instruction/operation/logic to the algorithm (which might not be a problem if the design is pipelined, but requires space if the size of ap_ufixed is very large).
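To illustrate the wrap-around effect, here is a small Python sketch of two's-complement fixed-point behaviour without saturation (the to_fixed helper and the bit widths are invented for the example; this is not HLS code):
def to_fixed(value, int_bits=8, frac_bits=8):
    # simulate a signed fixed-point value with int_bits + frac_bits bits, no saturation
    total_bits = int_bits + frac_bits
    raw = int(round(value * (1 << frac_bits)))  # quantize onto the fixed-point grid
    raw &= (1 << total_bits) - 1                # drop bits that do not fit (wrap-around)
    if raw >= 1 << (total_bits - 1):            # top bit set: read back as negative
        raw -= 1 << total_bits
    return raw / (1 << frac_bits)

d = 12.0
print(to_fixed(d))      # 12.0 -- fits in the representable range [-128, 128)
print(to_fixed(d * d))  # -112.0 -- 144 overflows the 8 integer bits and wraps negative
A squared difference that exceeds the integer range comes back negative, which is exactly the symptom described.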

Bug omits data interval - possible causes?

I have encountered a strange bug and wanted to ask if someone has any idea what might be the cause.
The bug:
When I correlate the facial width-to-height ratio (FWHR) of NHL players with their penalty minutes per games played (PIM/GP), a section of the FWHR distribution is blank (between 1.98-2 and 2-2.022; see Figure 1). The FWHR is an int/int ratio where each int has two digits. It is extremely unlikely this reflects a true signal and is therefore most likely a bug in the code I am using.
Context:
I know my PIM/GP data is correct (retrieved from the NHL's website), but the FWHR was calculated using an algorithm. The problem most likely lies within this facial measuring algorithm. I have not been able to locate the bug and therefore turn to you for advice.
Question:
While the code for the facial measuring algorithm is far too long to be presented here, I wanted to ask if someone might have any ideas on what might have caused it/ what I could check for?
The Nature of Ratio Distributions
Idea: It should be impossible for a ratio of two 2-digit integers to fill all 2-decimal values between two integers. Could such impossible values be especially pronounced around 2.0? For example, maybe 1.99 cannot be represented?
Method: Loop through 2-digit ints and append the ratio to a list.
Then check if the list lacks values around 2.0 (e.g., 1.99).
import numpy as np
from matplotlib import pyplot as plt

def int_ratio_generator():
    ratio_list = []
    for i in range(1, 100):
        for j in range(1, 100):
            ratio = i / j
            ratio_list.append(ratio)
    return ratio_list

ratio_list = int_ratio_generator()
key = 1.99 in ratio_list
print('\nis 1.99 a possible ratio from 2-digit ints?', key)

fig, ax = plt.subplots()
X = ratio_list
Y = np.random.rand(len(ratio_list), 1)
plt.scatter(X, Y, color='C0')
plt.xlim(1.8, 2.2)
plt.show()
Conclusion:
Ratios from positive 2-digit integers do not fill all possible 2-decimal values between integers, and impossible values include 1.99.
It follows that previously impossible values can be filled by including a larger range of ints, or by introducing decimal numbers within the same range.
Furthermore, as shown by the simulation above, ratio distributions with 2-digit integers will have relatively large ranges of impossible values on either side of each integer.
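To make the width of the blank interval concrete, the nearest achievable ratios on either side of 2.0 can be computed directly; a small sketch reusing ratio_list from above:
below = max(r for r in ratio_list if r < 2)
above = min(r for r in ratio_list if r > 2)
print(below, above)  # expected: 1.98 (= 99/50) and about 2.0204 (= 99/49)
This is consistent with the blank band observed in the FWHR distribution.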

Can computing tan(x)=sin(x)/cos(x) cause a loss of precision?

After a call to sincos(x,&s,&c) from the Unix math library (libm), it would be natural to get the tangent as s/c. Is this safe, or are there (ill-conditioned) cases in which the (supposedly) more expensive tan(x) should be preferred due to precision issues?
If the error in sin(x) is es, and the error in cos(x) is ec, then the error in t = sin(x)/cos(x) is
et = abs(t)*sqrt((es/sin(x))^2 + (ec/cos(x))^2)
The error in sin, cos and tan should be right around the precision of the number representation times the value, perhaps a bit or two off. Since the error tracks the precision of the number, the relative errors es/sin(x) and ec/cos(x) are each about the machine epsilon, and the above equation reduces (in relative terms) to
et = sqrt(es^2 + ec^2)
where es and ec are now relative errors of similar size. So the total error from using sin/cos, as opposed to a fresh calculation of the tangent, should be about a factor of sqrt(2) greater than the error from calculating the tangent directly.
That assumes the error in sin, cos and tan are all about the same, but that's a pretty good assumption these days. There is a small wobble in error over the range of the function, but that should be handled using some extended precision bits during the calculation, and should not show up significantly in the values you see.
Whether this matters, of course, depends on the situation.
See http://lectureonline.cl.msu.edu/~mmp/labs/error/e2.htm for a decent quick-and-dirty intro to error propagation, which is a good way to tackle this sort of question.
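As a quick empirical check of that sqrt(2) estimate, here is a Python sketch comparing the two formulations in double precision (the sampling range is arbitrary, chosen to stay away from the poles of tan):
import math
import random

worst = 0.0
for _ in range(100000):
    x = random.uniform(-1.5, 1.5)
    direct = math.tan(x)
    ratio = math.sin(x) / math.cos(x)
    if direct != 0.0:
        worst = max(worst, abs(ratio - direct) / abs(direct))
print(worst)  # typically on the order of 1e-16, i.e. a couple of ULPs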

Difference in accuracy with floating point division vs multiplication

Is there a difference between this:
average = (x1 + x2) / 2;
deviation1 = x1 - average;
deviation2 = x2 - average;
variance = deviation1*deviation1 + deviation2*deviation2;
and this:
average2 = (x1 + x2);
deviation1 = 2*x1 - average2;
deviation2 = 2*x2 - average2;
variance = (deviation1*deviation1 + deviation2*deviation2) / 4;
Note that in the second version I am trying to delay division as late as possible. Does the second version [delay divisions] increase accuracy in general?
Snippet above is only intended as an example, I am not trying to optimize this particular snippet.
BTW, I am asking about division in general, not just by 2 or a power of 2 as they reduce to simple shifts in IEEE 754 representation. I took division by 2, just to illustrate the issue using a very simple example.
There's nothing to be gained from this. You are only changing the scale, but you don't get any more significant figures in your calculation.
The Wikipedia article on variance explains, at a high level, some of the options for calculating variance in a robust fashion.
You do not gain precision from this, since IEEE 754 (which is probably what you're using under the covers) gives you the same precision (number of bits) at whatever scale you're working. For example, 3.14159 x 10^7 will be as precise as 3.14159 x 10^10.
The only possible advantage (of the former) is that you may avoid overflow when setting the deviations. But, as long as the values themselves are less than half of the maximum possible, that won't be a problem.
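To see that relative precision is scale-independent, a quick check with math.ulp (Python 3.9+), which returns the spacing between adjacent doubles at a given value:
import math
print(math.ulp(3.14159e7) / 3.14159e7)    # ~1.2e-16
print(math.ulp(3.14159e10) / 3.14159e10)  # ~1.2e-16
The relative spacing of representable numbers is the same at either scale.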
I have to agree with David Heffernan: it won't give you higher precision.
The reason is how float values are stored. You have some bits representing the significant digits and some bits representing the exponent (for example, 3.1714 x 10^-12). The number of bits for the significant digits is always the same no matter how large your number is, which means that in the end the result will not really be different.
Even worse, delaying the division can give you an overflow if you have very large numbers.
If you really need higher precision, there are lots of libraries providing big numbers or numbers with higher precision.
The best way to answer your question would be to run tests (both randomly distributed and range-based?) and see if the resulting numbers differ at all in their binary representation.
Note that one issue you'll have if you do this is that your functions won't work for values > MAX_INT/2, because of the way you compute the average.
avg = (x1+x2)/2 # clobbers numbers > MAX_INT/2
avg = 0.5*x1 + 0.5*x2 # no clobbering
This is almost certainly not an issue though, unless you are writing a language-level library. And if most of your numbers are small, it may not matter at all. In fact, it probably isn't worth considering, since the value of the variance will exceed MAX_INT anyway, as it is inherently a squared quantity; I'd say you might wish to use the standard deviation instead, but no one does that.
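To see the overflow point concretely with doubles, which overflow near 1.8e308 (a quick Python sketch):
x1 = x2 = 1e308
print((x1 + x2) / 2)        # inf: the sum overflows before the division can rescale it
print(0.5 * x1 + 0.5 * x2)  # 1e+308: scaling each term first avoids the overflow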
Here I do some experiments in python (which I think supports the IEEE whatever-it-is by virtue of probably delegating math to C libraries...):
>>> from itertools import product
>>> def compare(numer, denom):
...     assert ((numer/denom)*2).hex() == ((2*numer)/denom).hex()
...
>>> [compare(a,b) for a,b in product(range(1,100), range(1,100))]
No problem, I think because division and multiplication by 2 is nicely representable in binary. However try multiplication and division by 3:
>>> def compare(numer, denom):
...     assert ((numer/denom)*3).hex() == ((3*numer)/denom).hex(), '...'
...
>>> [compare(a,b) for a,b in product(range(1,100), range(1,100))]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
File "<stdin>", line 2, in compare
AssertionError: 0x1.3333333333334p-1!=0x1.3333333333333p-1
Does it matter much in practice? Perhaps, if you're working with very small numbers (in which case you may wish to use log arithmetic). However, if you're working with large numbers (uncommon in probability) and you delay division, you will, as I mentioned, risk overflow, but even worse, you risk bugs due to hard-to-read code.

How is π calculated within SAS?

Just curious, but I spotted that the value of π held by SAS is in fact incorrect.
for instance:
data _null_;
x= constant('pi') * 1000000000000000000000000000;
put x= 32.;
run;
gives a π value of (3.)141592653589792961327005696
however - π is of course (3.)1415926535897932384626433832795 ( http://www.joyofpi.com/pi.html ) - to 31 dp.
what gives??!!
SAS stores PI as a constant to 14 decimal places. The difference you are seeing is an artifact of floating point math when you did the multiplication step.
data _null_;
pi=constant("PI");
put pi= 32.30;
run;
/*On Log */
pi=3.141592653589790000000000000000
PI is held as a constant in all programming languages to a set precision. It isn't calculated. Your code just exposes how accurate PI is in SAS.
You got 16 digits of precision, which means it probably uses an IEEE 754 double-precision floating-point representation; that gives only about 16-17 decimal digits of precision. It is impossible for π to be represented in any finite number of digits, so any computer representation is going to be truncated at some number of digits. There are ways of doing arbitrary-precision math (Java has a BigDecimal class), but even then you'd have to truncate π somewhere. And math done that way is several orders of magnitude slower (because it is not handled by direct CPU instructions).
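For comparison, the same artifact shows up in any IEEE 754 double; a quick Python sketch:
import math
print(f"{math.pi:.30f}")  # 3.141592653589793115997963468544
Only the first ~16 significant digits match π; everything past that is the binary representation talking, not π.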
As Garry Shutler said, it's held as a constant. Note that small fractional values in the numeric types of computer languages are rarely all that accurate (in fact, their accuracy can be lower than their precision), because they're stored as very good approximations that can be manipulated quickly. If you need excellent accuracy (as in financial and scientific endeavors), you need to use special types like Java's BigDecimal that handle being completely accurate (at the cost of computational speed). (Sorry, I don't know SAS, so I don't know of an analog of BigDecimal there.)
