TSQL: Which real number type results in faster comparisons - performance

Among TSQL types (MSSQL): real, float and decimal; which type would result in faster comparisons?
Would decimal use hardware FPU computation, or is it purely in software?

Decimals cannot use the FPU. You get the best performance with float or real, which map to standard IEEE floating-point types supported by the FPU.
float = double
real = single
Of course, single is faster.

Are these relative comparisons >, < or equality comparisons =, != ?
Floats and reals are approximate data types, whereas a decimal is an exact representation. If you're doing equality comparisons, you'll want to stay away from floats and reals. So that leaves decimals.
More than likely, SQL Server won't go to the FPU for relative comparisons. FPU and other coprocessors are used for arithmetic.
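A minimal C++ sketch (not T-SQL) of why equality on approximate types is risky; since float/real map to IEEE binary floating point as noted above, the same effect applies there: 0.1 has no exact binary representation, so ten additions of it do not compare equal to 1.0.

#include <cstdio>

int main()
{
    double sum = 0.0;                        // IEEE double, like T-SQL float
    for (int i = 0; i < 10; ++i) sum += 0.1; // 0.1 is not exactly representable
    printf("equal to 1.0? %s (sum = %.17g)\n", sum == 1.0 ? "yes" : "no", sum);
    return 0;                                // prints "no"
}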

Related

Are float types "nested" for different precisions

I am currently working on scientific code in Python. This code uses several different float precisions, namely half, single and double precision (using NumPy). However, my question is more general, as it is not specific to Python.
Question: are these precisions "nested" in the sense that any number exactly representable (i.e. no approximation) in a lower precision is also exactly representable in higher precision.
Other phrasing: Do I change the value of a float when casting to a higher precision?
I'm quite sure that the answer is yes, at least for IEEE 754 standard floating-point types. If you cast a variable precisely representing a number to a higher-precision type, the least significant bits of the new mantissa will be zero, so the answer to your second question is: no, the numeric value won't be changed.
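A quick C++ check of this (a sketch, assuming IEEE 754 float/double as used by NumPy's float32/float64): widening a float to double pads the mantissa with zero bits, so casting back recovers exactly the same value.

#include <cstdio>

int main()
{
    const float samples[] = { 0.5f, 0.1f, 3.0f / 7.0f, 1.0e-30f, 16777216.0f };
    for (float f : samples)
    {
        double d = f;   // widening conversion float -> double
        // the round trip is exact, even for values like 0.1f that are themselves
        // only approximations of the decimal they were written as
        printf("%.9g survives the round trip: %s\n", f, ((float)d == f) ? "yes" : "no");
    }
    return 0;
}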

What is the fastest way to get the value of e?

What is the most optimised algorithm which finds the value of e with moderate accuracy?
I am looking for a comparison between optimised approaches giving more importance to speed than high precision.
Edit: By moderate accuracy I mean up to 6-7 decimal places.
But if there is a HUGE difference in speed, then I can settle with 4-5 places.
basic datatype
As mentioned in the comments, 6-7 decimal places is too little accuracy to be worth an algorithm. Instead use a constant, which is the fastest way anyway:
const double e=2.7182818284590452353602874713527;
If an FPU is involved, the constant is usually stored there too... Also, a single constant occupies much less space than a function that computes it ...
finite accuracy
Only once bignums are involved does it have any merit to use an algorithm to compute e. The algorithm depends on the target accuracy. Again, for smaller accuracies predefined constants are used:
e=2.71828182845904523536028747135266249775724709369995957496696762772407663035354759457138217852516642742746639193200305992181741359662904357290033429526059563073813232862794349076323382988075319525101901157383418793070215408914993488416750924476146066808226480016847741185374234544243710753907774499206955170189
but usually in hex format for faster and more precise manipulation:
e=2.B7E151628AED2A6ABF7158809CF4F3C762E7160F38B4DA56A784D9045190CFEF324E7738926CFBE5F4BF8D8D8C31D763DA06C80ABB1185EB4F7C7B5757F5958490CFD47D7C19BB42158D9554F7B46BCED55C4D79FD5F24D6613C31C3839A2DDF8A9A276BCFBFA1C877C56284DAB79CD4C2B3293D20E9E5EAF02AC60ACC93ECEBh
For limited/finite accuracy and best speed, the PSLQ algorithm is best. My understanding is that it is an algorithm to find a relation between a real number and integer iterations.
Here is my favourite PSLQ example, computing up to 800 digits of Pi.
arbitrary accuracy
For arbitrary or "fixed" precision you need an algorithm with variable precision. This is what I use in my arbnum class:
e=(1+1/x)^x where x -> +infinity
If you choose x as a power of 2, realize that x is just a single set bit of the number and 1/x has a predictable bit width. So e is obtained with a single division and a pow. Here is an example:
arbnum arithmetics_e() // e computation min(_arbnum_max_a,_arbnum_max_b)*5 decimals
    {                  // e=(1+1/x)^x ... x -> +inf
    int i; arbnum c,x;
    i=_arbnum_bits_a; if (i>_arbnum_bits_b) i=_arbnum_bits_b; i>>=1; // i = half of the smaller bit count
    c.zero(); c.bitset(_arbnum_bits_b-i);                            // c = 2^-i (bit _arbnum_bits_b is the units position)
    x.one(); x/=c; c++;                                              // x = 2^+i, c = 1 + 2^-i
    for (;!x.bitget(_arbnum_bits_b);x>>=1) c*=c;                     // c = (1+2^-i)^(2^i) by squaring, i.e. pow(c,x)
    return c;
    }
Where _arbnum_bits_a, _arbnum_bits_b are the numbers of bits before and after the decimal point in binary. So it breaks down to a few bit operations, one bignum division and a single power by squaring. Beware that multiplication and division are not that simple with bignums and usually involve Karatsuba or worse ...
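Here is a plain double-precision sketch of the same idea (my own illustration, not part of the arbnum class): choose x = 2^k so that 1/x is a single set bit, then raise (1 + 1/x) to the power x with k squarings. In double precision the accuracy is of course limited by rounding, unlike the bignum version above.

#include <cmath>
#include <cstdio>

int main()
{
    const double E = 2.718281828459045;       // reference value
    for (int k = 1; k <= 26; k += 5)
    {
        double c = 1.0 + std::ldexp(1.0, -k); // c = 1 + 2^-k, i.e. 1 + 1/x
        for (int i = 0; i < k; ++i) c *= c;   // c = c^(2^k) by repeated squaring
        printf("k = %2d : e ~ %.15f (error %.2e)\n", k, c, c - E);
    }
    return 0;
}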
There are also polynomial approaches out there that do not require bignum arithmetic, similar to those used to compute Pi. The idea is to compute a chunk of binary bits per iteration without affecting the previously computed bits (too much). They should be faster, but as usual for any optimization, that depends on the implementation and the HW it runs on.
For reference see Brothers' formula here: https://www.intmath.com/exponential-logarithmic-functions/calculating-e.php

Algorithm and datastructure for calculating trig functions to arbitrary precisions

I need to do a lot of calculations to arbitrarily high precision, in JavaScript, which only has a 64-bit float representation of numbers.
I can see how I could combine multiple variables to represent large numbers: for example, to represent a large decimal of m digits, where the 64-bit floating point can represent n digits, I need m / n variables.
But how can I implement an algorithm that calculates tan() to an arbitrary precision, using only 64-bit floating point arithmetic?
Why do you want to (re)do it yourself? I'd use a library for that, for instance http://mathjs.org/docs/datatypes/bignumbers.html.

LUT versus Newton-Raphson Division For IEEE-754 32-bit Floating Point

I was wondering what the tradeoffs are when implementing 32-bit IEEE-754 floating-point division using LUTs versus the Newton-Raphson method?
When I say tradeoffs I mean in terms of memory size, instruction count, etc.
I have a small memory (130 words, each 16 bits). I am storing the upper 12 bits of the mantissa (including the hidden bit) in one memory location and the lower 12 bits of the mantissa in another location.
Currently I am using Newton-Raphson division, but I am wondering what the tradeoffs would be if I changed my method. Here is a link to my algorithm: Newton's Method for finding the reciprocal of a floating point number for division
Thank you and please explain your reasoning.
The trade-off is fairly simple. A LUT uses extra memory in the hope of reducing the instruction count enough to save some time. Whether it's effective will depend a lot on the details of the processor -- caching in particular.
For Newton-Raphson, you change X/Y to X*(1/Y) and use your iteration to find 1/Y. At least in my experience, if you need full precision, it's rarely useful -- its primary strength is in allowing you to find something to (say) 16-bit precision more quickly.
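As an illustration (a generic double-precision sketch, not the questioner's 12-bit fixed-point code), the reciprocal iteration is r <- r*(2 - y*r); each pass roughly doubles the number of correct bits, which is why a decent seed plus two or three steps usually suffices:

#include <cassert>
#include <cstdio>

// assumes y has already been scaled into [0.5, 1); the exponent of the
// original divisor is handled separately, as in any floating-point divide
double nr_reciprocal(double y, int steps)
{
    assert(y >= 0.5 && y < 1.0);
    double r = 48.0 / 17.0 - (32.0 / 17.0) * y;  // classic linear seed, a few good bits
    for (int i = 0; i < steps; ++i)
        r = r * (2.0 - y * r);                   // Newton-Raphson step
    return r;
}

int main()
{
    const double y = 0.7;
    for (int s = 0; s <= 4; ++s)
        printf("%d steps: %.17g (exact %.17g)\n", s, nr_reciprocal(y, s), 1.0 / y);
    return 0;
}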
The usual method for division is a bit-by-bit method. Although that particular answer deals with integers, for floating point you do essentially the same, except that along with it you subtract the exponents. A floating-point number is basically A*2^N, where A is the significand and N is the exponent part of the number. So, you take two numbers A*2^N / B*2^M, and carry out the division as (A/B)*2^(N-M), with A and B being treated as (essentially) integers in this case. The only real difference is that with floating point you normally want to round rather than truncate the result. That basically just means carrying out the division to (at least) one extra bit of precision, then rounding up if that extra bit is a one.
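For example, 12/2.5: 12 = 1.5*2^3 and 2.5 = 1.25*2^1, so the quotient is (1.5/1.25)*2^(3-1) = 1.2*2^2 = 4.8.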
The most common method using lookup tables is SRT division. This is most often done in hardware, so I'd probably Google for something like "Verilog SRT" or "VHDL SRT". Rendering it in C++ shouldn't be terribly difficult though. Where the method I outlined in the linked answer produces one bit per iteration, this can be written to do 2, 4, etc. If memory serves, the size of the table grows quadratically with the number of bits produced per iteration though, so you rarely see much more than 4 in practice.
Each Newton-Raphson step roughly doubles the number of digits of precision, so if you can work out the number of bits of precision you expect from a LUT of a particular size, you should be able to work out how many NR steps you need to attain your desired precision. The Cray-1 used NR as the final stage of its reciprocal calculation. Looking for this I found a fairly detailed article on this sort of thing: An Accurate, High Speed Implementation of Division by Reciprocal Approximation from the 9th IEEE Symposium on Computer Arithmetic (September 6-8, 1989).

Faster integer division when denominator is known?

I am working on a GPU device which has very high integer division latency, several hundred cycles. I am looking to optimize divisions.
All divisions are by a denominator from the set { 1, 3, 6, 10 }; however, the numerator is a runtime positive value, roughly 32000 or less. Due to memory constraints, a lookup table may not be a good option.
Can you think of alternatives?
I have thought of computing floating-point inverses and using those to multiply the numerator.
Thanks
PS. Thank you, people. The bit shift hack is really cool.
To recover from roundoff, I use the following C segment:
// q = m/n
q += (n*(q+1)-1) < m;   // bump q when the truncated estimate is one too small
a/b=a*(1/b)
x=(1<<16)/b
a/b=(a*x)>>16
Can you build a lookup table for the denominators? Since you said 15-bit numerators, you could use 17 for the shifts if everything is unsigned 32-bit:
a/b=a*((1<<17)/b)>>17
The larger the shift, the smaller the rounding error. You can do a brute-force check to see how many times, if any, this is actually wrong.
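Here is that brute-force check as a C++ sketch over the ranges from the question (denominators {1,3,6,10}, numerators below 32768). It counts how often the raw multiply/shift is off, and again after applying the roundoff correction from the question's PS, which bumps the quotient when it comes out one too small:

#include <cstdio>

int main()
{
    const unsigned dens[] = { 1, 3, 6, 10 };
    for (unsigned n : dens)
    {
        unsigned recip = (1u << 17) / n;           // precomputed once per denominator
        unsigned raw_errs = 0, fixed_errs = 0;
        for (unsigned m = 0; m < 32768; ++m)
        {
            unsigned q = (m * recip) >> 17;        // fast approximate quotient
            if (q != m / n) ++raw_errs;
            q += (n * (q + 1) - 1) < m;            // roundoff correction from the PS
            if (q != m / n) ++fixed_errs;
        }
        printf("n = %2u : raw errors %5u, corrected errors %u (out of 32768)\n",
               n, raw_errs, fixed_errs);
    }
    return 0;
}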
The book, "Hacker's Delight" by Henry Warren, has a whole chapter devoted to integer division by constants, including techniques that transform an integer division to a multiply/shift/add series of operations.
This page calculates the magic numbers for the multiply/shift/add operations:
http://www.hackersdelight.org/magic.htm
The standard embedded systems hack for this is to convert an integer division by N into a fixed-point multiplication by 1/N.
Assuming 16 bits, 0.33333 can be represented as 21845 (decimal). Multiply, giving a 32-bit integer product, and shift down 16 bits.
You will almost certainly encounter some roundoff (truncation) error. This may or may not be something you can live with.
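For example (a worked instance of that truncation): 1000/3 via this hack is (1000*21845) >> 16 = 21845000 >> 16 = 333, which matches the exact integer quotient, but 3/3 gives (3*21845) >> 16 = 65535 >> 16 = 0 instead of 1. That off-by-one undershoot is exactly what the correction step in the question's PS is meant to catch.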
It MIGHT be worthwhile to look hard at your GPU and see if you can hand-code a faster integer division routine, taking advantage of your knowledge of the restricted range of the numerator.
