I'm implementing algorithm D of section 4.3.2 of volume 2 of The Art of Computer Programming by D. E. Knuth.
In step D3 I'm supposed to compute q = floor((u[j+n]*BASE + u[j+n-1]) / v[n-1]) and r = (u[j+n]*BASE + u[j+n-1]) mod v[n-1]. Here, u (dividend) and v (divisor) are single-precision* arrays of length m+n and n, respectively. BASE is the representation base, which for a binary computer of 32 or 64 bits equals 2^32 or 2^64, respectively.
My question is about the precision in which q and r are represented. As I understand the rest of the algorithm, they are supposed to be single-precision*, but it's easy to spot many cases where they must be double-precision* to fit the result.
How are those values supposed to be computed? In what precision?
* The expression single/double-precision refers to integer arithmetic, not to floating-point arithmetic.
When the divisor is normalized (most significant bit of its top word set), the quotient digit will always fit in a single word. With a power-of-two base, normalization is accomplished by cheap left-shift operations.
Link to a more detailed and formal answer.
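To make the single-word claim concrete, here is a minimal sketch of the digit estimate in step D3, written in Python so the double-word intermediate cannot overflow. The function name d3_estimate and its argument names are my own, and base 10 is used in the check only to keep the numbers small. It assumes the divisor is already normalized (v1 >= base/2) and that the leading digits of the current dividend window are smaller than the divisor, as the algorithm guarantees.

```python
def d3_estimate(u2, u1, u0, v1, v0, base):
    """Estimate one quotient digit as in step D3 of Algorithm D.

    u2, u1, u0 stand for u[j+n], u[j+n-1], u[j+n-2];
    v1, v0 stand for v[n-1], v[n-2]. Assumes v1 >= base // 2.
    """
    # The numerator is a double-word value; the raw estimate may be >= base.
    qhat, rhat = divmod(u2 * base + u1, v1)
    # Correct the estimate until it fits in one digit and passes the
    # two-digit test; stop re-testing once rhat no longer fits in one digit.
    while qhat >= base or qhat * v0 > base * rhat + u0:
        qhat -= 1
        rhat += v1
        if rhat >= base:
            break
    return qhat, rhat
```

So the intermediate numerator needs two words, but after the correction loop the digit itself is below the base. For a two-digit divisor the estimate is already exact, which makes it easy to check by brute force.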
Which exponent(s) d will
require this many?
Would greatly appreciate any advice as to how to go about solving this problem.
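Assuming unsigned integers and a simple power-by-squaring algorithm like:
// DWORD is assumed to be a 32-bit unsigned integer type.
// Computes a^b (mod 2^32), scanning the bits of b from most to least significant.
DWORD powuu(DWORD a,DWORD b)
{
int i,bits=32;
DWORD d=1;
for (i=0;i<bits;i++)
{
d*=d; // one squaring per exponent bit
if (DWORD(b&0x80000000)) d*=a; // one extra multiplication per set bit
b<<=1; // move the next bit into the top position
}
return d;
}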
You just need to replace a*b with modmul(a,b,n) or (a*b)%n, so the answer is:
if the exponent has k bits and l of them are set, you need k+l multiplications
the worst case is 2k multiplications, for the exponent (2^k)-1
For more info see related QAs:
Power by squaring for negative exponents
modular arithmetics and NTT (finite field DFT) optimizations
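As a quick check of the k+l count, here is a small Python sketch of the same left-to-right loop with a multiplication counter. The name modpow_count is made up for this example, and unlike the fixed 32-iteration loop above it only walks the significant bits of the exponent.

```python
def modpow_count(a, e, n):
    """Left-to-right binary exponentiation modulo n.

    Returns (a**e % n, number of modular multiplications performed),
    counting squarings as multiplications.
    """
    result, mults = 1 % n, 0
    for bit in bin(e)[2:]:          # bits of e, most significant first
        result = (result * result) % n
        mults += 1                  # one squaring per bit
        if bit == "1":
            result = (result * a) % n
            mults += 1              # one extra multiplication per set bit
    return result, mults
```

With e = 0b1011 (k = 4 bits, l = 3 set) this reports 7 multiplications, and with e = 2^8 - 1 it reports 16, i.e. 2k.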
For a naive implementation, it's clearly the exponent with the largest Hamming weight (number of set bits). In this case (2^k - 1) would require the most multiplication steps: k, on top of the squarings.
For k-ary window methods, the number of multiplications can be made independent of the exponent. E.g., for a fixed window size w = 3 we could precompute the table {m^0, m^1, m^2, m^3, .., m^7} (all mod n in this case, and probably in Montgomery representation for efficient reduction). The result is ceil(k/w) multiplications by table entries, in addition to the squarings. This is often preferred in cryptographic implementations, as the exponent is not revealed by simple timing attacks: any k-bit exponent has the same timing. (The reality is a bit more complex if it is assumed the attacker has 'fine-grained' access to things like cache performance, etc.)
Sliding window techniques are typically more efficient, and only slightly more difficult to implement than fixed-window methods. However, they also leak side-channel data, as timing will be dependent on the exponent. Furthermore, finding the 'best' sequence to use is known to be a hard problem.
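A rough sketch of the fixed-window idea, to show why the multiplication count does not depend on the exponent's bits. The function name and the explicit bit length parameter k are assumptions of this example, and plain Python integers are not constant-time, so this only illustrates the operation count, not a hardened implementation.

```python
def fixed_window_pow(m, e, n, k, w=3):
    """Fixed-window exponentiation of m**e mod n for a k-bit exponent e.

    Returns (result, number of table multiplications). Every window costs
    w squarings plus one table multiplication, even when its digit is 0.
    """
    # Precompute m^0 .. m^(2^w - 1) mod n.
    table = [1 % n]
    for _ in range((1 << w) - 1):
        table.append(table[-1] * m % n)

    num_windows = -(-k // w)        # ceil(k / w)
    result, mults = 1 % n, 0
    for i in reversed(range(num_windows)):
        for _ in range(w):
            result = result * result % n
        digit = (e >> (i * w)) & ((1 << w) - 1)
        result = result * table[digit] % n
        mults += 1
    return result, mults
```

For k = 6 and w = 3 every exponent, including 0, takes exactly 2 table multiplications and 6 squarings.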
I am trying to compute the floating-point square root of x in assembly code, using the Newton-Raphson method to first find the inverse square root (1/sqrt(x)) and then multiplying by x to get sqrt(x).
However, I was reading the Wikipedia page on Newton-Raphson division, and it appears that depending on how you compute X_{i+1}, you will need a different amount of precision in intermediate steps.
From Wikipedia:
"From a computation point of view the expressions X_{i+1} = X_i + X_i(1-DX_i) and X_{i+1} = X_i(2-DX_i) are not equivalent. To obtain a result with a precision of n bits while making use of the second expression one must compute the product between X_i and (2-DX_i) with double the required precision (2n bits). In contrast the product between X_i and (1-DX_i) need only be computed with a precision of n bits."
So, I have two questions:
I don't understand why one must compute the product between X_i and (2-DX_i) with double the required precision (2n bits) to obtain a result with a precision of n bits. Can someone please explain why?
Is there something similar that applies with Newton-Raphson Square Root? For instance, I am computing X_{i+1} = X_{i}(3/2 - 1/2 N X_{i}^2) but this can also be computed as X_{i} + X_{i}(1/2 - 1/2 N X_{i}^2). Does one expression require more intermediate precision, just like Newton-Raphson division does? Is there a different format I should be using to require only n bits of precision to obtain a result with n bits of precision?
I am looking for a result with an error <= 1 ulp
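For reference, the two algebraically equivalent update rules from the second question can be written side by side as below. This is only a sketch in Python floats (IEEE doubles on typical platforms), with a made-up function name and a hand-picked starting guess; it shows that both forms converge to 1/sqrt(N), and it does not by itself settle how much intermediate precision each form needs.

```python
def rsqrt_newton(n, x0, steps=6, residual_form=True):
    """Newton-Raphson iteration for 1/sqrt(n), starting from the guess x0.

    residual_form=True  uses X + X*(1/2 - 1/2*N*X^2)
    residual_form=False uses X*(3/2 - 1/2*N*X^2)
    """
    x = x0
    for _ in range(steps):
        if residual_form:
            # The correction term is small near convergence.
            x = x + x * (0.5 - 0.5 * n * x * x)
        else:
            x = x * (1.5 - 0.5 * n * x * x)
    return x

# sqrt(n) is then recovered as n * (1/sqrt(n)).
approx_sqrt2 = 2.0 * rsqrt_newton(2.0, 0.7)
```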
Rather than converting an arbitrary decimal to an exact fraction (something like 323527/4362363), I am trying to convert to just common easily-discernible (in terms of human-readability) quantities like 1/2, 1/4, 1/8 etc.
Other than using a series of if-then, less than/equal to etc comparisons, are there more optimized techniques to do this?
Edit: In my particular case, approximations are acceptable. The idea is that 0.251243 ~ 0.25 = 1/4 - in my usage case, that's "good enough", with the latter more preferable for human readability in terms of a quick indicator (not used for calculation, just used as display numerics).
Look up "continued fraction approximation". Wikipedia has a basic introduction in its "continued fraction" article, but there are optimized algorithms that generate the approximated value while generating the fraction.
Then pick some stopping heuristic, a combination of size of denominator and closeness of approximation, for when you're "close enough".
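If Python is an option, a minimal sketch of this idea is to lean on the standard library: Fraction.limit_denominator returns the closest fraction whose denominator does not exceed a given bound, which plays the role of the stopping heuristic here. The wrapper name readable and the default bound of 8 are just choices for this example.

```python
from fractions import Fraction

def readable(x, max_denominator=8):
    """Closest fraction to x with denominator <= max_denominator."""
    return Fraction(x).limit_denominator(max_denominator)

quarter = readable(0.251243)     # Fraction(1, 4)
third = readable(0.3333, 10)     # Fraction(1, 3)
```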
You can use the Euclidean algorithm to get the greatest common divisor of the numerator and denominator and divide both by it.
In the following, I'm going to assume that our decimals fall in between 0 and 1. It should be straightforward to adapt this to larger numbers and negative numbers.
Probably the easiest thing to do would be to choose the largest denominator that you would find acceptable and then create a list of the fractions between 0 and 1 whose denominators are less than or equal to it. Be sure to avoid any fractions which can be simplified: obviously, once you've listed 1/2, you don't need 2/4. You can avoid them by checking that the GCD of the numerator and denominator is 1, using Euclid's algorithm. Once you have your list, evaluate the fractions as floating-point numbers (probably doubles, but the data type obviously depends on your choice of programming language). Then insert them into a balanced binary search tree storing both the original fraction and its floating-point value. You should only need to do this once to set things up initially, so the n*log(n) time (where n is the number of fractions) isn't very much.
Then, whenever you get a number, simply search the tree to find the closest number to it. Note that this is slightly more complicated than searching for an exact match, because the node you're looking for may not be a leaf node. So, as you traverse the tree, keep a record of the closest-valued node that you have visited. Once you reach a leaf node and compare it to the closest-valued node visited so far, you are done. Whichever one is closest, its fraction is your answer.
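Here is a sketch of that lookup in Python. Note that it swaps the balanced binary search tree for a sorted list searched with bisect, which gives the same logarithmic lookup for a table that is built once; the function names are my own.

```python
from bisect import bisect_left
from fractions import Fraction
from math import gcd

def build_table(max_den):
    """Sorted reduced fractions in [0, 1] with denominator <= max_den."""
    fracs = sorted(
        Fraction(p, q)
        for q in range(1, max_den + 1)
        for p in range(0, q + 1)
        if gcd(p, q) == 1            # skip 2/4, 3/6, ... duplicates
    )
    return fracs, [float(f) for f in fracs]

def nearest(x, fracs, values):
    """Fraction in the table whose value is closest to x."""
    i = bisect_left(values, x)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = fracs[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda f: abs(float(f) - x))

fracs, values = build_table(8)
```

With a maximum denominator of 8, nearest(0.251243, fracs, values) gives 1/4 and nearest(0.9, fracs, values) gives 7/8.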
Here is a suggestion, assuming your starting fraction is p/q:
Calculate r = p/q as a floating-point value (e.g. r = float(p)/float(q))
Calculate the rounded decimal x = int(10000*r)
Calculate the GCD (greatest common divisor) of x and 10000: s = GCD(x, 10000)
Represent the result as m / n where m = x/s and n = 10000/s (your example computes to 371 / 5000)
Normally, all denominators of 1000 are fairly human readable.
This might not provide the best result when the value is closer to simpler cases such as 1/3. However, I personally find 379/1000 much more human readable than 47/62 (which is the shortest fractional representation). You can add a few exceptions to fine tune such process though (e.g. calculating the p/GCD(p,q) , q/GCD(p,q) and accepting it if one of those are single digit values before proceeding to this method)
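The steps above could look roughly like this in Python. The function name and the scale parameter are assumptions of this sketch, and it uses round() where the steps use int(), so the last digit is rounded rather than truncated.

```python
from math import gcd

def decimal_fraction(p, q, scale=10000):
    """Round p/q to a multiple of 1/scale and reduce it to lowest terms."""
    x = round(scale * p / q)     # rounded decimal, as an integer numerator
    s = gcd(x, scale)
    return x // s, scale // s
```

For example, decimal_fraction(1, 4) gives (1, 4), and a value of 0.0742 comes out as (371, 5000).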
A pretty dumb solution, just for "previewing" a fraction:
factor = 1/decimal
result = 1/Round(factor)
mult = 1
while (result == 1) {
mult = mult * 10
result = (1 * mult)/(Round(mult * factor))
}
result = simplify_with_GCD(result)
good luck!
Suppose we have some arbitrary positive number x.
Its inverse is 1/x. Is there a method to express that inverse in binary?
e.g. x=5 //101
x's inverse is 1/x; its binary form is ...?
You'd find it the same way you would in decimal form: long division.
There is no shortcut just because you are in another base, although long division is significantly simpler in binary.
Here is a very nice explanation of long division applied to binary numbers.
Although, just to let you know, most floating-point systems on today's machines do very fast division for you.
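As an illustration of binary long division for 1/x, here is a short Python sketch (the function name is made up). Each step doubles the remainder, which is the binary version of "bringing down a zero", and emits a 1 when the divisor fits.

```python
def reciprocal_bits(x, nbits):
    """First nbits binary fraction digits of 1/x, for an integer x > 1."""
    bits = []
    rem = 1
    for _ in range(nbits):
        rem *= 2                 # shift in the next binary place
        if rem >= x:
            bits.append("1")
            rem -= x
        else:
            bits.append("0")
    return "0." + "".join(bits)
```

For x = 5 and 12 bits this yields 0.001100110011, and the pattern 0011 keeps repeating because 1/5 has no finite binary expansion.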
In general, the only practical way to "express in binary" an arbitrary fraction is as a pair of integers, numerator and denominator. "Floating point", the most commonly used (and hardware-supported) binary representation of non-integer numbers, can represent exactly only those fractions whose denominator (when the fraction is reduced to lowest terms) is a power of two (and, of course, only when the fixed number of bits allotted to the representation is sufficient for the number we'd like to represent; the latter limitation also holds for any fixed-size binary representation, including the simplest ones such as integers).
0.125 = 0.001b
0.0625 = 0.0001b
0.0078125 = 0.0000001b
0.00390625 = 0.00000001b
0.00048828125 = 0.00000000001b
0.000244140625 = 0.000000000001b
----------------------------------
0.199951171875 = 0.001100110011b
Knock yourself out if you want higher accuracy/precision.
Another form of multiplicative inverse takes advantage of the modulo nature of integer arithmetic as implemented on most computers; in your case the 32 bit value
11001100110011001100110011001101 (-858993459 signed int32 or 3435973837 unsigned int32) when multiplied by 5 equals 1 (mod 4294967296). Only values which are coprime with the power of two the modulo operates on have such multiplicative inverses.
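One way to find such an inverse is Newton's iteration modulo 2^32, where each step doubles the number of correct low bits. The sketch below is an assumption-laden illustration (the function name is mine), relying on the fact that any odd a satisfies a*a = 1 (mod 8), so a is its own inverse to 3 bits.

```python
def inverse_mod_2_32(a):
    """Multiplicative inverse of an odd integer a modulo 2**32."""
    if a % 2 == 0:
        raise ValueError("even numbers have no inverse modulo a power of two")
    mask = 0xFFFFFFFF
    x = a                        # correct to 3 bits for odd a
    for _ in range(4):           # 3 -> 6 -> 12 -> 24 -> 48 correct bits
        x = (x * (2 - a * x)) & mask
    return x

inv5 = inverse_mod_2_32(5)
```

For a = 5 this gives 3435973837 (0xCCCCCCCD), matching the bit pattern quoted above.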
If you just need the first few bits of a binary fraction, this trick will give you those bits: (2 << 31) / x. But don't use this trick in any real software project, because it is a rough, inaccurate, and plainly wrong way to represent the value.
Is there a fast method for taking the modulus of a floating point number?
With integers, there are tricks for Mersenne primes, so that it's possible to calculate y = x MOD (2^31-1) without needing division (the "integer trick").
Can any similar tricks be applied for floating point numbers?
Preferably, in a way that can be converted into vector/SIMD operations, or moved into GPGPU code. This rules out using integer calculations on the floating point data.
The primes I'm interested in would be 2^7-1 and 2^31-1, although if there are more efficient ones for floating point numbers, those would be welcome.
One intended use of this algorithm would be to calculate a running "checksum" of input floating point numbers as they are being read into an algorithm. To avoid taking up too much of the calculation capability, I'd like to keep this lightweight.
Apparently a similar technique is used for larger numbers, particularly 2^127 - 1. Unfortunately, the math in the paper is beyond me, and I haven't been able to figure out how to convert it to smaller primes.
Example of floating point MOD 2^127 - 1 - HASH127
I looked at djb's paper, and you have it easier, since 31 bits fit comfortably into the 53-bit precision of a double's significand. Assuming that your checksum consists of some ring operations over Z/(2**31 - 1), it will be easier (and faster) to solve the relaxed problem of computing a small representative of x mod 2**31 - 1; at the end, you can use integer arithmetic to find a canonical one, which is slow but shouldn't happen too often.
The basic reduction step is to replace an integer x = y + 2**31 * z with y + z. The trick that djb uses is to compute w = (x + L) - L, where L is a large integer carefully chosen to provoke roundoff in such a way that z = 2**-31 * w. Then compute y = x - w and output y + z, which will have magnitude at most 2**32. (I apologize if this operation isn't quite enough; if so, please post your checksum algorithm.)
The choice of L involves knowing how precise the significand is. For the modulus 2**31 - 1, we want the unit of least precision (ulp) to be 2**31. For doubles in the range [1.0, 2.0), the ulp is 2**-52, so L should be 2**52 * 2**31. If you were doing this with the modulus 2**7 - 1, then you'd take L = 2**52 * 2**7. As djb notes, this trick depends crucially on intermediate results not being computed in higher precision.
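Here is a minimal sketch of one reduction step for the modulus 2**31 - 1, using Python floats. It assumes they are IEEE doubles evaluated in double precision with round-to-nearest, that x is a non-negative integer-valued double below 2**53, and the name reduce_once is my own.

```python
L = 2.0**52 * 2.0**31            # doubles near L have an ulp of 2**31

def reduce_once(x):
    """Map x = y + 2**31 * z to y + z, which is congruent mod 2**31 - 1."""
    w = (x + L) - L              # x rounded to a multiple of 2**31
    y = x - w                    # small remainder, |y| <= 2**30
    z = w * 2.0**-31             # the high part, scaled down exactly
    return y + z

r = reduce_once(float(123456789012345))
```

The result is a small integer-valued double congruent to the input modulo 2**31 - 1; it is not necessarily the canonical representative, so a final integer reduction is still needed at the end.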