Multiply a list of floats to retain maximum precision on a modern CPU

I have a list of many single-precision IEEE floating-point numbers. How does one go through the list and multiply them all so as to retain maximum precision and avoid over/underflow? Assume that multiplying them in arbitrary precision and then truncating the result to a float would not over/underflow.
Can one look at a list of floats and figure out the optimal order without actually multiplying the numbers, or must we start multiplying and only then search for the next number to multiply by to retain maximum precision, in a sort of feedback (next-step search) algorithm? Can we limit ourselves to a sort based on going through the list and just adding up the exponents?
For i uniformly distributed floating-point numbers with an m-bit mantissa and an n-bit exponent, on a CPU that stores multiplication results in a register with an r-bit mantissa and an s-bit exponent and then truncates back to m+n bits on read, and assuming that multiplying those numbers in arbitrary precision and truncating to the original m+n-bit format would not produce over/underflow, what are the chances that multiplying in the finite r+s-bit register produces no overflow? And when there is no overflow, what precision can I expect from the operation, depending on i, n, m, r, and s?
A great partial answer would cover just floats, doubles, and various common register sizes, for small, medium, large, and very large i.
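Not an answer to the probability part, but here is a minimal sketch of the "look at the exponents without multiplying" idea, assuming Python (whose floats are IEEE doubles): math.frexp exposes each number's exponent, so you can sort by exponent and then greedily pick the factor that pulls the running product's exponent back toward zero. This is only a heuristic for avoiding intermediate over/underflow, not a guarantee of the best-rounded result, and the function name is mine.

    import math

    def product_balanced(xs):
        """Multiply floats while trying to keep the running product's exponent near 0.

        Sort by exponent, then greedily take a factor from the small or large end
        depending on whether the running product currently has magnitude >= 1.
        Heuristic only: it aims to dodge intermediate over/underflow.
        """
        xs = sorted(xs, key=lambda x: math.frexp(x)[1])  # frexp -> (mantissa, exponent)
        lo, hi = 0, len(xs) - 1
        prod = 1.0
        while lo <= hi:
            if math.frexp(prod)[1] > 0:
                # |prod| >= 1: multiply by a small-magnitude factor
                prod *= xs[lo]
                lo += 1
            else:
                # |prod| < 1: multiply by a large-magnitude factor
                prod *= xs[hi]
                hi -= 1
        return prod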

Related

What is an efficient algorithm to perform a one-bit logical shift of an integer variable?

Let a and b be two integer variables with value >= 0. To shift the binary bit sequence representing a (e.g. 10110010 for a=178) one bit to the left (i.e. into b=01100100=100), you can duplicate a and then take away 2^n, where n is the bit length of the binary representation of a, so that b=a+a-2^n.
However, this algorithm takes O(n), where n is the number of bits in the sequence, and is in practice not viable for sequences of upwards of 10000 bits.
What is a more efficient alternative whose efficiency does not depend on the length of the bit chain?
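No answer is reproduced here, but as a hedged sketch: in an arbitrary-precision environment such as Python, the built-in shift (or adding the number to itself) is carried out a machine word at a time by the bignum library, so the cost, while still proportional to the number of words, is far smaller than per-bit arithmetic. The fixed-width behaviour from the example can be recovered with a mask; the function name is mine.

    def shift_left_fixed_width(a: int, n: int) -> int:
        """One-bit left shift of a inside an n-bit window, discarding the bit shifted out.

        Matches the question's a + a - 2^n example when bit n-1 of a is set; here the
        shift and mask are done word-at-a-time by the bignum implementation.
        """
        return (a << 1) & ((1 << n) - 1)

    # The example from the question: 0b10110010 (178) in an 8-bit window -> 0b01100100 (100).
    assert shift_left_fixed_width(0b10110010, 8) == 0b01100100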

Fermat vs Mersenne as modulus

So there are some number-theory applications where we need to do modulo with big numbers, and we can choose the modulus. There are two groups that can get huge optimizations: Fermat and Mersenne.
So let's call an N-bit sequence a chunk. N is often not a multiple of the word size.
For Fermat, we have M=2^N+1, so 2^N=-1 mod M, so we take the chunks of the dividend and alternate adding and subtracting.
For Mersenne, we have M=2^N-1, so 2^N=1 mod M, so we sum the chunks of the dividend.
In either case, we will likely end up with a number that takes up 2 chunks. We can apply this algorithm again if needed and finally do a general modulo algorithm.
Fermat will make the result smaller on average due to the alternating addition and subtraction. A negative result isn't that computationally expensive: you just keep track of the sign and fix it in the final modulo step. But I'd think bignum subtraction is a little slower than bignum addition.
Mersenne sums all chunks, so the result is a little larger, but that can be fixed with a second iteration of the algorithm at next to no extra cost.
So in the end, which is faster?
Schönhage–Strassen uses Fermat. There might be factors other than performance that make Fermat better than Mersenne, or maybe it's just straight-up faster.
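For what it's worth, here is a sketch of the chunk-folding described above, in Python for readability (a real implementation would work in place on word arrays); both helpers do a single folding pass, and the names are mine.

    def fold_mersenne(x: int, N: int) -> int:
        """One folding pass modulo 2^N - 1: since 2^N = 1 (mod 2^N - 1),
        every N-bit chunk keeps weight +1, so just sum the chunks."""
        mask = (1 << N) - 1
        total = 0
        while x:
            total += x & mask
            x >>= N
        return total  # may still exceed the modulus; fold again or finish with %

    def fold_fermat(x: int, N: int) -> int:
        """One folding pass modulo 2^N + 1: since 2^N = -1 (mod 2^N + 1),
        chunk weights alternate +1, -1, +1, ..., so add and subtract alternately."""
        mask = (1 << N) - 1
        total, sign = 0, 1
        while x:
            total += sign * (x & mask)
            x >>= N
            sign = -sign
        return total  # may be negative; track the sign and fix it in the final step

In both cases fold_*(x, N) is congruent to x modulo the corresponding modulus, so one or two passes followed by a small final reduction give the remainder.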
If you need a prime modulus, you're going to make the decision based on the convenience of the size.
For example, 2^31-1 is often convenient on 64-bit architectures, since it fits snugly into 32 bits and the product of two such values fits into a 64-bit word, either signed or unsigned.
On 32-bit architectures, 2^16+1 has similar advantages. It doesn't quite fit into 16 bits, of course, but if you treat 0 as a special case, it's still pretty easy to multiply values in a 32-bit word.
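To make the 2^31-1 case concrete, a multiply-and-reduce step might look like the sketch below (written in Python, but every intermediate value fits in an unsigned 64-bit word); the helper name is mine.

    M31 = (1 << 31) - 1  # the Mersenne prime 2^31 - 1

    def mulmod_m31(a: int, b: int) -> int:
        """Multiply modulo 2^31 - 1 for 0 <= a, b < M31.

        The product is at most 62 bits, and since 2^31 = 1 (mod M31) the high
        31 bits can simply be folded (added) onto the low 31 bits."""
        p = a * b
        p = (p & M31) + (p >> 31)   # first fold: at most about 32 bits
        p = (p & M31) + (p >> 31)   # second fold: now 0 <= p <= M31
        return 0 if p == M31 else p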

What is complexity measured against? (bits, number of elements, ...)

I've read that the naive approach to testing primality has exponential complexity because you judge the algorithm by the size of its input. Mysteriously, people insist that when discussing primality of an integer, the appropriate measure of the size of the input is the number of bits (not n, the integer itself).
However, when discussing an algorithm like Floyd's, the complexity is often stated in terms of the number of nodes without regard to the number of bits required to store those nodes.
I'm not trying to make an argument here. I honestly don't understand the reasoning. Please explain. Thanks.
Traditionally speaking, complexity is measured against the size of the input.
In the case of numbers, the size of the input is the logarithm of the number (because that is the length of its binary representation); in the case of graphs, all edges and vertices must be represented somehow in the input, so the size of the input is linear in |V| and |E|.
For example, a naive primality test that runs in time linear in the number itself is called pseudo-polynomial. It is polynomial in the value of the number, but it is NOT polynomial in the size of the input, which is log(n); in fact it is exponential in the size of the input.
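For concreteness, the kind of naive test meant here looks like the sketch below: the loop runs on the order of n times, which is polynomial in the value n but exponential in the number of input bits, log(n).

    def is_prime_naive(n: int) -> bool:
        """Trial division by every candidate below n.

        Theta(n) iterations: polynomial in the value of n, but exponential in
        the input size log2(n), i.e. pseudo-polynomial."""
        if n < 2:
            return False
        for d in range(2, n):
            if n % d == 0:
                return False
        return True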
As a side note, it does not matter whether you measure the size of the input in bits, bytes, or any other unit that differs only by a constant factor, because the constant is discarded anyway when computing the asymptotic notation.
The main difference is that when discussing algorithms we keep in the back of our mind hardware that can perform operations on the data in O(1) time. When being strict, or when considering data that does not fit into a processor register, taking the number of bits into account becomes important.
Although the size of the input is measured in bits, in many cases we can use a shortcut that lets us divide out a constant number of bits per element. This constant factor is embedded in the representation we choose for our data structure.
When discussing graph algorithms, we assume that each vertex and each edge has a fixed representation cost in bits, which does not depend on the number of vertices and edges. This assumption requires that weights associated with vertices and edges have a fixed size in bits (i.e. all integers, all floats, etc.).
With this assumption in place, adjacency list representation has fixed size per edge or vertex, because we need one pointer per edge and one pointer per vertex, in addition to the weights, which we presume to be of constant size as well.
The same goes for the adjacency-matrix representation, because we need roughly W*(V^2 + V) bits: V^2 matrix cells plus one weight per vertex, where W is the number of bits required to store a weight.
In rare situations, when the weights themselves depend on the number of vertices or edges, the assumption of fixed-size weights no longer holds, and we must go back to counting bits.
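A back-of-the-envelope version of that bookkeeping, with the pointer width and the weight width W as explicit parameters (the 64-bit default is my assumption):

    def adjacency_list_bits(V: int, E: int, W: int, ptr_bits: int = 64) -> int:
        """Rough size of an adjacency list: one list head per vertex plus one
        (pointer, weight) entry per edge, i.e. a constant number of bits each."""
        return V * ptr_bits + E * (ptr_bits + W)

    def adjacency_matrix_bits(V: int, W: int) -> int:
        """Rough size of an adjacency matrix: V*V cells of W bits each."""
        return V * V * W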

Approximate the typical value of a sample

Say I have a sample of N positive real numbers and I want to find a "typical" value for these numbers. Of course "typical" is not very well defined, but one could think of the following more concrete problem:
The numbers are distributed such that (roughly speaking) a fraction (1-epsilon) of them is drawn from a Gaussian with positive mean m > 0 and standard deviation sigma << m, and a small fraction epsilon of them is drawn from some other distribution, heavy-tailed at both the large and the small end. I want to estimate the mean of the Gaussian to within a few standard deviations.
A solution would be to compute the median, but while it is O(N), the constant factors are not so good for moderate N, and moreover it requires quite a bit of coding. I am ready to give up some precision in my estimate in exchange for code simplicity and/or small-N performance (say N is 10 or 20, and I have at most one or two outliers).
Do you have any suggestions?
(For instance, if my outliers were only coming from large values, I would compute the average of the log of my values and exponentiate it. Under some further assumptions this generally gives me a good estimate, and I can compute it easily and with a sharp O(N).)
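For reference, the log-average trick from that parenthesis is the geometric mean; a short sketch, assuming strictly positive inputs:

    import math

    def geometric_mean(xs):
        """exp of the mean log: damps unusually large values, but every input must
        be strictly positive, and very small outliers drag it down hard."""
        return math.exp(sum(math.log(x) for x in xs) / len(xs))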
You could take the mean of the numbers excluding the min and max. The formula is (sum - min - max) / (N - 2), and the terms in the numerator can be computed simply with one pass (watch out for floating point issues though).
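A sketch of that suggestion (the function name is mine; it needs N > 2):

    def trimmed_mean(xs):
        """(sum - min - max) / (N - 2), accumulated in a single pass over the data."""
        total, mn, mx = 0.0, float("inf"), float("-inf")
        for x in xs:
            total += x
            mn = min(mn, x)
            mx = max(mx, x)
        return (total - mn - mx) / (len(xs) - 2)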
I think you should reconsider the median, either using quickselect or Blum-Floyd-Pratt-Rivest-Tarjan (as implemented here by Coetzee). It's fast and robust.
If you need better speed you might consider picking a fixed number of random elements and taking their median. This is sublinear (O(1) or O(log n) depending on the model) and works well for large sets.
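A sketch of the sampled-median idea; the sample size k is my own arbitrary choice.

    import random
    import statistics

    def approx_median(xs, k=9):
        """Median of a fixed-size random sample: the cost no longer grows with len(xs)."""
        sample = xs if len(xs) <= k else random.sample(xs, k)
        return statistics.median(sample)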

Which base should I use in radix sort? And how do I convert between bases?

If I have to sort a list of base-10 integers, do I first convert the integers to, for example, base 2, then perform the radix sort, and finally convert them back to base 10?
More generally, how do you perform radix sort with a radix different from the base of the integers in the list?
Generally speaking, this depends on how the inputs are represented.
If your inputs are represented as fixed-width integer values, then it's easy to convert between bases by using division and mod. For example, you can get the last base-b digit of a number n by computing n % b and can drop that digit off by computing n / b (rounding down).
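For instance, an LSD radix sort driven purely by % and // might look like the sketch below; the base is a free parameter (256 here is just my choice) and is independent of how the integers were originally written down.

    def radix_sort(nums, base=256):
        """LSD radix sort of non-negative integers in an arbitrary base.

        Digits are extracted arithmetically with % and //, and each pass does a
        stable bucket distribution on one digit."""
        if not nums:
            return []
        out = list(nums)
        exp = 1
        while max(nums) // exp > 0:
            buckets = [[] for _ in range(base)]
            for n in out:
                buckets[(n // exp) % base].append(n)
            out = [n for bucket in buckets for n in bucket]
            exp *= base
        return out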
If your inputs are represented as strings, then it's harder to reinterpret the characters in other bases. Without using fancy algorithmic techniques, you can usually switch to bases that are powers of the base in which the number is written by treating blocks of digits as single digits. You can also use smaller bases by, for example, rewriting each digit individually in the smaller base and then using a radix sort that processes less than one original digit at a time.
If you're interested in using a really heavyweight theoretical technique, this paper demonstrates a way to encode numbers of any base in binary such that individual digits can be randomly accessed in constant time with no loss in space. This is certainly far more advanced than the other approaches, and the constant factor would probably make it inefficient in practice, but it shows that in theory you can use any base you'd like.
