Unexpectedly low error for a numerical integrator with certain equations of motion - precision

I have an RKF7(8) integrator whose output I've verified with several simple test functions. However, when I use it on the equation of motion I'm interested in, the local truncation errors are suddenly very small. For a timestep of around 1e-1, my errors are all around 1e-18 or 1e-19. For the simple test functions (so far, a sine and an exponential), the errors are always reasonable, i.e. around 1e-7 for the same timestep.
The only difference between the simple test functions and the problem function is that the latter is a huge chunk of code, maybe 1000 different terms, some with relatively large exponents (like 9 or 10). Could that possibly affect the precision? Should I change my code to use long doubles?

Very interesting question. The problem you are facing might be related to limitations of floating-point arithmetic. Since your function contains coefficients spanning a wide numerical range, it is likely that you are losing precision in your calculations. In general, these problems can come in the form of:
Overflow
Underflow
Multiplication and division
Adding numbers of very different magnitudes
Subtracting numbers of similar magnitudes
Overflow and underflow occur when the numbers you are dealing with are too large or too small to be represented by the floating-point format, and it would be my bet that this is not what happens in your system. Nevertheless, keep in mind that multiplication and division operations can lead to overflow and underflow. On the other hand, adding numbers of very different magnitudes (or subtracting numbers of similar magnitudes) can lead to severe loss of precision due to roundoff errors. From my experience with optimization problems that involve large and small numbers, I would say this is a reasonable explanation for the behavior of your integrator.
I have two suggestions for you. The first one is, of course, to increase the precision of your numbers to the maximum available. This might or might not help, depending on how ill-conditioned your problem is. The second is to use a better algorithm for the sums in your numerical method. Instead of naively adding all the numbers sequentially, you can use a more elaborate strategy, such as dividing the sums into sub-sums, effectively reducing roundoff errors. Notable examples of such algorithms are pairwise summation and Kahan summation.
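To illustrate the second suggestion, here is a minimal Kahan summation sketch in Python (the helper name and the test values are my own, not from any particular library):

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carry the rounding error forward."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - c            # re-inject the error accumulated so far
        t = total + y        # big + small: low-order bits of y may be lost...
        c = (t - total) - y  # ...but they are recovered here
        total = t
    return total

# Adding 1.0 a thousand times to 1e16: each naive addition is rounded away
# entirely, while the compensated sum keeps track of the lost bits.
values = [1e16] + [1.0] * 1000
naive = sum(values)          # stays stuck at 1e16
compensated = kahan_sum(values)  # recovers 1e16 + 1000
```

The same swamping effect (a large accumulator absorbing many small increments) is exactly what can happen when a function with ~1000 terms of very different magnitudes is summed naively.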
I hope this answer offers you some clues. Good luck!

Related

Lag with very small floating point numbers?

So, I made a simple fluid dynamics simulation with XNA, and I get very accurate wave-like behaviors. But when the waves get smaller and smaller, and at some point reach amplitudes of -4.0E-43 and less, the application starts to lag horribly. Does C# switch to some stupid rounding algorithm or something? I've not observed any NaNs and I don't get any exceptions. Oh, the simulation loop runs in a separate thread.
C# is not the culprit here. Denormal numbers are.
These are numbers with magnitudes between 0 and 2^-126 (≈1.175494351e-38) which are not stored in the standard (or 'normal') floating-point format. In fact, they are actually stored as fixed-point numbers with a multiplier of 2^-149.
Because they are rare and require different algorithms, operations involving denormal numbers are not optimized to the same degree as normal operations, if at all.
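A quick way to check whether values have drifted into the subnormal range, plus the usual mitigation of flushing tiny values to zero. This is a sketch with names of my own choosing; note that Python floats are double precision, so the threshold here is the double-precision minimum normal (~2.2e-308) rather than the single-precision 2^-126 above:

```python
import sys

def is_subnormal(x: float) -> bool:
    """True if x is a nonzero double smaller than the smallest normal magnitude."""
    return x != 0.0 and abs(x) < sys.float_info.min  # min normal ~2.225e-308

# Common fix for simulations: flush tiny values to zero before they
# propagate, so the hardware never has to operate on denormals.
def flush_to_zero(x: float, threshold: float = 1e-30) -> float:
    return 0.0 if abs(x) < threshold else x
```

The threshold in flush_to_zero is an arbitrary example value; pick one safely below any amplitude that matters to the simulation.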

What is the difference between a floating point merge and an integer merge?

In this paper, two cases have been considered for comparing algorithms - integers and floating points.
I understand the differences regarding these data types in terms of storage, but am not sure why there is a difference among these.
Why is there a difference in performance between the following two cases
Using merge sort on Integers
Using merge sort on Floating Points
I understand that it comes down to speed comparison in both the cases, the question is why these speeds might be different?
The paper states, in section 4, “Conclusion”, “the execution time for merging integers on the CPU is 2.5X faster than the execution time for floating point on the CPU”. This large a difference is surprising on the Intel Nehalem Xeon E5530 used in the measurements. However, the paper does not give information about source code, specific instructions or processor features used in the merge, compiler version, or other tools used. If the processor is used efficiently, there should be only very minor differences in the performance of an integer merge versus a floating-point merge. Thus, it seems likely that the floating-point code used in the test was inefficient and is an indicator of poor tools used rather than any shortcoming of the processor.
Merge sort has an inner loop containing quite a few instructions. Comparing floats might be a little more expensive, but only by 1-2 cycles; you will not notice that difference amid the much larger amount of merge code.
Comparing floats is hardware accelerated and fast compared to everything else you are doing in that algorithm.
Also, the comparison likely can overlap other instructions so the difference in wall-clock time might be exactly zero (or not).

Memory and time issues when dividing two matrices

I have two sparse matrices in MATLAB:
M1 of size 9,000 x 1,800,000 and M2 of size 1,800,000 x 1,800,000.
Now I need to calculate the expression
M1/M2
and it took me about an hour. Is that normal? Is there any efficient way in MATLAB to overcome this time issue? It's a lot, and if I run many iterations it will keep taking an hour each time. Any suggestion?
A quick back-of-the-envelope calculation, assuming some iterative method like conjugate gradient or the Kaczmarz method is used, and plugging in the sizes, makes me believe that an hour isn't bad.
Because of the sparsity of the matrix that's being "inverted" (even if not explicitly inverted), both of those methods are going to take a number of instructions near "some near-unity scalar factor" times ~9000 times 1.8e6 times "the number of iterations required for convergence". The product of the two quantities in quotes is probably around 50 (minimum) to around 1000 (maximum). I didn't cherry-pick these to make your math work; they are about what I'd expect from having done such computations. If you assume about 1e9 instructions per second (which doesn't account much for memory access etc.), you get around 13 minutes to around 4.5 hours.
Thus, it seems in the right range for an algorithm that's exploiting sparsity.
You might be able to exploit it better yourself if you know the structure, but probably not by much.
Note, this isn't to say that 13 minutes is achievable.
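The estimate above works out like this (a sketch of the arithmetic only; 50 and 1000 are the quoted bounds on "scalar factor × iterations"):

```python
rows, cols = 9_000, 1_800_000   # sizes from the question
instr_per_sec = 1e9             # crude throughput assumption

def runtime_seconds(factor_times_iters: float) -> float:
    # instructions ~ (factor * iterations) * 9000 * 1.8e6
    return factor_times_iters * rows * cols / instr_per_sec

low = runtime_seconds(50)     # about 810 s, i.e. ~13.5 minutes
high = runtime_seconds(1000)  # about 16200 s, i.e. ~4.5 hours
```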
Edit: One side note: I'm not sure what is being used, but I assumed iterative methods. It's also possible that direct methods (e.g., sparse matrix factorizations) are used. These methods can be very efficient for sparse systems if you exploit the sparsity right. It's very possible that MATLAB is using these by default, but it's worth investigating what MATLAB is doing in your case.
In my limited experience, iterative methods were usually preferred over direct methods as the size of the system gets large (yours is large). Our linear systems worked out to be block tridiagonal as well, as they often do in image processing.

Why are numbers rounded in standard computer algorithms? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I understand that this makes the algorithms faster and use less storage space, and that these would have been critical features for software to run on the hardware of previous decades, but is this still an important feature? If the calculations were done with exact rational arithmetic then there would be no rounding errors at all, which would simplify many algorithms as you would no longer have to worry about catastrophic cancellation or anything like that.
Floating point is much faster than arbitrary-precision and symbolic packages, and 12-16 significant figures is usually plenty for demanding science/engineering applications where non-integral computations are relevant.
The programming language ABC used rational numbers (x / y where x and y were integers) wherever possible.
Sometimes calculations would become very slow because the numerator and denominator had become very big.
So it turns out that it's a bad idea if you don't put some kind of limit on the numerator and denominator.
In the vast majority of computations, the size of the numbers required to compute answers exactly would quickly grow beyond the point where the computation would be worth the effort, and in many calculations it would grow beyond the point where exact calculation would even be possible. Consider that even running something like a simple third-order IIR filter for a dozen iterations would require a fraction with thousands of bits in the denominator; running the algorithm for a few thousand iterations (hardly an unusual operation) could require more bits in the denominator than there are atoms in the universe.
Many numerical algorithms still require fixed-precision numbers in order to perform well enough. Such calculations can be implemented in hardware because the numbers fit entirely in registers, whereas arbitrary-precision calculations must be implemented in software, and there is a massive performance difference between the two. Ask anybody who crunches numbers for a living whether they'd be OK with things running X times slower, and they will probably say "no, that's completely unworkable."
Also, I think you'll find that having arbitrary precision is impractical and sometimes even impossible. For example, the number of decimal places can grow fast enough that you'll want to drop some. And then you're back to square one: rounding problems!
Finally, sometimes the digits beyond a certain precision do not matter anyway. For example, the number of significant digits should generally reflect the level of experimental uncertainty.
So, which algorithms do you have in mind?
Traditionally integer arithmetic is easier and cheaper to implement in hardware (uses less space on the die so you can fit more units on there). Especially when you go into the DSP segment this can make a lot of difference.

Minimizing the effect of rounding errors caused by repeated operations effectively

I just recently came across the Kahan (or compensated) summation algorithm for minimizing roundoff, and I'd like to know if there are equivalent algorithms for division and/or multiplication, as well as subtraction (if there happens to be one, I know about associativity). Implementation examples in any language, pseudo-code or links would be great!
Thanks
Subtraction is usually handled via the Kahan method.
For multiplication, there are algorithms to convert a product of two floating-point numbers into a sum of two floating-point numbers without rounding, at which point you can use Kahan summation or some other method, depending on what you need to do next with the product.
If you have FMA (fused multiply-add) available, this can easily be accomplished as follows:
p = a*b;
r = fma(a,b,-p);
After these two operations, if no overflow or underflow occurs, p + r is exactly equal to a * b without rounding. This can also be accomplished without FMA, but it is rather more difficult. If you're interested in these algorithms, you might start by downloading the crlibm documentation, which details several of them.
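The summation analogue of that product-splitting trick, for reference, is Knuth's error-free TwoSum, which needs no FMA at all (a sketch; the function name is my own):

```python
def two_sum(a: float, b: float):
    """Return (s, err) such that s = fl(a + b) and a + b = s + err exactly.

    Knuth's branch-free TwoSum: six floating-point operations, valid for
    any magnitudes of a and b (barring overflow).
    """
    s = a + b
    bb = s - a                       # the part of b that made it into s
    err = (a - (s - bb)) + (b - bb)  # what was rounded away
    return s, err

# The rounding error of adding 1.0 to 1e16 is recovered exactly:
s, err = two_sum(1e16, 1.0)
```

This is the building block inside Kahan-style compensated sums: the (s, err) pair plays the same role as the (p, r) pair produced by the FMA product above.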
Division... well, division is best avoided. Division is slow, and compensated division is even slower. You can do it, but it's brutally hard without FMA, and non-trivial with it. Better to design your algorithms to avoid it as much as possible.
Note that all of this becomes a losing battle pretty quickly. There's a very narrow band of situations where these tricks are beneficial--for anything more complicated, it's much better to just use a wider-precision floating point library like mpfr. Unless you're an expert in the field (or want to become one), it's usually best to just learn to use such a library.
Designing algorithms to be numerically stable is an academic discipline and field of research in its own right. It's not something you can do (or learn) meaningfully via "cheat sheets" - it requires specific mathematical knowledge and needs to be done for each specific algorithm. If you want to learn how to do this, the reference in the Wikipedia article sounds pretty good: Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, Society of Industrial and Applied Mathematics, Philadelphia, 1996. ISBN 0-89871-355-2.
A relatively simple way to diagnose the stability of an algorithm is to use interval arithmetic.
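A toy version of that diagnostic, assuming outward rounding via math.nextafter (Python 3.9+); real work should use an established interval library, but even this sketch makes catastrophic cancellation visible:

```python
import math

class Interval:
    """Minimal interval type: every operation rounds outward, so the true
    real-number result always lies inside [lo, hi]."""
    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = hi if hi is not None else lo

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __sub__(self, other):
        return Interval(math.nextafter(self.lo - other.hi, -math.inf),
                        math.nextafter(self.hi - other.lo, math.inf))

    def width(self):
        return self.hi - self.lo

# (1e16 + 1) - 1e16 is exactly 1 in real arithmetic, but the interval's
# width shows the cancellation left the result uncertain by a few ulps
# of 1e16, i.e. larger than the answer itself:
d = (Interval(1e16) + Interval(1.0)) - Interval(1e16)
```

A result interval much wider than the result itself, as here, is the signature of an unstable computation.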
You could use bignums and rational fractions rather than floating-point numbers, in which case you are limited only by the finite availability of memory to hold the required precision.
