Fixed-point real-number arithmetic support in Eigen/Eigen3 - C++11

I'd like to raise again a fairly general question about the Eigen/Eigen3 matrix library's support for different scalar "fields/representations" in its operations.
I've looked through the Eigen matrix template library a bit, and so far I've only seen support for floating-point real arithmetic (that is, IEEE 754 single-precision 32-bit and double-precision 64-bit floating-point numbers).
I would like to ask specifically about fixed-point real-number arithmetic support in Eigen/Eigen3:
is there any support for fixed-point scalar types and their vectorization in Eigen/Eigen3?
if not, what would be necessary to implement such support?
can the standard decomposition routines and matrix operations be used immediately with fixed-point scalars? If so, how?
if not, what are the prerequisites for such support (concepts, operator overloads, "real" functions that must be implemented, etc.) so that these operations/decompositions can be implemented without impairing Eigen's core?
are there any plans to implement such functionality in the core of Eigen/Eigen3?
If nothing of the sort is foreseen in the near future:
are you aware of any existing functionality of this kind, compatible with Eigen/Eigen3, that would fully support vectorization/optimization?
if not, which approach would you recommend to someone interested in implementing it?
For context: I would like to assess the feasibility of implementing a few matrix computations on a 16- or 32-bit microcontroller. I'm not aware of anything of the sort released under a GPL licensing scheme, and I would be greatly interested if such a thing were usable. If not, I would like to estimate the workload needed to implement it. (A sketch of what a custom fixed-point scalar for Eigen might look like follows below.)
Thanks in advance for any help.
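For reference, Eigen 3 is designed to work with user-defined scalar types: you provide the arithmetic operators for the type and specialize Eigen::NumTraits for it. Below is a minimal, untested sketch of what a Q16.16 fixed-point scalar might look like; the type name Fixed16 and all implementation details are illustrative assumptions, not anything Eigen ships.

```cpp
// Minimal, untested sketch of a Q16.16 fixed-point scalar for Eigen 3.x.
// Eigen needs (a) the arithmetic operators on the type and (b) a
// specialization of Eigen::NumTraits; decompositions additionally need
// sqrt/abs and the full set of comparison operators.
#include <cstdint>
#include <Eigen/Core>

struct Fixed16 {
    std::int32_t raw = 0;                  // 16 integer bits, 16 fractional bits
    Fixed16() = default;
    Fixed16(double d) : raw(static_cast<std::int32_t>(d * 65536.0)) {}
    double toDouble() const { return raw / 65536.0; }
};

inline Fixed16 operator+(Fixed16 a, Fixed16 b) { Fixed16 r; r.raw = a.raw + b.raw; return r; }
inline Fixed16 operator-(Fixed16 a, Fixed16 b) { Fixed16 r; r.raw = a.raw - b.raw; return r; }
inline Fixed16 operator-(Fixed16 a)            { Fixed16 r; r.raw = -a.raw;        return r; }
inline Fixed16 operator*(Fixed16 a, Fixed16 b) {
    Fixed16 r;  // widen to 64 bits so the intermediate product cannot overflow
    r.raw = static_cast<std::int32_t>((static_cast<std::int64_t>(a.raw) * b.raw) >> 16);
    return r;
}
inline Fixed16 operator/(Fixed16 a, Fixed16 b) {
    Fixed16 r;
    r.raw = static_cast<std::int32_t>((static_cast<std::int64_t>(a.raw) << 16) / b.raw);
    return r;
}
inline Fixed16& operator+=(Fixed16& a, Fixed16 b) { a.raw += b.raw; return a; }
inline bool operator==(Fixed16 a, Fixed16 b) { return a.raw == b.raw; }
inline bool operator<(Fixed16 a, Fixed16 b)  { return a.raw < b.raw; }

namespace Eigen {
template<> struct NumTraits<Fixed16> : GenericNumTraits<Fixed16> {
    typedef Fixed16 Real;
    typedef Fixed16 NonInteger;
    typedef Fixed16 Nested;
    enum {
        IsComplex = 0, IsInteger = 0, IsSigned = 1,
        RequireInitialization = 0,
        ReadCost = 1, AddCost = 1, MulCost = 2
    };
    static inline Fixed16 epsilon() { Fixed16 e; e.raw = 1; return e; }
};
} // namespace Eigen

int main() {
    Eigen::Matrix<Fixed16, 3, 3> A, B;
    A.setConstant(Fixed16(1.5));
    B.setConstant(Fixed16(2.0));
    Eigen::Matrix<Fixed16, 3, 3> C = A * B;    // plain products work with the above
    return C(0, 0).toDouble() == 9.0 ? 0 : 1;  // 3 * (1.5 * 2.0) = 9.0
}
```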

Related

How to multiply matrices containing floating-point numbers on an FPGA?

I would like to ask a question about matrix multiplication in an HDL. For six months I have been learning about FPGA and ASIC design, but I still do not have enough experience programming FPGAs in Verilog/VHDL. After a quick search I decided that Verilog suits me. Anyway, please just treat me as a beginner; so far I have only followed simple tutorials using the Xilinx Spartan-3E XC3S1600E MicroBlaze Starter Kit, because that is the board I have.
The most challenging part for me was creating matrices in Verilog. If I can create matrices and fill them with integers first, then I can move on to the next step: matrices with floating-point numbers. Further on, I also want to take the inverse of these matrices, which seems extremely hard to me.
My question is: what should I do in order to multiply matrices? Is there any trick or easier way to do it, like in the C language? (I know Verilog is an HDL and we cannot think that way.) Also, how can I convert my floating-point numbers to a fixed-point or integer type? I think that would let me solve my problem. I looked through other questions but did not understand them well. Thanks for your response and help.
Bonus question: if I tried these operations in MATLAB or Simulink, would it be easier to convert them to HDL using HDL Coder? If so, could you guide me on how to do that?
Regards,
Leonardo
You can create matrices with RAM in hardware design. Actually, everything can be described as RAM :)
Verilog natively supports only integers, but there are ways to represent and compute non-integer numbers:
Define your own fixed-point format. Suppose we have reg [7:0] var; we can treat var[7:4] as the integer part and var[3:0] as the fractional part, so the stored value is var / 2^4. For example, 8'b0101_1001 represents 5 + 9/16 = 5.5625. (If you instead restrict var[3:0] to the range 0~9 and read it as a decimal digit, BCD-style, the same pattern reads as 5.9; binary fixed point is the more common choice.)
IEEE 754 floating point. http://grouper.ieee.org/groups/754/ This standard is widely used in many areas, but I think it will be a little difficult for you.
Dealing with matrices is nothing special; just follow what you learned in math class.
I'm not good at English. Hope you can understand.
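To make the fixed-point scheme above concrete, here is a small C++ model, standing in as executable pseudocode for the hardware behavior (the Q4.4 format and all names are illustrative): floats are converted to fixed point once, and a 3x3 matrix multiply then needs only integer multiplies, adds, and a final shift. The same arithmetic maps one-to-one onto Verilog registers.

```cpp
// C++ model of Q4.4 fixed-point matrix multiplication: convert doubles to
// 8-bit fixed point, multiply with integer arithmetic only, rescale by a
// shift. In Verilog this becomes shifts, adds, and one multiplier.
#include <cstdint>
#include <cstdio>

constexpr int kFrac = 4;                                 // Q4.4: 4 fractional bits

int8_t to_fixed(double x)    { return static_cast<int8_t>(x * (1 << kFrac)); }
double from_fixed(int16_t x) { return static_cast<double>(x) / (1 << kFrac); }

// C = A * B for 3x3 matrices of Q4.4 values; the 32-bit accumulator plays
// the role of the wider register you would declare in Verilog.
void matmul_q44(const int8_t A[3][3], const int8_t B[3][3], int16_t C[3][3]) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < 3; ++k)
                acc += static_cast<int32_t>(A[i][k]) * B[k][j];  // Q4.4 * Q4.4 = Q8.8
            C[i][j] = static_cast<int16_t>(acc >> kFrac);        // rescale back to .4
        }
}

int main() {
    int8_t A[3][3], B[3][3]; int16_t C[3][3];
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) { A[i][j] = to_fixed(1.5); B[i][j] = to_fixed(0.5); }
    matmul_q44(A, B, C);
    std::printf("%g\n", from_fixed(C[0][0]));   // 3 * (1.5 * 0.5) = 2.25
}
```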

Fixed precision vs. arbitrary precision

A lot of modern languages have support for arbitrary-precision numbers. Java has BigInteger, Haskell has Integer, and Python's integers are arbitrary-precision by default. But in many of these languages arbitrary precision is not the default; it is opt-in, marked with a 'Big' prefix.
Why isn't arbitrary precision the default in all modern languages? Is there a particular reason to use fixed-precision numbers over arbitrary-precision ones?
If I had to guess, it would be because fixed precision maps directly to machine instructions, so it is faster, which would be a worthwhile trade-off if you didn't have to worry about overflow because you knew the numeric ranges beforehand. Are there any particular use cases for fixed precision over arbitrary precision?
It is a trade-off between performance and features/safety. I cannot think of any reason why I would prefer overflowing integers other than performance. Also, I could easily emulate overflowing semantics with non-overflowing types if I ever needed to.
Also, overflowing a signed int is a very rare occurrence in practice; it almost never happens. I wish modern CPUs supported raising an exception on overflow at no performance cost.
Different languages emphasize different features (where performance is a feature as well). That's good.
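As an aside on emulating non-overflowing semantics on fixed-width types: on GCC and Clang the checked-arithmetic builtins make this straightforward. A small sketch (the helper name checked_add is ours):

```cpp
// Sketch of "non-overflowing" semantics layered on a fixed-width integer,
// using the GCC/Clang checked-arithmetic builtin. This turns the silent
// wraparound mentioned above into an explicit error path.
#include <cstdint>
#include <stdexcept>

int64_t checked_add(int64_t a, int64_t b) {
    int64_t result;
    if (__builtin_add_overflow(a, b, &result))   // true if a + b overflowed
        throw std::overflow_error("int64 addition overflowed");
    return result;
}
```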
Fixed precision is likely to be faster than arbitrary precision; don't confuse fixed precision with the machine's native precision, however. You can use an extended (but still fixed) precision in some cases. I myself have often used the excellent qd library by the great expert D. H. Bailey (see http://crd-legacy.lbl.gov/~dhbailey/mpdist/), which can be easily installed on a Linux system, for instance. This library provides two fixed-precision types with greater precision than native double precision, and they are (by far) faster than the better-known arbitrary-precision libraries.
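The "extended but fixed precision" idea that qd implements is worth sketching: a double-double value is the unevaluated sum of two doubles, and error-free transformations such as Knuth's TwoSum recover the rounding error of each operation. A simplified illustration (not qd's actual API):

```cpp
// Sketch of the "double-double" idea: represent a value as hi + lo, two
// doubles whose sum carries roughly twice the precision of one double,
// at a fixed (not arbitrary) cost per operation.
#include <cstdio>

struct dd { double hi, lo; };

// Knuth's TwoSum: computes a + b exactly as s + err (err is the rounding error).
static dd two_sum(double a, double b) {
    double s = a + b;
    double v = s - a;
    double err = (a - (s - v)) + (b - v);
    return {s, err};
}

// Add two double-double numbers (simplified; real libraries renormalize more carefully).
static dd dd_add(dd x, dd y) {
    dd s = two_sum(x.hi, y.hi);
    double lo = s.lo + x.lo + y.lo;
    return two_sum(s.hi, lo);
}

int main() {
    dd a{1.0, 0.0}, b{1e-30, 0.0};
    dd c = dd_add(a, b);
    // In plain double, 1.0 + 1e-30 == 1.0; the double-double keeps the tail.
    std::printf("hi=%.17g lo=%.17g\n", c.hi, c.lo);
}
```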

Performing many small matrix operations in parallel in OpenCL

I have a problem that requires me to do eigendecomposition and matrix multiplication of many (~4k) small (~3x3) square Hermitian matrices. In particular, I need each work item to perform eigendecomposition of one such matrix, and then perform two matrix multiplications. Thus, the work that each thread has to do is rather minimal, and the full job should be highly parallelizable.
Unfortunately, it seems all the available OpenCL LAPACKs are for delegating operations on large matrices to the GPU, rather than for doing smaller linear algebra operations inside an OpenCL kernel. As I'd rather not implement matrix multiplication and eigendecomposition for arbitrarily sized matrices in OpenCL myself, I was hoping someone here might know of a suitable library for the job?
I'm aware that OpenCL might be getting built-in matrix operations at some point since the matrix type is reserved, but that is not really of much use right now. There is a similar question here from 2011, but it pretty much just says to roll your own, so I'm hoping the situation has improved since then.
In general, my experience with libraries like LAPACK, fftw, cuFFT, etc. is that when you want to do many really small problems like this, you are better off writing your own for performance. Those libraries are usually written for generality, so you can often beat their performance for specific small problems, especially if you can use unique properties of your particular problem.
I realize you don't want to hear "roll your own" but for this type of problem it is really the best thing to do IMO. You might find a library to do this, but considering the code that you really want (for performance) will not generalize, I doubt it exists. You'll be looking specifically for code to find the eigenvalues of 3x3 matrices. That's less of a library and more of a random code snippet with a suitable license that you can manipulate to take advantage of your specific problem.
In this specific case, you can find the eigenvalues of a 3x3 matrix with the textbook method using the characteristic polynomial. Remember that there is a relatively simple closed form solution for cubic equations: http://en.wikipedia.org/wiki/Cubic_function#General_formula_for_roots.
While I think it is very likely that this approach would be much faster than iterative methods, it would be wise to verify that if performance is an issue.
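For the real symmetric special case, the textbook method referred to above can be written down compactly. The following is an illustrative C++ sketch of the trigonometric closed-form eigenvalue computation for a symmetric 3x3 matrix (eigenvectors and the complex Hermitian case are omitted); the same scalar arithmetic ports directly into an OpenCL kernel:

```cpp
// Closed-form (trigonometric) eigenvalues of a real symmetric 3x3 matrix,
// via the characteristic polynomial. Assumes A is not a multiple of the
// identity (p > 0); a production kernel should handle that case separately.
#include <cmath>

const double kPi = 3.14159265358979323846;

// Eigenvalues of symmetric A (row-major), returned in descending order.
void eig3_sym(const double A[3][3], double eig[3]) {
    const double p1 = A[0][1]*A[0][1] + A[0][2]*A[0][2] + A[1][2]*A[1][2];
    const double q  = (A[0][0] + A[1][1] + A[2][2]) / 3.0;   // trace / 3
    const double p2 = (A[0][0]-q)*(A[0][0]-q) + (A[1][1]-q)*(A[1][1]-q)
                    + (A[2][2]-q)*(A[2][2]-q) + 2.0*p1;
    const double p  = std::sqrt(p2 / 6.0);

    // B = (A - q*I) / p; r = det(B) / 2, clamped into [-1, 1] for acos.
    double B[3][3];
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            B[i][j] = (A[i][j] - (i == j ? q : 0.0)) / p;
    double r = 0.5 * (B[0][0]*(B[1][1]*B[2][2] - B[1][2]*B[2][1])
                    - B[0][1]*(B[1][0]*B[2][2] - B[1][2]*B[2][0])
                    + B[0][2]*(B[1][0]*B[2][1] - B[1][1]*B[2][0]));
    r = std::fmax(-1.0, std::fmin(1.0, r));

    const double phi = std::acos(r) / 3.0;
    eig[0] = q + 2.0*p*std::cos(phi);                  // largest eigenvalue
    eig[2] = q + 2.0*p*std::cos(phi + 2.0*kPi/3.0);    // smallest
    eig[1] = 3.0*q - eig[0] - eig[2];                  // middle, from the trace
}
```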

How can we reduce error in decision-making when branching on floating-point comparisons?

I am working with floating-point arithmetic that involves decision-making through conditionals such as if...else. The algorithm works, but I suspect it has not been tuned to give the best results. I want to know how I can improve numerical stability by reducing error in floating-point comparisons. I'm using the C language in my project. Any suggestions will be greatly appreciated. Thanks.
If you need better precision than the built-in floating-point formats provide, then a third-party library or rolling your own number representation are about the only options. GNU Multiple Precision (GMP) is one such library.
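For the comparison part of the question, the usual technique is to replace exact equality tests with a tolerance that combines an absolute term (for values near zero) with a relative term. A sketch, with illustrative tolerance values (the helper name nearly_equal is ours):

```cpp
// Robust floating-point comparison for branch conditions: an absolute
// tolerance handles values near zero, a relative one handles large values.
// The default tolerances are illustrative; tune them to your data.
#include <cmath>

bool nearly_equal(double a, double b,
                  double abs_tol = 1e-12, double rel_tol = 1e-9) {
    const double diff = std::fabs(a - b);
    if (diff <= abs_tol)                       // a and b both close to zero
        return true;
    return diff <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

// Usage in decision-making code: instead of exact tests like
//   if (x == y) ...  else if (x < y) ...
// near a boundary, write
//   if (nearly_equal(x, y)) ...  else if (x < y) ...
```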

What's the most efficient way to run cross-platform, deterministic simulations in Haskell?

My goal is to run a simulation that requires non-integral numbers across different machines that may have varying CPU architectures and operating systems. The main priority is that, given the same initial state, each machine should reproduce the simulation exactly. The secondary priority is that the calculations should have performance and precision as close as realistically possible to double-precision floats.
As far as I can tell, there doesn't seem to be any way to affect the determinism of floating-point calculations from within a Haskell program, similar to the _controlfp and _FPU_SETCW macros in C. So, at the moment I consider my options to be:
Use Data.Ratio
Use Data.Fixed
Use Data.Fixed.Binary from the fixed-point package
Write a module to call _controlfp (or the equivalent for each platform) via FFI.
Possibly, something else?
One problem with the fixed-point arithmetic libraries is that they don't define, e.g., trigonometric functions or logarithms (as they don't implement the Floating type class), so I guess I would need to provide lookup tables for all the functions used in the simulation as part of the seed data. Or is there some better way?
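To illustrate the lookup-table idea, here is a sketch of a fixed-point sine with linear interpolation, written in C++ for concreteness (the Q16.16 format, table size, and all names are illustrative). In the setting above, the table itself would be shipped with the simulation seed data so every machine uses identical entries:

```cpp
// Fixed-point sine via a lookup table with linear interpolation.
// Angles are measured in "turns" (1.0 turn = full circle) stored as Q16.16.
#include <array>
#include <cmath>
#include <cstdint>

using fixed_t = std::int32_t;                  // Q16.16 fixed point
constexpr int kFracBits = 16;
constexpr int kTableSize = 1024;               // entries covering one full turn
constexpr double kTau = 6.28318530717958647692;

// Build the table once; for cross-machine determinism, serialize this table
// into the seed data instead of rebuilding it with the host's sin().
std::array<fixed_t, kTableSize> make_sin_table() {
    std::array<fixed_t, kTableSize> t{};
    for (int i = 0; i < kTableSize; ++i) {
        double s = std::sin(kTau * i / kTableSize);
        t[i] = static_cast<fixed_t>(std::lround(s * (1 << kFracBits)));
    }
    return t;
}

fixed_t fixed_sin(fixed_t turns, const std::array<fixed_t, kTableSize>& table) {
    // Keep only the fractional turn (wraps correctly for negative angles),
    // index with its top bits, interpolate with the remaining low bits.
    std::uint32_t frac = static_cast<std::uint32_t>(turns) & ((1u << kFracBits) - 1);
    std::uint32_t idx  = frac * kTableSize >> kFracBits;
    std::uint32_t rem  = frac * kTableSize & ((1u << kFracBits) - 1);
    fixed_t a = table[idx];
    fixed_t b = table[(idx + 1) % kTableSize];
    return a + static_cast<fixed_t>((static_cast<std::int64_t>(b - a) * rem) >> kFracBits);
}
```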
Both fixed-point libraries also hide the newtype constructor, so any (de)serialization would need to go through toRational/fromRational as far as I can tell, and that feels like it would add unnecessary overhead.
My next step is to benchmark the different fixed-point solutions to see the real world performance, but meanwhile, I'd gladly take any advice you have on this subject.
Clause 11 of the IEEE 754-2008 standard describes what is needed for reproducible floating-point results. Among other things, you need unambiguous expression evaluation rules. Some languages permit floating-point expressions to be evaluated with extra precision or permit some alterations of expressions (such as evaluating a*b+c in a single instruction instead of separate multiply and add instructions). I do not know about Haskell’s semantics. If Haskell does not precisely map expressions to definite floating-point operations, then it cannot support reproducible floating-point results.
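To make the a*b+c point concrete, here is a small illustration (in C++ rather than Haskell, purely because the effect is easy to trigger there): evaluated as two separately rounded operations, it generally differs from the fused multiply-add, which rounds only once, so a compiler that silently contracts one form into the other breaks bit-for-bit reproducibility:

```cpp
// a*b + c with two roundings vs. one fused rounding. Compile with
// contraction disabled (e.g. -ffp-contract=off on GCC/Clang) so the first
// expression really performs two separately rounded operations.
#include <cmath>
#include <cstdio>

int main() {
    double a = 1.0 + std::ldexp(1.0, -30);    // 1 + 2^-30, exactly representable
    double b = 1.0 - std::ldexp(1.0, -30);    // 1 - 2^-30
    double c = -1.0;
    double two_roundings = a * b + c;          // round(a*b) = 1.0, then + c = 0.0
    double one_rounding  = std::fma(a, b, c);  // exact a*b + c = -2^-60, rounded once
    std::printf("%.17g vs %.17g\n", two_roundings, one_rounding);
}
```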
Also, since you mention trigonometric and logarithmic functions, be aware that these vary from implementation to implementation. I am not aware of any math library that provides correctly rounded implementations of every standard math function. (CRLibm is a project to create one.) So each math library uses its own approximations, and their results vary slightly. Perhaps you might work around this by including a math library with your simulation code, so that it is used instead of each Haskell implementation’s default library.
Routines that convert between binary floating-point and decimal numerals are also a source of differences between implementations. This is less of a problem than it used to be because algorithms for converting correctly are known. However, it is something that might need to be checked in each implementation.
