Efficient EigenSolver Implementation - algorithm

I am looking for an efficient eigensolver ( language not important, although I would be programming in C#), that utilizes the multi-core features found in modern CPU. Being able to work directly with pardiso solver is a major plus. My matrix are mostly sparse matrix, so an ideal solver should be able to take advantage of this fact and greatly enhance the memory usage and performance.
So far I have only found LAPACK and ARPACK. The LAPACK, as implemented in Intel MKL, is a good candidate, as it offers multi-core optimization. But it seems that the drivers inside the LAPACK don't work directly with pardiso solver, furthermore, it seems that they don't take advantage of sparse matrix ( but I am not sure on this point).
ARPACK, on the other hand, seems to be pretty hard to setup in Windows environment, and the parallel version, PARPACK, doesn't work so well. The bonus point is that it can work with pardiso solver.
The best would be Intel MKL + ARPACK with multi-core speedup. Not sure whether there is any existing implementations that already do what I want to do?

I'm working on a problem with needs very similar to the ones you state. I'm considering FEAST:
http://www.ecs.umass.edu/~polizzi/feast/index.htm
I'm trying to make it work right now, but it seems perfect. I'm interested in hearing what your experience with it is, if you use it.
cheers
Ned

Have a look at the Eigen2 library.

I've implemented it already, in C#.
The idea is that one must convert the matrix format in CSR format. Then, one can use MKL to compute linear equation solving algorithm ( using pardiso solver), the matrix-vector manipulation.

Related

Scientific library in C/C with OpenMP

I have to exploit PpenMP in some algorithm and for this purpose I need some mathematical functions, like eig or svd as it is available in MATLAB and it is quite fast in MATLAB. I already tried the following libraries with OpenMP
GSL - GNU Scientific Library
Eigen C++ template library
but I don't know why my OpenMP parallelised code is much slower than the serial code, may be there is some thing wrong in the library, or that the function random, eig or svd are blocking? I have no idea how to figure it out, can some body suggest me which is most compatible math library with OpenMP.
I can recommend Intel's MKL; note that it costs money which may affect your decision. I neither know nor care what language(s) it is written in, just so long as it provides APIs callable from my chosen language. Mine is Fortran, but it has bindings for C too
If you look around SO you'll find many questions from people whose first (or second or third) OpenMP programs were actually slower than their serial versions. Look at some of the answers. Don't conclude that there is a magic bullet, in the shape of a library, to make your code faster. Instead, realise that it is most likely that you've written a poorly-parallelised program and fix that.
Finally, if you have an installation of Matlab, don't expect to be able to write your own routines to outperform Matlab's. I won't say it can't be done, but I think you'll find it very difficult.
GSL is compatible with OpenMP. You can try with Intel Math Kernel Library which comes as a trial version for free.
If the speed up is not so much, then probably the code is not much parallelizable. You may want to debug and see the details of the running threads in Intel Thread Checker, that could be helpful to see where the bottlenecks are.
I think you just want to find a fast implementation of lapack (or related routines) which is already threaded, but it's a little hard to tell from your question. High Performance Mark suggests MKL, which is an excellent example; others include ATLAS or FLAME which are open source but take some doing to build.

Fastest math programming language?

I have an application that requires millions of subtractions and remainders, i originally programmed this algorithm inside of C#.Net but it takes five minutes to process this information and i need it faster than that.
I have considered perl and that seems to be the best alternative now. Vb.net was slower in testing. C++ may be better also. Any advice would be greatly appreciated.
You need a compiled language like Fortran, C, or C++. Other languages are designed to give you flexibility, object-orientation, or other advantages, and assume absolutely fastest performance is not your highest priority.
Know how to get maximum performance out of a single thread, and after you have done so investigate sharing the work across multiple cores, for example with MPI. To get maximum performance in a single thread, one thing I do is single-step it at the machine instruction level, to make sure it's not dawdling about in stuff that could be removed.
Some calculations are regular enough to take profit of GPGPUs: recent graphic cards are essentially specialized massively parallel numerical co-processors. For instance, you could code your numerical kernels in OpenCL. Otherwise, learn C++11 (not some earlier version of the C++ standard) or C. And in many cases Ocaml could be nearly as fast as C++ but much easier to code with.
Perhaps your problem can be handled by scilab or R, I did not understand it enough to help more.
And you might take advantage of your multi-core processor by e.g. using Pthreads or MPI
At last, the Linux operating system is perhaps better to deal with massive calculations. It is significant that most super computers use it today.
If execution speed is the highest priority, that usually means Fortran.
Try Julia: its killing feature is being easy to code in a high level concise way, while keeping performances at the same order of magnitude of Fortran/C.
PARI/GP is the best I have used so far. It's written in C.
Try to look at DMelt mathematical program. The program calls Java libraries. Java virtual machine can optimize long mathematical calculations for you.
The standard tool for mathmatic numerical operations in engineering is often Matlab (or as free alternatives octave or the already mentioned scilab).

How can this linear solver be linked within Mathematica?

Here is a good linear solver named GotoBLAS. It is available for download and runs on most computing platforms. My question is, is there an easy way to link this solver with the Mathematica kernel, so that we can call it like LinearSolve? One thing most of you may agree on for sure is that if we have a very large Linear system then we better get it solved by some industry standard Linear solver. The inbuilt solver is not meant for really large problems.
Now that Mathematica 8 has come up with better compilation and library link capabilities we can expect to use some of those solvers from within Mathematica. The question is does that require little tuning of the source code, or you need to be an advanced wizard to do it. Here in this forum we may start linking some excellent open source programs like GotoBLAS with Mathematica and exchange our views. Less experienced people can get some insight from the pro users and at the end we get a much stronger Mathematica. It will be an open project for the ever increasing Mathematica community and a platform where these newly introduced capabilities of Mathematica 8 could be transparently documented for future users.
I hope some of you here will give solid ideas on how we can get GotoBLAS running from within Mathematica. As the newer compilation and library link capabilities are usually not very well documented, they are not used by the common users very often. This question can act as a toy example to document these new capabilities of Mathematica. Help in this direction by the experienced forum members will really lift the motivation of new users like me as well as it will teach us a very useful thing to extend Mathematica's number crunching arsenal.
The short answer, I think, is that this is not something you really want to do.
GotoBLAS, as I understand it, is a specific implementation of BLAS, which stands for Basic Linear Algebra Subroutines. "Basic" really means quite basic here - multiply a matrix times a vector, for example. Thus, BLAS is not a solver that a function like LinearSolve would call. LinearSolve would (depending on the exact form of the arguments) call a LAPACK command, which is a higher level package built on top of BLAS. Thus, to really link GotoBLAS (or any BLAS) into Mathematica, one would really need to recompile the whole kernel.
Of course, one could write a C/Fortran program that was compiled against GotoBLAS and then link that into Mathematica. The resulting program would only use GotoBLAS when running whatever specific commands you've linked into Mathematica, however, which rather misses the whole point of BLAS.
The Wolfram Kernel (Mathematica) is already linked to the highly-optimized Intel Math Kernel Library, and is distributed with Mathematica. The MKL is multithreaded and vectorized, so I'm not sure what GotoBLAS would improve upon.

CUDA - Simple matrix addition/sum operation

This should be very simple but I could not find an exhaustive answer:
I need to perform A+B = C with matrices, where A and B are two matrices of unknown size (they could be 2x2 or 20.000x20.000 as greatest value)
Should I use CUBLAS with Sgemm function to calculate?
I need the maximum speed achievable so I thought of CUBLAS library which should be well-optimized
For any sort of technical computing, you should always use optimized libraries when available. Existing libraries, used by hundreds of other people, are going to be better tested and better optimized than anything you do yourself, and the time you don't spend writing (and debugging, and optimizing) that function yourself can be better spent working on the actual high-level problem you want to solve instead of re-discovering things other people have already implemented. This is just basic specialization of labour stuff; focus on the compute problem you want to solve, and let people who spend their days professionally writing GPGPU matrix routines do that for you.
Only when you are sure that existing libraries don't do what you need -- maybe they solve too general a problem, or make certain assumptions that don't hold in your case -- should you roll your own.
I agree with the others that in this particular case, the operation is pretty straightforward and it's feasible to DIY; but if you're going to be doing anything else with those matricies once you're done adding them, you'd be best off using optimized BLAS routines for whatever platform you're on.
What you want to do would be trivial to implement in CUDA and will be bandwidth limited.
And since CUBLAS5.0, cublasgeam can be used for that. It computes the weighted sum of 2 optionally transposed matrices.

Fast FEM Solvers

What are the fast solvers for FEM equations? I would prefer open source implementation, but if there is a commercial implementation, then I won't mind paying for it.
Code Aster is an open source FE code. code aster
The pre- and post-processing is usually done with Salome - both originate from EDF.
How about FEAP. It has full source code available when you purchase it. It is pretty big project, maybe its too much for your needs, but check it out.
FEAP is a general purpose finite
element analysis program which is
designed for research and educational
use. Source code of the full program
is available for compilation using
Windows (Compaq or Intel compiler),
LINUX or UNIX operating systems, and
Mac OS X based Apple systems.
It has also a Personal Edition called FEAPpv available for free, including source code. Differences between those versions are listed in this pdf.
"brad"? do you mean "broad"?
you don't say if your problem is linear or non-linear. that'll make a very big difference.
the solver depends on the type of equation and the size of your problem. for elliptical pdes you can choose standard linear algebra techniques like lu decomposition, iterative methods like successive over relaxation, or wavefront solvers that minimize memory consumption.
some people like solving non-linear steady-state problems as if they were dynamics problems. the idea is to create "fake" mass and damping matricies and use explicit time integration to converge to steady state.
lots of choices. standard linear algebra is a good starting point.
language? java?
Oops, that's kind of a brad question.
Solving differential equations usually starts with analyzing equation itself. Some equations are notoriously difficult to solve efficiently, e.g. indifinite boundary problems.
So if you have something else than an elliptic problem, you'll might better prepare for hard times ahead.
Next important and crutial part is transfering the contiouus problem into a discrete mesh. Typically the accuracy of your results will vary with different ways to generate this mesh. You'll need some sound experience here.
So I'd say there is nothing like the fast slover for FEM equations. Anyway, while Wikipedia gives a short overview of the topic, you might perhaps also have a look a the german Wikipedia page. It lists well-known FEM implementations.
OpenFoam and Elmer are two open source solvers. Not sure about Elmer, but I think OpenFoam might uses the control volume approach.
I used OpenFOAM for fluid dynamics research. You can do parallel processing with it with MPI. And if you have a Cray T3E it will be fast!
It's open source :D
http://www.opencfd.co.uk/openfoam/features.html#features
Please have look for Deal.II open source library:
http://www.dealii.org/
They provide also VirtualBox image which comes pre-installed libs.

Resources