Erlang Matrix Library

I'm looking for a robust library to handle matrices in Erlang. Nothing fancy, just efficient handling of multiplication and basic operations. I could do that with lists etc., but I'm sure my implementation won't be very efficient!

The presentation at this link talks about some Erlang bindings to BLAS and the like:
High Performance Technical Computing in Erlang. Hope this is helpful.
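For comparison, the naive list-of-lists multiply mentioned in the question looks roughly like this (sketched in Haskell for brevity; an Erlang version over lists is structurally the same), and its cubic amount of work over linked lists with poor locality is exactly why a BLAS binding pays off:

```haskell
import Data.List (transpose)

-- Naive dense matrix multiply over lists of lists: O(n^3) operations
-- with poor cache behaviour, fine for small matrices only.
matMul :: Num a => [[a]] -> [[a]] -> [[a]]
matMul a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

main :: IO ()
main = print (matMul [[1, 2], [3, 4]] [[5, 6], [7, 8]])  -- [[19,22],[43,50]]
```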

Related

What performance trade-offs exist between various Clojure matrix libraries?

There are a number of matrix libraries out there for Clojure:
vectorz-clj
clatrix
Parallel Colt
cerebro
What are the performance trade-offs between these libraries?
I've heard that with some of the underlying implementations, there are trade-offs between (for example) matrix instantiation and operation performance, but I haven't been able to find a comprehensive resource detailing these considerations.
Thanks
If you want to make use of core.matrix, there are only two implementations at present that are reasonably mature and performant:
Clatrix - uses calls to native BLAS
vectorz-clj - a flexible and fast pure-JVM implementation
It really comes down to your use cases. If you mostly care about big linear algebra operations and don't mind the native dependencies, then Clatrix is your best bet at present - simply because BLAS implementations are so fast. This is particularly useful for:
Large matrix multiplication
Linear algebra (matrix decompositions etc.)
If you want to do general array-programming work, then vectorz-clj has the advantage of being pure JVM code and much more flexible in terms of array/matrix formats. Examples of things that vectorz-clj supports well that you can't do in Clatrix:
N-dimensional arrays
Various specialised types of sparse arrays (diagonal matrices, different sparse storage formats etc.)
Arrays with arbitrary strided access (like Numpy)
Lightweight "views" into larger arrays
Overall, vectorz-clj won't be as fast for things like big matrix multiplication, but is probably faster than Clatrix for many other operations and small/medium sized vector work. I'd normally choose vectorz-clj unless I thought that linear algebra performance would be the main bottleneck.
The other core.matrix implementations are less mature, but may still be useful for specific use cases. A nice feature of core.matrix is the ability to mix and match implementations while using the same common API, so it's not an "all or nothing" choice.
Disclaimer: I have created or contributed to many of the above projects. I hope I've given a fairly unbiased and objective evaluation.
If you don't need core.matrix support, then you have many more options - you can use any of the Java matrix libraries via Clojure's Java interop. In theory, these could become core.matrix implementations as well - the only constraint is that someone needs to do the work to extend the core.matrix protocols to support the new matrix types.

Examples where compiler-optimized functional code performs better than imperative code

One of the promises of side-effect free, referentially transparent functional programming is that such code can be extensively optimized.
To quote Wikipedia:
Immutability of data can, in many cases, lead to execution efficiency, by allowing the compiler to make assumptions that are unsafe in an imperative language, thus increasing opportunities for inline expansion.
I'd like to see examples where a functional-language compiler outperforms an imperative one by producing better-optimized code.
Edit: I tried to give a specific scenario, but apparently it wasn't a good idea. So I'll try to explain it in a different way.
Programmers translate ideas (algorithms) into languages that machines can understand. At the same time, one of the most important aspects of this translation is that humans can also understand the resulting code. Unfortunately, in many cases there is a trade-off: concise, readable code suffers from slow performance and needs to be manually optimized. This is error-prone and time-consuming, and it makes the code less readable (up to totally unreadable).
The foundations of functional languages, such as immutability and referential transparency, allow compilers to perform extensive optimizations, which could replace manual optimization of code and free programmers from this trade-off. I'm looking for examples of ideas (algorithms) and their implementations, such that:
the (functional) implementation is close to the original idea and is easy to understand,
it is extensively optimized by the compiler of the language, and
it is hard (or impossible) to write similarly efficient code in an imperative language without manual optimizations that reduce its conciseness and readability.
I apologize if it is a bit vague, but I hope the idea is clear. I don't want to give unnecessary restrictions on the answers. I'm open to suggestions if someone knows how to express it better.
My interest isn't just theoretical. I'd like to use such examples (among other things) to motivate students to get interested in functional programming.
At first I wasn't satisfied by a few examples suggested in the comments. On second thought, I take my objections back: those are good examples. Please feel free to expand them into full answers so that people can comment on and vote for them.
(One class of such examples will most likely be parallelized code, which can take advantage of multiple CPU cores. Often in functional languages this can be done easily without sacrificing code simplicity (like in Haskell, by adding par or pseq in appropriate places). I'd be interested in such examples too, but also in other, non-parallel ones.)
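For instance, here is a minimal sketch of the kind of change I have in mind, using par and pseq from Haskell's parallel package (the threshold and the toy fib function are just for illustration); the algorithm itself is untouched:

```haskell
import Control.Parallel (par, pseq)

-- Naive Fibonacci, standing in for any expensive pure function.
fib :: Integer -> Integer
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

-- `par` sparks its first argument for evaluation on another core,
-- `pseq` forces the second before the results are combined.
parFib :: Integer -> Integer
parFib n
  | n < 20    = fib n                      -- too small to be worth sparking
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)
    y = parFib (n - 2)

main :: IO ()
main = print (parFib 35)
```

Compiled with ghc -O2 -threaded and run with +RTS -N, this uses multiple cores while staying a direct transcription of the recursive definition.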
There are cases where the same algorithm will optimize better in a pure context. Specifically, stream fusion allows an algorithm that consists of a sequence of loops of widely varying form (maps, filters, folds, unfolds) to be composed into a single loop.
The equivalent optimization in a conventional imperative setting, with mutable data in loops, would require a full effect analysis, which no compiler does.
So at least for the class of algorithms that are implemented as pipelines of ana- and catamorphisms on sequences, you can guarantee optimization results that are not possible in an imperative setting.
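For instance, a pipeline like the one below (sketched against the vector library; how completely it fuses depends on the GHC version and optimization flags) is compiled into a single loop with no intermediate vectors allocated:

```haskell
import qualified Data.Vector.Unboxed as V

-- An unfold (enumFromTo), a map, a filter and a fold (sum), written
-- compositionally. With stream fusion, GHC turns this whole pipeline
-- into one accumulating loop over unboxed Ints.
pipeline :: Int -> Int
pipeline n = V.sum
           . V.map (* 2)
           . V.filter even
           $ V.enumFromTo 1 n

main :: IO ()
main = print (pipeline 1000000)
```

A hand-written imperative equivalent is a single loop with an accumulator; the point is that the compiler recovers that loop from the compositional form.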
A very recent paper, Haskell beats C using generalised stream fusion by Geoff Mainland, Simon Peyton Jones, Simon Marlow and Roman Leshchinskiy (submitted to ICFP 2013), describes such an example. Abstract:
Stream fusion [6] is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.
In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C.
This is just a note, not an answer: GCC has a pure attribute, suggesting it can take account of purity; the obvious reasons are remarked on in the manual here.
I would think that 'static single assignment' imposes a form of purity -- see the links at http://lambda-the-ultimate.org/node/2860 or the Wikipedia article.
make and various build systems perform better for large projects by assuming that the various build steps are referentially transparent; as such, they only need to rerun steps whose inputs have changed.
For small to medium sized changes, this can be a lot faster than building from scratch.

Efficient EigenSolver Implementation

I am looking for an efficient eigensolver (language not important, although I would be programming in C#) that utilizes the multi-core features found in modern CPUs. Being able to work directly with the PARDISO solver is a major plus. My matrices are mostly sparse, so an ideal solver should take advantage of this fact to greatly improve memory usage and performance.
So far I have only found LAPACK and ARPACK. LAPACK, as implemented in Intel MKL, is a good candidate, as it offers multi-core optimization. But it seems that the drivers inside LAPACK don't work directly with the PARDISO solver; furthermore, it seems that they don't take advantage of sparse matrices (but I am not sure on this point).
ARPACK, on the other hand, seems to be pretty hard to set up in a Windows environment, and the parallel version, PARPACK, doesn't work so well. The bonus is that it can work with the PARDISO solver.
The best would be Intel MKL + ARPACK with multi-core speedup. Is there any existing implementation that already does what I want to do?
I'm working on a problem with needs very similar to the ones you state. I'm considering FEAST:
http://www.ecs.umass.edu/~polizzi/feast/index.htm
I'm trying to make it work right now, but it seems perfect. I'm interested in hearing what your experience with it is, if you use it.
cheers
Ned
Have a look at the Eigen2 library.
I've already implemented this in C#.
The idea is that one must convert the matrix into CSR format. Then one can use MKL for the linear equation solving (using the PARDISO solver) and for the matrix-vector operations.
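To make the conversion concrete, here is a minimal sketch of the CSR layout (written in Haskell only to keep it short; the C# arrays passed to MKL carry the same three sequences). Note that MKL routines can expect either 0-based or 1-based indices depending on how they are configured, so check which convention your call uses:

```haskell
-- CSR stores three arrays: the nonzero values row by row, the column
-- index of each nonzero, and one offset per row into the value array
-- (plus a final end offset).
data CSR = CSR
  { csrValues  :: [Double]
  , csrColumns :: [Int]
  , csrRowPtrs :: [Int]   -- length = number of rows + 1
  } deriving Show

toCSR :: [[Double]] -> CSR
toCSR rows = CSR vals cols ptrs
  where
    nonzeros r = [(c, v) | (c, v) <- zip [0 ..] r, v /= 0]
    rowNZs     = map nonzeros rows
    vals       = [v | r <- rowNZs, (_, v) <- r]
    cols       = [c | r <- rowNZs, (c, _) <- r]
    ptrs       = scanl (+) 0 (map length rowNZs)

main :: IO ()
main = print (toCSR [[1, 0, 2], [0, 0, 3]])
-- CSR {csrValues = [1.0,2.0,3.0], csrColumns = [0,2,2], csrRowPtrs = [0,2,3]}
```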

Parallel algorithms and data structures

In keeping with my interest in algorithms (see here), I would like to know if there are (contrary to my previous question) algorithms and data structures that are mainstream in parallel programming. It is probably early to ask about mainstream parallel algorithms and data structures, but some of the gurus here may have had good or bad experiences with some of them.
EDIT: I am more interested in successful practical applications of algorithms and data structures than in academic papers.
Thanks
Many of Google's whitepapers, especially but not exclusively the ones linked from this page, describe successful practical applications of parallel distributed computing and/or their data-structure and algorithmic underpinnings. For example, this paper deals with modifying a DBMS's data structures to extract intra-transaction parallelism; this one (and some others) introduces the popular MapReduce architecture, since implemented e.g. in Hadoop; this one is about highly parallelizable approximate matrix factoring suitable for use in "kernel methods" in machine learning; etc., etc.
Maybe I'm totally missing the point, but there are a ton of mainstream parallel algorithms and data structures, e.g. matrix multiplication, FFT, PDE and linear equation solvers, integration and simulation (Monte Carlo / random numbers), searching and sorting, and so on. Take a look at Designing and Building Parallel Programs or Patterns for Parallel Programming. And then there is CUDA and the like. What are you after?
Sorting:
Standard Template Library for Extra Large Data Sets
Sort Benchmark

Fast FEM Solvers

What are the fast solvers for FEM equations? I would prefer an open source implementation, but if there is a commercial implementation, then I won't mind paying for it.
Code Aster is an open source FE code: code aster.
The pre- and post-processing is usually done with Salome; both originate from EDF.
How about FEAP? It has full source code available when you purchase it. It is a pretty big project, maybe it's too much for your needs, but check it out.
FEAP is a general purpose finite element analysis program which is designed for research and educational use. Source code of the full program is available for compilation using Windows (Compaq or Intel compiler), LINUX or UNIX operating systems, and Mac OS X based Apple systems.
It also has a Personal Edition called FEAPpv, available for free, including source code. The differences between those versions are listed in this PDF.
"brad"? do you mean "broad"?
You don't say whether your problem is linear or non-linear; that will make a very big difference.
The solver depends on the type of equation and the size of your problem. For elliptic PDEs you can choose standard linear algebra techniques like LU decomposition, iterative methods like successive over-relaxation (written out below for concreteness), or wavefront solvers that minimize memory consumption.
Some people like solving non-linear steady-state problems as if they were dynamics problems. The idea is to create "fake" mass and damping matrices and use explicit time integration to converge to steady state.
Lots of choices. Standard linear algebra is a good starting point.
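For reference, the successive over-relaxation update mentioned above, for a linear system Ax = b coming out of the discretization, is the standard textbook iteration (nothing package-specific):

$$x_i^{(k+1)} = (1 - \omega)\,x_i^{(k)} + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j<i} a_{ij}\,x_j^{(k+1)} - \sum_{j>i} a_{ij}\,x_j^{(k)}\Big), \qquad 0 < \omega < 2,$$

with omega = 1 reducing to Gauss-Seidel.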
Language? Java?
Oops, that's kind of a broad question.
Solving differential equations usually starts with analyzing the equation itself. Some equations are notoriously difficult to solve efficiently, e.g. indefinite boundary problems.
So if you have something other than an elliptic problem, you had better prepare for hard times ahead.
The next important and crucial part is transferring the continuous problem onto a discrete mesh. Typically the accuracy of your results will vary with the different ways of generating this mesh. You'll need some sound experience here.
So I'd say there is no such thing as the fast solver for FEM equations. Anyway, while Wikipedia gives a short overview of the topic, you might perhaps also have a look at the German Wikipedia page. It lists well-known FEM implementations.
OpenFOAM and Elmer are two open source solvers. I'm not sure about Elmer, but I think OpenFOAM might use the control volume approach.
I used OpenFOAM for fluid dynamics research. You can do parallel processing with it via MPI. And if you have a Cray T3E, it will be fast!
It's open source :D
http://www.opencfd.co.uk/openfoam/features.html#features
Please have a look at the deal.II open source library:
http://www.dealii.org/
They also provide a VirtualBox image that comes with the libraries pre-installed.
