What's the difference between LibSVM and LibLinear - algorithm

libsvm and liblinear are both software libraries that implement Support Vector Machines. What's the difference? And how do the differences make liblinear faster than libsvm?

In practice, the complexity of the SMO algorithm (which works for both kernel and linear SVMs) as implemented in libsvm is between O(n^2) and O(n^3), whereas liblinear is O(n) but does not support kernel SVMs; here n is the number of samples in the training dataset.
Hence for medium- to large-scale problems, forget about kernels and use liblinear (or maybe have a look at approximate kernel SVM solvers such as LaSVM).
Edit: in practice, libsvm becomes painfully slow beyond 10k samples.
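To get a concrete feel for the gap, here is a minimal timing sketch using scikit-learn, whose SVC class wraps libsvm and whose LinearSVC class wraps liblinear (the dataset size and feature count are arbitrary choices for illustration):

```python
# Rough timing comparison: sklearn's SVC wraps libsvm (SMO solver),
# LinearSVC wraps liblinear. Sketch assuming scikit-learn is installed.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)

for name, clf in [("libsvm (SVC, linear kernel)", SVC(kernel="linear")),
                  ("liblinear (LinearSVC)", LinearSVC())]:
    t0 = time.time()
    clf.fit(X, y)
    print(f"{name}: {time.time() - t0:.1f}s")
```

On a dataset of this size, the liblinear-backed classifier typically finishes in seconds while the libsvm-backed one takes far longer, which is exactly the O(n) versus O(n^2)-O(n^3) difference described above.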

An SVM is a support vector machine, which is basically a linear classifier, but one that uses kernel transforms to turn a non-linear problem into a linear one beforehand.
From the link above, it seems like liblinear is very much the same thing, without those kernel transforms. So, as they say, in cases where the kernel transforms are not needed (they mention document classification), it will be faster.

From http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf:
It supports L2-regularized logistic regression (LR), L2-loss and L1-loss linear support vector machines (SVMs) (Boser et al., 1992). It inherits many features of the popular SVM library LIBSVM.
And you might also see some useful information here from one of the creators: http://agbs.kyb.tuebingen.mpg.de/km/bb/showthread.php?tid=710
The main idea, I would say, is that liblinear is optimized for linear classification (i.e. no kernels necessary), whereas linear classification is only one of the many capabilities of libsvm, so logically libsvm may not match liblinear in terms of classification speed. Obviously, I'm making some broad generalizations here; the exact details of the differences are covered in the paper I linked above, as well as in the corresponding user's guide to libsvm from the libsvm website.

Related

SIMD-exploiting implementation of Peterson and Monico's Lanczos algorithm over the field with two elements

(This question is probably flirting with the "no software recommendations" rule; I understand why it might be closed).
In their paper F_2 Lanczos revisited, Peterson and Monico give a version of the Lanczos algorithm for finding a subspace of the kernel of a linear map over Z/2Z. If my cursory reading of their paper is correct (whether it is or not is clearly not a question for SO), the algorithm presented requires a number of iterations that scales inversely with the word size of the machine used. The authors implemented their proof-of-concept algorithm with a 64-bit word size.
Does there exist a publicly available implementation of that algorithm utilizing wide SIMD words for (a potentially significant) speedup?
An existing implementation would be a software recommendation. A more interesting question is "Is it possible to use SIMD to make this algorithm run faster?" From my glance at the paper, it sounds like SIMD is exactly what they are describing ("We will partition a 64 bit machine word x into eight subwords... where each ... is an 8-bit word"), so if the authors' implementation is publicly available somewhere, the answer is "yes" because they're already using it.
If this algorithm were written in C/C++ or something similar, an optimizing compiler would likely do a pretty good job of vectorizing it with SIMD even without manually specifying how to split the registers (this can be verified by looking at the assembly). It would arguably be preferable to implement it in a high-level language without splitting registers manually, because then the compiler could optimize it for any target machine's word size.
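To illustrate the word-packing idea itself (this is not the authors' code, just a sketch of the principle): over F_2, adding two matrix rows is a single XOR on packed words, and a dot product is the parity of an AND, so wider words process more columns per instruction:

```python
# Illustration (not the paper's implementation): over GF(2), a row of a
# matrix can be packed into one machine word, and row addition becomes a
# single XOR -- the wider the word (64-bit scalar, 128/256/512-bit SIMD),
# the more columns are processed per instruction.
def gf2_row_add(row_a: int, row_b: int) -> int:
    """Add two packed GF(2) rows (Python ints act as arbitrarily wide words)."""
    return row_a ^ row_b

def gf2_dot(row_a: int, row_b: int) -> int:
    """Dot product of two packed GF(2) rows: parity of the AND."""
    return bin(row_a & row_b).count("1") & 1

a = 0b1011_0110
b = 0b0111_0011
print(bin(gf2_row_add(a, b)))  # 0b11000101
print(gf2_dot(a, b))           # parity of popcount(a & b) -> 1
```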

Performing many small matrix operations in parallel in OpenCL

I have a problem that requires me to do eigendecomposition and matrix multiplication of many (~4k) small (~3x3) square Hermitian matrices. In particular, I need each work item to perform eigendecomposition of one such matrix, and then perform two matrix multiplications. Thus, the work that each thread has to do is rather minimal, and the full job should be highly parallelizable.
Unfortunately, it seems all the available OpenCL LAPACKs are for delegating operations on large matrices to the GPU rather than for doing smaller linear algebra operations inside an OpenCL kernel. As I'd rather not implement matrix multiplication and eigendecomposition for arbitrarily sized matrices in OpenCL myself, I was hoping someone here might know of a suitable library for the job?
I'm aware that OpenCL might be getting built-in matrix operations at some point since the matrix type is reserved, but that is not really of much use right now. There is a similar question here from 2011, but it pretty much just says to roll your own, so I'm hoping the situation has improved since then.
In general, my experience with libraries like LAPACK, fftw, cuFFT, etc. is that when you want to do many really small problems like this, you are better off writing your own for performance. Those libraries are usually written for generality, so you can often beat their performance for specific small problems, especially if you can use unique properties of your particular problem.
I realize you don't want to hear "roll your own" but for this type of problem it is really the best thing to do IMO. You might find a library to do this, but considering the code that you really want (for performance) will not generalize, I doubt it exists. You'll be looking specifically for code to find the eigenvalues of 3x3 matrices. That's less of a library and more of a random code snippet with a suitable license that you can manipulate to take advantage of your specific problem.
In this specific case, you can find the eigenvalues of a 3x3 matrix with the textbook method using the characteristic polynomial. Remember that there is a relatively simple closed-form solution for cubic equations: http://en.wikipedia.org/wiki/Cubic_function#General_formula_for_roots.
While I think it is very likely that this approach would be much faster than iterative methods, it would be wise to verify that if performance is an issue.
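For illustration, here is a Python sketch of the standard trigonometric closed form for the eigenvalues of a 3x3 Hermitian matrix (one textbook route through the characteristic cubic; a production OpenCL kernel would inline the same arithmetic per work item):

```python
# Closed-form eigenvalues of a 3x3 Hermitian matrix via the characteristic
# cubic (trigonometric form). A sketch for illustration -- verify against an
# iterative solver (e.g. numpy.linalg.eigvalsh) for your data.
import math
import numpy as np

def eigvals_3x3_hermitian(A):
    p1 = abs(A[0, 1])**2 + abs(A[0, 2])**2 + abs(A[1, 2])**2
    if p1 == 0:                            # A is already diagonal
        return sorted(A[i, i].real for i in range(3))
    q = A.trace().real / 3
    p2 = sum((A[i, i].real - q)**2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    B = (A - q * np.eye(3)) / p
    r = max(-1.0, min(1.0, np.linalg.det(B).real / 2))
    phi = math.acos(r) / 3
    e1 = q + 2 * p * math.cos(phi)
    e3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return sorted([e1, 3 * q - e1 - e3, e3])

A = np.array([[2, 1j, 0], [-1j, 2, 0], [0, 0, 3]], dtype=complex)
print(eigvals_3x3_hermitian(A))            # [1.0, 3.0, 3.0], matches eigvalsh
```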

State of the art in classification algorithms

We know there are like a thousand classifiers; recently I was told that some people say AdaBoost is the best off-the-shelf one.
Are there better algorithms (with that voting idea)?
What is the state of the art in classifiers? Do you have an example?
First, AdaBoost is a meta-algorithm which is used in conjunction with (on top of) your favorite classifier. Second, classifiers which work well in one problem domain often don't work well in another. See the No Free Lunch Wikipedia page. So, there is not going to be AN answer to your question. Still, it might be interesting to know what people are using in practice.
Weka and Mahout aren't algorithms... they're machine learning libraries. They include implementations of a wide range of algorithms. So, your best bet is to pick a library and try a few different algorithms to see which one works best for your particular problem (where "works best" is going to be a function of training cost, classification cost, and classification accuracy).
If it were me, I'd start with naive Bayes, k-nearest neighbors, and support vector machines. They represent well-established, well-understood methods with very different tradeoffs. Naive Bayes is cheap, but not especially accurate. K-NN is cheap during training but (can be) expensive during classification, and while it's usually very accurate it can be susceptible to overtraining. SVMs are expensive to train and have lots of meta-parameters to tweak, but they are cheap to apply and generally at least as accurate as k-NN.
If you tell us more about the problem you're trying to solve, we may be able to give more focused advice. But if you're just looking for the One True Algorithm, there isn't one -- the No Free Lunch theorem guarantees that.
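As a concrete illustration of the "try a few and measure" advice above, here is a minimal sketch using scikit-learn (one library choice among several; the dataset and hyperparameters are arbitrary):

```python
# Sketch of the "try several classifiers" advice using scikit-learn
# (one library choice among many -- the same experiment works in Weka).
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
for name, clf in [("naive Bayes", GaussianNB()),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC())]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```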
Apache Mahout (open source, Java) seems to be picking up a lot of steam.
Weka is a very popular and stable machine learning library. It has been around for quite a while and is written in Java.
Hastie et al. (2013, The Elements of Statistical Learning) conclude that the gradient boosting machine is the best "off-the-shelf" method, independent of the problem you have.
Definition (see page 352):
An "off-the-shelf" method is one that can be directly applied to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.
And an older assessment in the same vein:
In fact, Breiman (NIPS Workshop, 1996) referred to AdaBoost with trees as the “best off-the-shelf classifier in the world” (see also Breiman (1998)).
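For reference, a minimal sketch of a gradient boosting machine using scikit-learn's implementation (default hyperparameters and a toy dataset are arbitrary choices for illustration):

```python
# Minimal sketch of a gradient boosting machine, the "off-the-shelf" method
# Hastie et al. recommend (scikit-learn implementation shown; defaults and
# toy data are assumptions, not a tuned setup).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbm = GradientBoostingClassifier().fit(X_tr, y_tr)
print("test accuracy:", gbm.score(X_te, y_te))
```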

What is the best matrix multiplication algorithm? [closed]

What is the best matrix multiplication algorithm? What does 'the best' mean for me? It means the fastest, and ready for today's machines.
Please give links to pseudocode if you can.
BLAS is the best ready-to-use efficient matrix multiplication library. There are many different implementations. Here is a benchmark I made for some implementations on a MacBook Pro with a dual-core Intel Core 2 Duo at 2.66 GHz:
gotoBLAS2 (open-source) : https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2
ATLAS (open-source) : http://math-atlas.sourceforge.net/
Accelerate.framework (Apple) : http://developer.apple.com/performance/accelerateframework.html
a non-optimized, but portable, implementation that I called 'vanilla' (from the GSL)
There are also other commercial implementations that I didn't test here:
MKL (Intel) : http://software.intel.com/en-us/articles/intel-mkl/
ACML (AMD) : http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx
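Whichever library you pick, you typically reach it through the standard GEMM interface. Here is a sketch via SciPy's BLAS bindings (the actual backend depends on which BLAS your SciPy was linked against):

```python
# Sketch: calling the BLAS dgemm routine (C = alpha*A@B + beta*C) through
# SciPy's bindings -- whichever BLAS library SciPy was linked against
# (ATLAS, MKL, Accelerate, a GotoBLAS descendant, ...) does the work.
import numpy as np
from scipy.linalg.blas import dgemm

A = np.random.rand(512, 512)
B = np.random.rand(512, 512)
C = dgemm(alpha=1.0, a=A, b=B)   # dispatches to the linked BLAS
assert np.allclose(C, A @ B)
```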
The best matrix multiplication algorithm is the one that someone with detailed architectural knowledge has already hand-tuned for your target platform.
There are lots of good libraries that supply tuned matrix-multiply implementations. Use one of them.
There are probably better ones, but these are the ones I've heard of (better than the standard cubic-complexity algorithm):
Strassen's - O(N^2.8)
Coppersmith–Winograd - O(N^2.376)
Why pseudocode? Why implement it yourself? If speed is your concern, there are highly optimized implementations available that include optimizations for specific instruction sets (e.g. SIMD); implementing those all by yourself offers no real benefit (apart from maybe learning).
Take a look at different BLAS implementations, like:
http://www.netlib.org/blas/
http://math-atlas.sourceforge.net/
Here is MIT's algorithms course, with the matrix multiplication lecture:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-19-shortest-paths-iii-all-pairs-shortest-paths-matrix-multiplication-floyd-warshall-johnson/
Naive matrix multiplication - O(n^3)
Strassen’s algorithm - O(n^2.8) http://en.wikipedia.org/wiki/Strassen_algorithm
Coppersmith–Winograd - O(n^2.376) http://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm
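For the curious, here is an educational Python sketch of Strassen's recursion (real implementations switch to a tuned GEMM below a cutoff, since the naive recursion has poor constants; the power-of-two matrix size is an assumption for simplicity):

```python
# Educational sketch of Strassen's O(n^2.8) scheme for n-by-n matrices,
# n a power of two. Not competitive with a tuned BLAS; for illustration.
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                      # base case: ordinary multiply
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)   # 7 recursive products
    M2 = strassen(A21 + A22, B11, cutoff)         # instead of 8
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 + M3 - M2 + M6]])

A = np.random.rand(256, 256); B = np.random.rand(256, 256)
assert np.allclose(strassen(A, B), A @ B)
```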
Depends on the size of the matrix, and whether it's sparse or not.
For small-to-medium-sized dense matrices, I believe that some variation on the "naive" O(N^3) algorithm is a win, if you pay attention to cache-coherence and use the platform's vector instructions.
Data arrangement is important -- for cases where your standard matrix layout is cache-unfriendly (e.g., column-major * row-major), you should try binary decomposition of your matrix multiplication -- even if you don't use Strassen's or other "fast" algorithms, this order of operations can yield a "cache-oblivious" algorithm that automatically makes good use of every level of cache. If you have the luxury to rearrange your matrices, you might try combining this with a bit-interleaved (or "Z-order") ordering of data elements.
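Here is a sketch of that binary decomposition (illustrative Python; the point is the recursion order, not the base-case speed, and a real version would be C/C++/Fortran with a tuned kernel):

```python
# Sketch of the cache-oblivious binary decomposition described above:
# recursively halve the largest dimension, so every cache level eventually
# sees a subproblem that fits, with no explicit blocking parameters.
import numpy as np

def rec_matmul(A, B, C, cutoff=32):
    m, k = A.shape
    n = B.shape[1]
    if max(m, n, k) <= cutoff:           # small enough to be cache-resident
        C += A @ B
    elif m >= n and m >= k:              # split rows of A / C
        rec_matmul(A[:m // 2], B, C[:m // 2], cutoff)
        rec_matmul(A[m // 2:], B, C[m // 2:], cutoff)
    elif n >= k:                         # split columns of B / C
        rec_matmul(A, B[:, :n // 2], C[:, :n // 2], cutoff)
        rec_matmul(A, B[:, n // 2:], C[:, n // 2:], cutoff)
    else:                                # split the shared dimension k
        rec_matmul(A[:, :k // 2], B[:k // 2], C, cutoff)
        rec_matmul(A[:, k // 2:], B[k // 2:], C, cutoff)

A = np.random.rand(100, 70); B = np.random.rand(70, 90)
C = np.zeros((100, 90))
rec_matmul(A, B, C)
assert np.allclose(C, A @ B)
```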
Finally, remember: premature optimization is the root of all evil. And when it's not premature any more, always profile and benchmark before, during, and after optimizing.
There is no "best algorithm" for all matrices on all modern CPUs.
You will need to do some research into the many methods available, and then find a best-fit solution to the particular problems you are calculating on the particular hardware you are dealing with.
For example, the "fastest" way on your hardware platform may be to use a "slow" algorithm but ask your GPU to apply it to 256 matrices in parallel. Or using a "fast" general-purpose (mxn) algorithm may produce much slower results than using an optimised 3x3 matrix multiply. If you really want it to be fast then you may want to consider getting down to the bare metal to make sure you make best use of specific CPU features like SIMD instructions, branch prediction and cache coherence, at the expense of portability.
There is also an algorithm called Cannon's algorithm, a distributed matrix multiplication algorithm for 2D process grids.
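A sketch of Cannon's block-shifting scheme, simulated with NumPy blocks standing in for processes (in a real run each block lives on one node and the shifts are messages):

```python
# Cannon's algorithm on a simulated grid x grid process mesh. Each (i, j)
# holds one block of A and B; after an initial skew, every step does a
# local multiply-accumulate, then shifts A-blocks left and B-blocks up.
import numpy as np

def cannon(A, B, grid=4):
    n = A.shape[0]; bs = n // grid
    blk = lambda M, i, j: M[i*bs:(i+1)*bs, j*bs:(j+1)*bs].copy()
    # initial skew: row i of A shifted left by i, column j of B shifted up by j
    Ab = [[blk(A, i, (j + i) % grid) for j in range(grid)] for i in range(grid)]
    Bb = [[blk(B, (i + j) % grid, j) for j in range(grid)] for i in range(grid)]
    C = np.zeros_like(A)
    for _ in range(grid):
        for i in range(grid):
            for j in range(grid):
                C[i*bs:(i+1)*bs, j*bs:(j+1)*bs] += Ab[i][j] @ Bb[i][j]
        # shift A blocks one step left, B blocks one step up
        Ab = [[Ab[i][(j + 1) % grid] for j in range(grid)] for i in range(grid)]
        Bb = [[Bb[(i + 1) % grid][j] for j in range(grid)] for i in range(grid)]
    return C

A = np.random.rand(64, 64); B = np.random.rand(64, 64)
assert np.allclose(cannon(A, B), A @ B)
```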

Parallel algorithms and data structures

In keeping with my interests in algorithms (see here), I would like to know if there are (contrary to my previous question) algorithms and data structures that are mainstream in parallel programming. It is probably early to ask about mainstream parallel algorithms and data structures, but some of the gurus here may have had good or bad experiences with some of them.
EDIT: I am more interested in successful practical applications of algorithms and data structures than in academic papers.
Thanks
Many of Google's whitepapers, especially but not exclusively the ones linked from this page, describe successful practical applications of parallel distributed computing and/or their data-structure and algorithmic underpinnings. For example, this paper deals with modifying a DBMS's data structures to extract intra-transaction parallelism; this one (and some others) introduces the popular MapReduce architecture, since implemented e.g. in Hadoop; this one is about highly parallelizable approximate matrix factoring suitable for use in "kernel methods" in machine learning; etc., etc.
Maybe I'm totally missing the point, but there are a ton of mainstream parallel algorithms and data structures, e.g. matrix multiplication, FFT, PDE and linear equation solvers, integration and simulation (Monte Carlo / random numbers), searching and sorting, and so on. Take a look at Designing and Building Parallel Programs or Patterns for Parallel Programming. And then there is CUDA and the like. What are you after? A minimal example of one such mainstream pattern follows this answer.
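As one concrete example of such a mainstream pattern, here is a minimal embarrassingly-parallel Monte Carlo estimate of pi with Python's multiprocessing (sample and worker counts are arbitrary):

```python
# Embarrassingly parallel Monte Carlo estimate of pi, map-reduce style:
# each worker counts hits in the quarter circle, the parent sums them.
import random
from multiprocessing import Pool

def count_hits(n):
    rng = random.Random()                # independent RNG per worker
    return sum(rng.random()**2 + rng.random()**2 <= 1.0 for _ in range(n))

if __name__ == "__main__":
    samples, workers = 1_000_000, 4
    with Pool(workers) as pool:
        hits = sum(pool.map(count_hits, [samples // workers] * workers))
    print("pi ~", 4 * hits / samples)
```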
Sorting:
Standard Template Library for Extra Large Data Sets
Sort Benchmark
