Maximum likelihood and support vector machine complexity

Can anyone give some references showing how to determine the computational complexity of maximum likelihood and support vector machine classifiers?
I have been searching the web but cannot seem to find a good document that details how to derive the equations modelling the computational complexity of these classifier algorithms.
Thanks

Support vector machines, and a number of maximum likelihood fits, are convex minimization problems. Therefore they could in theory be solved in polynomial time using the ellipsoid method (http://en.wikipedia.org/wiki/Ellipsoid_method).
I suspect that you can get much better estimates if you consider specific methods. http://www.cse.ust.hk/~jamesk/papers/jmlr05.pdf says that standard SVM fitting on m instances costs O(m^3) time and O(m^2) space. http://research.microsoft.com/en-us/um/people/minka/papers/logreg/minka-logreg.pdf gives costs per iteration for logistic regression but does not give a theoretical basis for estimating the number of iterations. In practice I would hope that this settles into quadratic convergence most of the time and is not too bad.
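If you want an empirical sanity check of the O(m^3) figure rather than a closed-form derivation, something like the following rough sketch (assuming scikit-learn and NumPy; the solver, toy data and sizes here are my own choices) times the fit at a few sample sizes and reads the exponent off a log-log slope:

    import time
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    sizes = [500, 1000, 2000, 4000]
    times = []
    for m in sizes:
        X = rng.normal(size=(m, 20))          # m instances, 20 features
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        t0 = time.perf_counter()
        SVC(kernel="rbf", C=1.0).fit(X, y)    # standard kernel SVM fit
        times.append(time.perf_counter() - t0)

    # Slope of log(time) vs log(m) approximates the exponent k in O(m^k).
    exponent = np.polyfit(np.log(sizes), np.log(times), 1)[0]
    print("empirical scaling exponent:", round(exponent, 2))

On easy, well-separated toy data the SMO-based solver usually stops early, so expect an exponent below 3; the point is only to see how the cost scales on your own data.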

Related

How efficient is efficient when it comes to Polynomial Time Algorithms?

I hope this is the right place for this question.
Polynomial time algorithms! How do polynomial time algorithms (PTAs) actually relate to the processing power, memory size (RAM) and storage of computers?
We consider PTAs to be efficient. We know that even for a PTA, the time complexity increases with the input size n. For example, there already exists a PTA that determines whether a number is prime. But what happens if I want to check a number this big: https://justpaste.it/3fnj2? Is the PTA for primality checking still considered efficient? Is there a computer that can determine whether such a big number is prime?
Whether yes or no (maybe no, I don't know), how does the concept of polynomial time algorithms actually apply in the real world? Is there some computing bound or something for so-called polynomial time algorithms?
I've tried Google searches on this, but all I find are mathematical Big-O-related explanations. I can't find articles that actually relate the concept of PTAs to computing power. I would appreciate some explanation or links to some resources.
There are a few things to explain.
Regarding polynomial time as efficient is just an arbitrary agreement. Mathematicians have defined the set Efficient_Algorithms = {algorithms whose running time is bounded by some polynomial P}. That is only a mathematical definition. Mathematicians don't see your actual hardware and they don't care about it; they work with abstract concepts. Yes, by this definition scientists consider O(n^100) efficient.
But you cannot compare statements from theoretical computer science one-to-one with computer programs running on hardware. Scientists work with formulas and theorems, while computer programs are executed on electric circuits.
The Big-O notation does not help you compare implementations of an algorithm. Big-O compares algorithms, not implementations of them. This can be illustrated as follows. Suppose you have a prime-checking algorithm with a high polynomial complexity. You implement it and see that it does not perform well for practical use cases. So you use a profiler, which tells you where the bottleneck is. You find out that 98% of the computation time is spent in matrix multiplications. So you develop a processor that does exactly such calculations extremely fast. Or you buy the most modern graphics card for this purpose. Or you wait 150 years for a new hardware generation. Or you manage to run most of these multiplications in parallel. Imagine you somehow reduce the time for matrix multiplications by 95%. With this wonderful hardware you run your algorithm, and suddenly it performs well. So your algorithm is actually efficient; it was only your hardware that was not powerful enough. This is not just a thought experiment: such dramatic improvements in computing power happen quite often.
Most algorithms that have polynomial complexity have it because the problems they solve are themselves of polynomial complexity. Consider, for example, matrix multiplication. If you do it on paper it is O(n^3); it is in the nature of the problem that it has polynomial complexity. In practice and daily life (I think) most problems for which you have a polynomial algorithm are actually polynomial problems. If you have a polynomial problem, then a polynomial algorithm is efficient.
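To make that concrete, here is a sketch of the textbook algorithm in plain Python: three nested loops over n, which is where the O(n^3) comes from.

    def matmul(A, B):
        # Three nested loops over n: n*n*n scalar multiply-adds, hence O(n^3).
        n = len(A)
        C = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    C[i][j] += A[i][k] * B[k][j]
        return C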
Why do we talk about polynomial algorithms and why do we consider them as efficient? As already said, this is quite arbitrary. But as a motivation the following words may be helpful. When talking about "polynomial algorithms", we can say there are two types of them.
The algorithms whose complexity is even lower than polynomial (e.g. linear or logarithmic). I think we can agree to call these efficient.
The algorithms that are genuinely polynomial and not lower than polynomial. As illustrated above, in practice these algorithms are often polynomial because the problems they solve are themselves of polynomial nature and therefore require polynomial complexity. If you see it this way, then of course we can say these algorithms are efficient.
In practice, if you have a linear problem you will normally recognise it as a linear problem, and you would not apply an algorithm with worse complexity to it. This is just practical experience. If you search for an element in a list, for example, you would not expect more comparisons than there are elements in the list. If in such a case you applied an O(n^2) algorithm, then of course that polynomial algorithm would not be efficient. But as said, such mistakes are usually so obvious that they don't happen.
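For instance, a plain linear search (a trivial sketch) does at most one comparison per element:

    def linear_search(items, target):
        # At most one comparison per element: O(n).
        for i, value in enumerate(items):
            if value == target:
                return i
        return -1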
So that is my final answer to your question: in practice, software developers have a good feeling for linear complexity, and good developers also have a feeling for logarithmic complexity. Consequently, you don't have to worry about complexity theory too much. If you have a polynomial algorithm, you normally have a good enough feeling to tell whether the underlying problem is itself linear or not; if it is not, then your polynomial algorithm is efficient. If you have an exponential algorithm, it may not be obvious what is going on, but in practice you see the computation time, do some experiments, or get complaints from users. Exponential complexity is usually impossible to overlook.
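And to connect this back to the primality example in the question: a Miller-Rabin test (probabilistic, so not the deterministic polynomial-time algorithm the question alludes to, but still polynomial in the number of digits) handles numbers with hundreds of digits almost instantly. A minimal sketch, using 2**2281 - 1 (a known Mersenne prime) as the test value:

    import random

    def is_probable_prime(n, rounds=20):
        # Miller-Rabin: each round is one modular exponentiation, i.e.
        # polynomial in the number of digits of n.
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:              # write n - 1 = d * 2^s with d odd
            d //= 2
            s += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False           # definitely composite
        return True                    # prime with very high probability

    # A 687-digit Mersenne prime is checked in a fraction of a second.
    print(is_probable_prime(2**2281 - 1))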

2nd order symplectic exponentially fitted integrator

I have to solve the equations of motion of a charged particle under the effect of an electromagnetic field. Since I have to prioritise speed over precision, I cannot use adaptive step-size algorithms (like Runge-Kutta Cash-Karp) because they would take too much time. I was looking for an algorithm that is both symplectic (like the Boris integrator) and exponentially fitted (in order to solve the equation of motion even when it is stiff). I found a method, but it is for second-order differential equations:
https://www.math.purdue.edu/~xiaj/work/SEFRKN.pdf
Later I found a paper which would describe a fourth order symplectic exponentially-fitted Runge-Kutta:
http://users.ugent.be/~gvdbergh/files/publatex/annals1.pdf
Since speed is the priority, I am looking for a lower-order algorithm. Does a 2nd-order symplectic exponentially fitted ODE algorithm exist?
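For reference, the baseline I am comparing against is the standard Boris push, which is second order and volume preserving but not exponentially fitted. A minimal sketch, assuming E and B are supplied as functions of position:

    import numpy as np

    def boris_push(x, v, E, B, q, m, dt):
        # One step of the standard Boris scheme: half electric kick,
        # magnetic rotation, half electric kick, then a position drift.
        qmdt2 = q * dt / (2.0 * m)
        v_minus = v + qmdt2 * E(x)                   # first half kick
        t = qmdt2 * B(x)                             # rotation vector
        s = 2.0 * t / (1.0 + np.dot(t, t))
        v_prime = v_minus + np.cross(v_minus, t)
        v_plus = v_minus + np.cross(v_prime, s)      # rotated velocity
        v_new = v_plus + qmdt2 * E(x)                # second half kick
        x_new = x + dt * v_new                       # leapfrog-style drift
        return x_new, v_new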

What's the fastest running time for modular multiplication of large integers on multiple processors?

I am interested to learn about the time complexity for multiplying large integers modulo a (large) constant using multiple processors. What this boils down to is basically just integer multiplication, since division and remainder can also be implemented using multiplication (e.g. reciprocal multiplication or Barrett reduction).
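To make that concrete, here is a rough sketch of the Barrett-style reduction I mean (written from the textbook description, with names of my own choosing, so treat it as an illustration rather than production code):

    def barrett_setup(n):
        # Precompute the scaled reciprocal floor(4^k / n), k = bit length of n.
        k = n.bit_length()
        return k, (1 << (2 * k)) // n

    def barrett_reduce(x, n, k, r):
        # Compute x mod n for 0 <= x < n*n using multiplications, shifts
        # and at most a couple of correcting subtractions.
        q = (x * r) >> (2 * k)   # estimate of floor(x / n)
        t = x - q * n
        while t >= n:
            t -= n
        return t

    n = (1 << 127) - 1                     # a large modulus
    k, r = barrett_setup(n)
    a, b = (1 << 100) + 12345, (1 << 120) + 67890
    assert barrett_reduce(a * b, n, k, r) == (a * b) % n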
I know that the best currently known integer multiplication algorithms run in roughly O(n log n) time. My research has failed to turn up whether this is for a single-core or a multi-core machine. However, I am assuming it is for a single-core machine, as the algorithms seem to use a divide-and-conquer approach.
Now my question is: what is the best currently known time complexity for a parallel integer multiplication algorithm implemented on m cores? Can a running time of o(n) or less be achieved, given enough cores (i.e. if m is allowed to depend on n)? Here n denotes the input size (the bit length) of the integers at hand.
So far in my research I have read several papers claiming a speedup using parallel FFT multiplication. Unfortunately these only report empirical speedups (e.g. "a 56% speed improvement using 6 cores on such and such a computer") and then fail to state the theoretical speedup expressed in time complexity bounds.
I am aware that the "fastest" integer multiplication algorithm has not yet been found; this is an unsolved problem in computer science. I am merely inquiring about the currently known bounds for such parallel algorithms.
Update #1: User #delnan linked to a wiki page about the NC complexity class. That wiki page mentions integer multiplication is in NC, meaning there exists an O((log n)^c) algorithm on O(n^k) processors. This is helpful towards getting closer to an answer. The part that's left unanswered for now is what are the c and k constants for integer multiplication and which parallel algorithm lends itself to this purpose?
Update #2: According to page 12 of 15 in this PDF file from a Computer Science course at Cornell University, integer multiplication in the NC complexity class takes O(log n) time on O(n^2) processors. It also explains an example algorithm on how to go about this. I'll write up a proper answer for this question shortly.
One last question to satisfy my curiosity: might anyone know something about the currently known time complexity for "just" O(n), O(sqrt(n)) or O(log n) processors?
The computational complexity of algorithms is not affected by parallelisation.
Certainly, take a problem such as integer multiplication and you can find a variety of algorithms for solving it, and these algorithms will exhibit a range of complexities. But given any one algorithm, running it on p processors will, at theoretical best, give a speedup of p times. In terms of computational complexity this is like multiplying the existing complexity, call it O(f(n)), by a constant to get O((1/p)*f(n)), and, as you know, multiplying a complexity by a constant doesn't affect the complexity classification.
To take another, perhaps more practical, line of argument, changing the number of processors doesn't change the number of basic operations that an algorithm performs for any given problem at any given size -- except for any additional operations necessitated by coordinating the operation of parallel components.
Again, bear in mind that computational complexity is a very theoretical measure of computational effort, generally measured as the number of some kind of basic operations required to compute for a given input size. One such basic operation might be the multiplication of two numbers (of limited size). Changing the definition of the basic operation to multiplication of two vectors of numbers (each of limited size) won't change the complexity of algorithms.
Of course, there is a lot of empirical evidence that parallel algorithms can operate more quickly than their serial counterparts, as you have found.

How to translate algorithm complexity to time necessary for computation

If I know complexity of an algorithm, can I predict how long it will compute in real life?
A bit more context:
I have been trying to solve a university assignment that has to find the best possible result in a game from a given position. I have written an algorithm and it works, but it is very slow. The complexity is O(5^n). For 24 elements it takes a few minutes to compute. I'm not sure whether my implementation is wrong or whether this algorithm is simply very slow. Is there a way for me to approximate how much time any algorithm should take?
At worst, you can base an estimate on extrapolation. Given timings for N = 1, 2, 3, 4 elements (the more the better) and the O-notation estimate of the algorithm's complexity, you can estimate the time for any finite N. The catch is that the precision of this estimate gets worse and worse as N increases.
What can you do about that? Look into error-estimation techniques for such approaches. In practice this usually gives a good enough result.
Also, please don't forget about model-adequacy checks. Having results for N = 1..10 and the O-notation complexity, you should check how well your measurements correlate with your O-model (i.e. whether you can choose constants for the O-notation formula that match your results). If you cannot, you either need more data points to get a wider picture or ... well, your complexity estimate may simply be wrong :-).
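A minimal sketch of that extrapolation idea in Python; solve is a placeholder for your actual solver, and t ≈ c * 5^n is taken from your stated complexity:

    import time

    def predict_runtime(solve, small_ns, target_n):
        # Fit the constant c in t(n) ~ c * 5**n from timings on small inputs,
        # then extrapolate to target_n.
        estimates = []
        for n in small_ns:
            t0 = time.perf_counter()
            solve(n)
            estimates.append((time.perf_counter() - t0) / 5**n)
        c = sum(estimates) / len(estimates)
        return c * 5**target_n        # extrapolated seconds for target_n

    # Example: predict_runtime(solve, [8, 9, 10, 11], 24)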
Useful links:
Brief review on algorithm complexity.
Time complexity catalogue
Really good point to start - look for examples based on code as input.
You cannot predict running time based on time complexity alone. There are many factors involved: hardware speed, programming language, implementation details, etc. The only thing you can predict using the complexity is expected time increase when the size of the input increases.
For example, I've personally seen O(N^2) algorithms take longer than O(N^3) ones, especially for small values of N, as in your case. And, by the way, 5^24 is a huge number (roughly 6e16). I wouldn't be surprised if that took a few hours on a supercomputer, let alone on the mid-range personal PC most of us are using.

Big O Notation - Orders of Magnitude

I am currently trying to work out what are the main quantitative differences between a quadratic algorithm, a cubic algorithm and one which is exponential.
What I don't understand is what is meant by "quantitative" and what the question is really asking. I tried searching for information on this but had no luck.
Thanks.
When you use big-O notation to estimate the computational complexity of an algorithm, the goal is to provide a qualitative insight as to how changes in N affect the algorithmic performance as N becomes large.
If you eliminate any term whose contribution to the total ceases to be significant as N becomes large and eliminate any constant factors, I guess you could say you are left with the main quantitative difference.
Quantitative differences just means differences in quantity, i.e. how large are the differences between those kinds of algorithms? It would be a good idea to give numeric examples, e.g. show the running time (or operation count) of quadratic, cubic, and exponential algorithms for some example problem sizes.
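For example, a few lines of Python make the gap concrete by printing the growth side by side:

    # Operation counts for quadratic, cubic and exponential growth.
    for n in (10, 20, 30, 40, 50):
        print(f"n={n:3d}  n^2={n**2:>6}  n^3={n**3:>8}  2^n={2**n:>16}")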
