Strassen's Algorithm proof - algorithm

I have been reading about the Strassen Algorithm for matrix multiplication.
As mentioned in Introduction to Algorithms by Cormen et al., the algorithm is not intuitive. However, I am curious to know whether a rigorous mathematical proof of the algorithm exists and what actually went into its design.
I tried searching on Google and Stack Overflow, but every link I found either compares Strassen's approach to the standard matrix multiplication approach or elaborates on the procedure presented by the algorithm.

You should go to the source material. In this case, the original paper by Strassen:
Strassen, Volker, Gaussian Elimination is not Optimal, Numer. Math. 13, p. 354-356, 1969
http://link.springer.com/article/10.1007%2FBF02165411?LI=true
Even though I haven't read it myself, I would assume that there is a rigorous discussion and proof of the complexity of the algorithm.
It looks like Professor Strassen is still active (http://en.wikipedia.org/wiki/Volker_Strassen) and has a home page (http://www.math.uni-konstanz.de/~strassen/). If, after learning as much as you can about the algorithm, you are still interested in learning more, I don't think a carefully worded email to the professor would be out of the question.
Unfortunately, there does not seem to be a free version of the paper available online despite the fact that the work was completed at a public university (UC Berkeley) using federal funds (NSF grant), but that is a completely separate issue we shouldn't discuss here.
If you are a student, you will likely have access via your school, or at least your school could get you a copy without cost to you. Good luck.

The proof that Strassen's algorithm should exist is a simple dimension count (combined with a proof that the naive dimension count gives the correct answer). Consider the vector space of all bilinear maps $C^n\times C^n \rightarrow C^n$; this is a vector space of dimension $n^3$ (in the case of matrix multiplication, we have $n=m^2$, e.g. $n=4$ for the $2\times 2$ case). The set of bilinear maps of rank one, i.e., those computable by an algorithm using just one scalar multiplication, has dimension $3(n-1)+1$, and the set of bilinear maps of rank at most $r$ has dimension the minimum of $r[3(n-1)]+r$ and $n^3$ for most values of $n,r$ (and one can check that this is correct when $r=7,n=4$, where $7\cdot 9+7=70$ exceeds $4^3=64$). Thus any bilinear map $C^4\times C^4\rightarrow C^4$ has, with probability one, rank at most $7$, and may always be approximated to arbitrary precision by a bilinear map of rank at most $7$.
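To make the rank-7 statement concrete for the $2\times 2$ case, here is a minimal sketch that checks Strassen's seven products against the ordinary eight-multiplication formula. The function names (`strassen_2x2`, `naive_2x2`, `expected_dimension`) are illustrative and not taken from the paper; the seven products are the commonly quoted textbook form, and `expected_dimension` simply evaluates the count from the answer above.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using seven scalar multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # The seven products (each one is a rank-one bilinear map).
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Only additions and subtractions are needed to recombine them.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def naive_2x2(A, B):
    """Reference product using the usual eight multiplications."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def expected_dimension(r, n):
    """Dimension count from the answer: min(r[3(n-1)] + r, n^3)."""
    return min(r * 3 * (n - 1) + r, n ** 3)


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(strassen_2x2(A, B) == naive_2x2(A, B))
    print(expected_dimension(7, 4), expected_dimension(6, 4))
```

Checking a few inputs is of course not a proof; expanding the seven products symbolically and collecting terms gives the actual verification.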

Related

Understanding Computational Complexity in Bitonic Sort

I am taking a parallel programming class and am really struggling to understand some of the computational complexity calculations and algebraic simplifications. Specifically, for the bitonic sort algorithm, I am looking at the case where each processor is given a block of elements.
I am looking at situations when either a hypercube or 2D mesh interconnection network is used. I am given the following definitions for the calculation of Speedup, Efficiency, Iso-Efficiency and determining if the solution is cost optimal. I can understand how Speedup is determined but am totally lost as to how to solve for Efficiency and Iso-Efficiency. I think I understand cost-optimality as well. Given below are the equations.
The text which I am using for the class is
Introduction to Parallel Computing, 2nd Edition by Ananth Grama, Anshul Gupta, George Karypis & Vipin Kumar
For my question regarding the algebra of this problem, please refer to this

Find intersection(s) of any two functions - solving simultaneous equations

Imagine having any two functions. You need to find the intersections of those functions. You definitely don't want to try all x values to check for f(x)==g(x).
Normally in math, you create simultaneous equations derived from f(x)==g(x). But I see no way to implement equations in any programming language.
So once more, what am I looking for:
Conceptual algorithm to solve equations.
The same for simultaneous and quadratic equations.
I believe there should be some workaround using derivatives, but I've only recently learned the concept of derivatives at school and I have no idea how to use it in this case.
That is a much harder problem than you would imagine. A good place to start for learning about these things is the Newton-Raphson method, which gives numerical approximations to equations of the form h(x) = 0. (When you set h(x) = g(x) - f(x), this provides solutions for the problem you are asking about.)
Exact, algebraic solving of equations (as implemented in Mathematica, for example) is even more difficult: you basically have to recreate everything you would do in your head when solving an equation on a piece of paper.
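As a rough illustration of the Newton-Raphson idea for intersections, here is a minimal sketch. The function name `newton_intersection`, the default tolerances, and the use of a central-difference estimate for the derivative are all assumptions made for this example; if you know the derivative analytically, using it directly is usually better.

```python
import math


def newton_intersection(f, g, x0, tol=1e-12, max_iter=50, step=1e-6):
    """Approximate one x with f(x) == g(x), starting the search at x0."""

    def h(x):
        # Intersections of f and g are exactly the roots of h.
        return g(x) - f(x)

    x = x0
    for _ in range(max_iter):
        value = h(x)
        if abs(value) < tol:
            return x
        # Central-difference estimate of h'(x) (an assumption of this sketch).
        slope = (h(x + step) - h(x - step)) / (2 * step)
        if slope == 0:
            raise ZeroDivisionError("flat slope; try another starting point")
        x -= value / slope  # Newton-Raphson update
    raise RuntimeError("no convergence; try another starting point")


if __name__ == "__main__":
    # Where does y = cos(x) cross y = x?
    print(newton_intersection(math.cos, lambda x: x, x0=1.0))
```

The method only finds one intersection near the starting point, and it can fail to converge, so in practice it is run from several starting values.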
Obviously this problem is not solvable in the general case because you can construct a "function" which is arbitrarily complex. For example, if you have a "function" with 5 trillion terms in it including various transcendental and complex transformations in it, the computer could take years just to compute a single value, much less intersect it with another similar function.
So, first of all you need to define what you mean by a "function". If you mean a polynomial of degree 4 or less, then the problem becomes much more straightforward. In such cases you combine the terms of the two polynomials and find the roots of the resulting equation, which will be the intersections.
If the polynomial has degree 5 or more (a quintic or greater), then there is no general symbolic solution in radicals. In this case the terms are combined and you find the roots by iterative approximation. See Root Finding Algorithms.
If the function involves transcendentals such as sin/cos/log/e^x, etc., you can potentially find the intersection by representing the functions as a series or a continued fraction. You then subtract one series from the other and set the result to zero. The solution of the resulting truncated equation yields an approximation of the root(s).

Efficient algorithm for finding largest eigenpair of small general complex matrix

I am looking for an efficient algorithm to find the largest eigenpair of a small, general (non-square, non-sparse, non-symmetric), complex matrix, A, of size m x n. By small I mean m and n are typically between 4 and 64 and usually around 16, but with m not equal to n.
This problem is straightforward to solve with the general LAPACK SVD algorithms, i.e. gesvd or gesdd. However, as I am solving millions of these problems and only require the largest eigenpair, I am looking for a more efficient algorithm. Additionally, in my application the eigenvectors will generally be similar for all cases. This led me to investigate Arnoldi iteration based methods, but I have found neither a good library nor an algorithm that applies to my small general complex matrix. Is there an appropriate algorithm and/or library?
Rayleigh quotient iteration has cubic convergence (for Hermitian matrices). You may also want to implement the power method and see how the two compare, since Rayleigh quotient iteration needs an LU or QR decomposition of the shifted matrix at every step.
http://en.wikipedia.org/wiki/Rayleigh_quotient_iteration
Following @rchilton's comment, you can apply this to A* A, which is square and Hermitian.
The idea of looking for the largest eigenpair is analogous to finding a large power of the matrix, as the components along the eigenvectors with smaller eigenvalues get damped out during the iteration. The Lanczos algorithm is one of a few such algorithms that rely on the so-called Ritz eigenvectors during the decomposition. From Wikipedia:
The Lanczos algorithm is an iterative algorithm ... that is an adaptation of power methods to find eigenvalues and eigenvectors of a square matrix or the singular value decomposition of a rectangular matrix. It is particularly useful for finding decompositions of very large sparse matrices. In latent semantic indexing, for instance, matrices relating millions of documents to hundreds of thousands of terms must be reduced to singular-value form.
The technique works even if the system is not sparse, but if it is large and dense it has the advantage that it doesn't all have to be stored in memory at the same time.
How does it work?
The power method for finding the largest eigenvalue of a matrix A can be summarized by noting that if x_{0} is a random vector and x_{n+1}=A x_{n}, then in the large n limit, x_{n} / ||x_{n}|| approaches the normalized eigenvector corresponding to the eigenvalue of largest magnitude.
Non-square matrices?
Noting that your system is not a square matrix, I'm pretty sure that the SVD problem can be decomposed into separate linear algebra problems where the Lanczos algorithm would apply. A good place to ask such questions would be over at https://math.stackexchange.com/.
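One way to combine the power method above with the A* A suggestion is sketched below in plain Python, so that it does not depend on any particular library. The names (`largest_singular_triplet`, `matvec`, `conj_transpose_matvec`) and the stopping rule are assumptions of this sketch, not an established API. The optional `v0` argument is there because the question says the vectors are similar from one problem to the next, so the previous result can serve as a warm start.

```python
import math


def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def conj_transpose_matvec(A, y):
    """Return A* y, where A* is the conjugate transpose of A."""
    rows, cols = len(A), len(A[0])
    return [
        sum(complex(A[i][j]).conjugate() * y[i] for i in range(rows))
        for j in range(cols)
    ]


def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))


def largest_singular_triplet(A, v0=None, tol=1e-12, max_iter=1000):
    """Power iteration on A* A; returns (sigma, u, v) with A v = sigma u."""
    cols = len(A[0])
    v = list(v0) if v0 is not None else [1.0] * cols
    scale = norm(v)
    v = [c / scale for c in v]
    sigma = 0.0
    for _ in range(max_iter):
        w = conj_transpose_matvec(A, matvec(A, v))  # one step: (A* A) v
        scale = norm(w)
        if scale == 0.0:
            break  # v lies in the null space of A
        v = [c / scale for c in w]
        new_sigma = norm(matvec(A, v))
        converged = abs(new_sigma - sigma) <= tol * max(new_sigma, 1.0)
        sigma = new_sigma
        if converged:
            break
    u = matvec(A, v)
    if sigma > 0.0:
        u = [c / sigma for c in u]
    return sigma, u, v


if __name__ == "__main__":
    A = [[1j, 0], [0, 2], [0, 0]]  # a 3 x 2 complex example
    sigma, u, v = largest_singular_triplet(A)
    print(sigma)
```

Convergence is slow when the two largest singular values are close together, which is where Lanczos-type or Rayleigh quotient methods tend to pay off. For millions of 16 x 16 problems, a compiled implementation of the same loop would presumably be needed.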

Running time of computing mathematical functions

Where can I turn for information regarding computing times of mathematical functions? Has any (general) study with any amount of rigor been made?
For instance, the computing time of
constant + constant
generally takes O(1).
Suppose I want to start using math like integrals, and I'd like to get an asymptotic approximation to various integrals. Has there been a standard study of this, or must I take the information I have and figure out my own approximation. I'd be very interested in a standard approach to this, and I'd like to know if it already exists.
Here's my motivation:
I'm in the middle of writing a paper that points out the equivalence between NP hard problems and certain types of mathematical equations. It seems that there might be use for a study of math computing times that is generalized like a new science.
EDIT:
I guess I'm wondering if there is a standard computational complexity to any given math that cannot be avoided. I'm wondering if anyone has studied this question. I'd love to see what others have tried.
EDIT 2:
Wikipedia lists "Computational Complexity Theory" in their encyclopedia, which I think may fit the bill. I'm still wondering if someone who has studied this could affirm this.
"Standard" math has no notion of algorithmic complexity. That's reserved for computer algorithms.
There are ways to analyze the dynamic behavior of solutions of equations. Things like convergence matter a great deal to mathematicians.
You can ask about the algorithmic complexity of Euler integration versus fifth-order Runge-Kutta for numerical integration. You would compare them based on the number of function evaluations required and time-step stability.
But what's the "running time" of the solution to Fermat's Last Theorem? What about the last of David Hilbert's challenge problems? Is the "running time" for those a century and counting? What's your running time for solving a partial differential equation using separation of variables?
When you think about it that way, do you have a better understanding of why people would be put off by your question?
Yes, for various mathematical functions, the computational complexity (running time) of computing the function has been studied. This can differ depending on the model of computation.
For example, adding two n-bit numbers takes Θ(n) time, multiplying them takes O(n log n) time (using FFT-based methods), finding their gcd takes Θ(n^2) time with the usual Euclidean algorithm and O(n (log n)^2 (log log n)) with better algorithms, etc. For more complicated stuff like integrals, obviously it depends on what algorithm you use to do it.
There isn't a collected body of work, but work on approximating functions comes close. For example, you'd like to know that approximating sin(x) to within an epsilon error can be done in time proportional to some polynomial in log(x) and 1/epsilon. There isn't a general theory here (you should look up information complexity though), and focusing on specific functions might help.
user389117,
I think that subconsciously you want to deduce the complexity of computing a mathematical expression from the form of that expression.
E.g., for an expression involving the square of a variable (x^2), you think (at least subconsciously) that the complexity of the computation is analogous to x^2, so the complexity should be something like O(n^2), or that there is a standard process to deduce the form of the complexity from the form of the mathematical equation.
These are two different qualities, and one cannot deduce one from the other.
I will give you an example: in papers, algorithms are written in pseudocode and then the scientists derive the complexity from the pseudocode.
The pseudocode must inevitably be written first, and only then can you compute the complexity.
There is no magical way to derive the complexity from the form of the thing you want to compute.
Even if you compute the complexity and find that its form is analogous to the form of the equation being computed, I think it would be hard, at least at first, for you to turn that observation from pseudo-science into science.
Good Luck!

How is linear algebra used in algorithms?

Several of my peers have mentioned that "linear algebra" is very important when studying algorithms. I've studied a variety of algorithms and taken a few linear algebra courses and I don't see the connection. So how is linear algebra used in algorithms?
For example, what interesting things can one do with a connectivity matrix for a graph?
Three concrete examples:
Linear algebra is the foundation of modern 3D graphics. This is essentially the same thing that you've learned in school. The data is kept in a 3D space that is projected onto a 2D surface, which is what you see on your screen.
Most search engines are based on linear algebra. The idea is to represent each document as a vector in a high-dimensional space and see how the vectors relate to each other in this space. This is used by the Lucene project, amongst others. See VSM.
Some modern compression algorithms, such as the one used by the Ogg Vorbis format, are based on linear algebra, or more specifically a method called Vector Quantization.
Basically it comes down to the fact that linear algebra is a very powerful method when dealing with multiple variables, and there are enormous benefits to using it as a theoretical foundation when designing algorithms. In many cases this foundation isn't as apparent as you might think, but that doesn't mean that it isn't there. It's quite possible that you've already implemented algorithms which would have been incredibly hard to derive without linear algebra.
A cryptographer would probably tell you that a grasp of number theory is very important when studying algorithms. And he'd be right--for his particular field. Statistics has its uses too--skip lists, hash tables, etc. The usefulness of graph theory is even more obvious.
There's no inherent link between linear algebra and algorithms; there's an inherent link between mathematics and algorithms.
Linear algebra is a field with many applications, and the algorithms that draw on it therefore have many applications as well. You've not wasted your time studying it.
Ha, I can't resist putting this here (even though the other answers are good):
The $25 billion dollar eigenvector.
I'm not going to lie... I never even read the whole thing... maybe I will now :-).
I don't know if I'd phrase it as "linear algebra is very important when studying algorithms". I'd almost put it the other way around. Many, many, many real-world problems end up requiring you to solve a set of linear equations. If you end up having to tackle one of those problems, you are going to need to know about some of the many algorithms for dealing with linear equations. Many of those algorithms were developed when "computer" was a job title, not a machine. Consider Gaussian elimination and the various matrix decomposition algorithms, for example. There is a lot of very sophisticated theory on how to solve those problems for very large matrices.
Most common methods in machine learning end up having an optimization step which requires solving a set of simultaneous equations. If you don't know linear algebra you'll be completely lost.
Many signal processing algorithms are based on matrix operations, e.g. Fourier transform, Laplace transform, ...
Optimization problems can often be reduced to solving linear equation systems.
Linear algebra is also important in many algorithms in computer algebra, as you might have guessed. For example, if you can reduce a problem to saying that a polynomial is zero, where the coefficients of the polynomial are linear in the variables x1, …, xn, then you can solve for what values of x1, …, xn make the polynomial equal to 0 by equating the coefficient of each x^n term to 0 and solving the linear system. This is called the method of undetermined coefficients, and is used for example in computing partial fraction decompositions or in integrating rational functions.
For graph theory, the coolest thing about an adjacency matrix is that if you take the nth power of an adjacency matrix for an unweighted graph (each entry is either 0 or 1), M^n, then each entry i,j will be the number of walks of length n from vertex i to vertex j (that is, paths that are allowed to revisit vertices and edges). And if that isn't just cool, then I don't know what is.
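The adjacency matrix power property can be checked on a small graph. This is a minimal sketch in plain Python; the helper names (`mat_mul`, `mat_pow`, `count_walks`) are made up for the example. It counts walks, meaning that vertices and edges may be revisited.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(M, n):
    """Return M^n for a non-negative integer n by repeated squaring."""
    size = len(M)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    base = M
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


def count_walks(adjacency, i, j, n):
    """Number of length-n walks from vertex i to vertex j."""
    return mat_pow(adjacency, n)[i][j]


if __name__ == "__main__":
    # Triangle graph: every vertex is adjacent to the other two.
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(count_walks(triangle, 0, 0, 3))
    print(count_walks(triangle, 0, 1, 3))
```

On the triangle, the two closed walks of length 3 from vertex 0 are the two directions around the cycle.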
All of the answers here are good examples of linear algebra in algorithms.
As a meta answer, I will add that you might be using linear algebra in your algorithms without knowing it. Compilers that optimize with SSE(2) typically vectorize your code by having many data values manipulated in parallel. This is essentially elementary linear algebra.
It depends what type of "algorithms".
Some examples:
Machine-Learning/Statistics algorithms: Linear Regressions (least-squares, ridge, lasso).
Lossy compression of signals and other processing (face recognition, etc). See Eigenfaces
For example, what interesting things can one do with a connectivity matrix for a graph?
A lot of algebraic properties of the matrix are invariant under permutations of vertices (for example abs(determinant)), so if two graphs are isomorphic, these values will be equal.
This is a source of good heuristics for determining whether two graphs are not isomorphic, since of course equality does not guarantee the existence of an isomorphism.
Check algebraic graph theory for a lot of other interesting techniques.
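As a sketch of how such an invariant can be used as a quick non-isomorphism test, the example below compares abs(determinant) of two adjacency matrices. The names `determinant` and `definitely_not_isomorphic` are hypothetical, and the permutation-sum determinant is only sensible for very small graphs; it is used here because it stays in exact integer arithmetic.

```python
from itertools import permutations


def determinant(M):
    """Exact determinant via the permutation (Leibniz) formula."""
    size = len(M)
    total = 0
    for perm in permutations(range(size)):
        inversions = sum(
            1
            for a in range(size)
            for b in range(a + 1, size)
            if perm[a] > perm[b]
        )
        term = -1 if inversions % 2 else 1  # sign of the permutation
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total


def definitely_not_isomorphic(A, B):
    """True means the graphs cannot be isomorphic; False is inconclusive."""
    if len(A) != len(B):
        return True
    return abs(determinant(A)) != abs(determinant(B))


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    relabeled_path = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
    print(definitely_not_isomorphic(triangle, path))
    print(definitely_not_isomorphic(path, relabeled_path))
```

A result of False only means this particular invariant could not tell the graphs apart. For instance, the path on three vertices and the graph with three vertices and no edges both have determinant 0 without being isomorphic.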
