DCT with low complexity

I successfully computed the 2D DCT of an image using the classic algorithm and also as a combination of 1D DCTs. For an n x n image, those two methods have a time complexity of O(n^4) and O(n^3) respectively.
On a real image this takes a very long time to compute: about 7 minutes for a 512 x 512 image using the O(n^3) version.
Are there any other algorithms that compute the DCT with a lower time complexity?
How does MATLAB do it so quickly?

There are 2 common approaches to a fast DCT.
1. DCT from DFT
A 1D DCT can be derived from a 1D DFT with O(n) extra work, so when an FFT algorithm is applied you get O(n.log(n)) for 1D and O(n^2.log(n)) for 2D. For more info see:
I am looking for a simple algorithm for fast DCT and IDCT of matrix [NxM]
This approach is used more often because it is a bit easier to implement. There are several ways to derive the DCT from the DFT: some use a DFT of the same size as the input, and others use a DFT of double the size.
2. Fast DCT
There are also direct fast DCT equations out there, but they are not commonly used because they are not well known and not well documented online. A more important point is that the recursive breakdown involves both DCT and DST, and the split is usually done into 3 terms instead of 2, which makes the implementation much harder. It also needs a fast DST implementation, which is analogous to the DCT, so it too breaks down into 3 terms and uses both DCT and DST. The bright side is that it does not involve the complex domain, but as you can imagine much more code is needed in comparison to #1.
From a quick search I found this:
The fast DCT-IV/DST-IV computation via the MDCT
But finding relevant info about a fast DCT in the real domain is a problem, because most articles are either hard-wired (constant n) implementations or use approach #1. And when you do find something, it usually contains errors and does not work. Your best bet for this approach is to find some old paper or book on computer graphics or discrete math.
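As a rough illustration of the DCT-from-DFT approach, here is a minimal Python sketch. It assumes the unnormalized DCT-II, a power-of-two length, and the same-size reordering trick (often attributed to Makhoul); all function names are illustrative and not taken from any library. A small recursive FFT is included only to keep the sketch self-contained.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dct2_fft(x):
    """Unnormalized DCT-II of a list x via one FFT of the same size."""
    n = len(x)
    # Reorder: even-indexed samples first, then odd-indexed samples reversed.
    v = x[0::2] + x[1::2][::-1]
    spectrum = fft(v)
    # A per-bin phase shift turns the DFT bins into DCT coefficients: O(n) work.
    return [(cmath.exp(-1j * cmath.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]


def dct2_direct(x):
    """O(n^2) reference: X[k] = sum_j x[j] * cos(pi * (2j + 1) * k / (2n))."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                for j in range(n))
            for k in range(n)]


def dct2_2d(image):
    """2D DCT-II by rows, then columns (image is a list of row lists)."""
    rows = [dct2_fft(list(r)) for r in image]
    cols = [dct2_fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

For an n x n image this does 2n one-dimensional transforms of cost O(n.log(n)) each, which gives the O(n^2.log(n)) total quoted above.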

Related

'Squaring a polynomial' versus 'Multiplying the polynomial with itself using FFT'

The fastest known algorithm for polynomial multiplication uses the Fast Fourier Transform (FFT).
In a special case of multiplying a polynomial with itself, I am interested in knowing if any squaring algorithm performs better than FFT. I couldn't find any resource which deals with this aspect.

Number Theoretic Transform (NTT) is faster than FFT
Why? Because you use just integer modular arithmetic on some ring instead of floating-point complex numbers, while the properties stay the same (the NTT is a form of DFT). So if your polynomials have integer coefficients, use the NTT, which is faster. If they are floating point, you need to use the FFT.
FFT-based squaring is faster than FFT-based multiplication of the polynomial by itself
Why? Because squaring needs just 1x NTT/FFT and 1x iNTT/iFFT, while multiplying needs 2x NTT/FFT and 1x iNTT/iFFT, so you save one transform. The rest is the same.
For small enough polynomials, squaring without FFT is fastest
For more info see:
Fast bignum square computation
It's not the same problem but very similar, as bignum data words are similar to your polynomial coefficients. So most of it applies to your problem too.
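The transform count argument above can be sketched in Python as follows. This is only an illustration, assuming integer coefficients (so rounding recovers exact results for small inputs) and a simple recursive FFT; the function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unscaled inverse); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def _padded(p, result_len):
    """Zero-pad p to the next power of two that holds result_len coefficients."""
    size = 1
    while size < result_len:
        size *= 2
    return list(p) + [0] * (size - len(p))


def poly_square(p):
    """Square p using 1 forward transform and 1 inverse transform."""
    m = 2 * len(p) - 1
    spectrum = fft(_padded(p, m))                  # the only forward transform
    coeffs = fft([v * v for v in spectrum], invert=True)
    size = len(coeffs)
    return [round(c.real / size) for c in coeffs[:m]]


def poly_multiply(p, q):
    """General product using 2 forward transforms and 1 inverse transform."""
    m = len(p) + len(q) - 1
    fp = fft(_padded(p, m))
    fq = fft(_padded(q, m))
    coeffs = fft([a * b for a, b in zip(fp, fq)], invert=True)
    size = len(coeffs)
    return [round(c.real / size) for c in coeffs[:m]]
```

Coefficients are stored lowest degree first, so `poly_square([1, 2, 3])` squares 1 + 2x + 3x^2.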

Find Independent Vectors (High Performance)

I'm in desperate need of a high performance algorithm to reduce a matrix to its independent vectors (row echelon form), aka find the basis vectors. I've seen the Bareiss algorithm and Row Reduction but they are all too slow. If anyone could recommend a faster implementation I'd be grateful! Happy to use TBB parallelisation.
Thanks!

What are you trying to do with the reduced echelon form? Do you just need the basis vectors, or are you trying to solve a system of equations? If you're solving a system of equations you can do an LU factorization and probably get faster calculation times. Otherwise Gaussian elimination with partial pivoting is your fastest option.
Also do you know if your matrix is of a special form? Like upper or lower triangular for example. If it is then you can rewrite some of these algorithms to be faster based on the type of matrix that you have.
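For reference, Gaussian elimination with partial pivoting can be sketched in a few lines of Python. This is a plain, unoptimized illustration, assuming floating-point entries and an absolute tolerance `tol` (a hypothetical parameter) for deciding when a pivot counts as zero. The nonzero rows it returns form a basis of the row space.

```python
def row_echelon_basis(matrix, tol=1e-10):
    """Return the nonzero rows of a row echelon form of matrix.

    The number of returned rows is the rank; the rows span the row space.
    """
    a = [[float(v) for v in row] for row in matrix]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Partial pivoting: pick the largest entry in this column
        # among the rows that have not been used as pivots yet.
        best = max(range(pivot_row, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[best][col]) <= tol:
            continue  # no usable pivot in this column
        a[pivot_row], a[best] = a[best], a[pivot_row]
        # Eliminate the entries below the pivot.
        for r in range(pivot_row + 1, n_rows):
            factor = a[r][col] / a[pivot_row][col]
            if factor != 0.0:
                for c in range(col, n_cols):
                    a[r][c] -= factor * a[pivot_row][c]
        pivot_row += 1
    return a[:pivot_row]
```

The elimination loop over the rows below the pivot is the part that would be spread across threads in a parallel version, since each row update is independent of the others.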

Where is strassen's matrix multiplication useful?

Strassen's algorithm for matrix multiplication gives just a marginal improvement over the conventional O(N^3) algorithm. It has higher constant factors and is much harder to implement. Given these shortcomings, is Strassen's algorithm actually useful, and is it implemented in any library for matrix multiplication? Moreover, how is matrix multiplication implemented in libraries?

Generally Strassen’s Method is not preferred for practical applications for the following reasons.
The constants used in Strassen’s method are high and for a typical application Naive method works better.
For Sparse matrices, there are better methods especially designed for them.
The submatrices in recursion take extra space.
Because of the limited precision of computer arithmetic on noninteger values, larger errors accumulate in Strassen’s algorithm than in Naive Method.

So the idea of Strassen's algorithm is that it's faster (asymptotically speaking). This could make a big difference if you are dealing with either huge matrices or a very large number of matrix multiplications. However, just because it's faster asymptotically doesn't make it the most efficient algorithm in practice. There are all sorts of implementation considerations such as caching and architecture-specific quirks. There is also parallelism to consider.
I think your best bet would be to look at the common libraries and see what they are doing. Have a look at BLAS for example. And I think that Matlab uses MAGMA.
If your contention is that you don't think O(n^2.8) is that much faster than O(n^3) this chart shows you that n doesn't need to be very large before that difference becomes significant.

It's very important to stop at the right moment.
With 1,000 x 1,000 matrices, you can multiply them by doing seven 500 x 500 products plus a few additions. That's probably useful. With 500 x 500, maybe. With 10 x 10 matrices, most likely not. You'd just have to do some experiments first to find out at what point to stop.
But Strassen's algorithm only saves a factor of 2 (at best): when the number of rows grows by a factor of 32, the number of coefficients grows by a factor of 1,024, and the total time grows by a factor of 16,807 (7^5) instead of 32,768 (8^5). In practice, that's a "constant factor". I'd say you gain more by transposing the second matrix first so you can multiply rows by rows, then look carefully at cache sizes, vectorise as much as possible, and distribute over multiple cores that don't step on each other's feet.
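The "stop at the right moment" idea can be sketched in Python like this. It is a toy illustration rather than a tuned implementation: it assumes square matrices stored as lists of rows, falls back to the conventional product for odd sizes, and the default `cutoff` of 64 is an arbitrary placeholder that would have to be found by experiment.

```python
def naive_multiply(a, b):
    """Conventional product; b is transposed first so rows meet rows."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def _add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _quarters(m):
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=64):
    """Strassen's recursion with a switch to naive_multiply below cutoff."""
    n = len(a)
    if n <= cutoff or n % 2:
        return naive_multiply(a, b)
    a11, a12, a21, a22 = _quarters(a)
    b11, b12, b21, b22 = _quarters(b)
    # Seven half-size products instead of eight.
    m1 = strassen(_add(a11, a22), _add(b11, b22), cutoff)
    m2 = strassen(_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, _sub(b12, b22), cutoff)
    m4 = strassen(a22, _sub(b21, b11), cutoff)
    m5 = strassen(_add(a11, a12), b22, cutoff)
    m6 = strassen(_sub(a21, a11), _add(b11, b12), cutoff)
    m7 = strassen(_sub(a12, a22), _add(b21, b22), cutoff)
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

Timing `strassen` against `naive_multiply` for a few values of `cutoff` is one way to run the experiment described above.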

Marginal improvement: True, but growing as the matrix sizes grow.
Higher constant factors: Practical implementations of Strassen's algorithm use conventional n^3 for blocks below a particular size, so this doesn't really matter.
Harder to implement: whatever.
As for what's used in practice: First, you have to understand that multiplying two ginormous dense matrices is unusual. Much more often, one or both of them is sparse, or symmetric, or upper triangular, or some other pattern, which means that there are quite a few specialized tools which are essential to the efficient large matrix multiplication toolbox. With that said, for giant dense matrices, Strassen's is The Solution.

Is Fast Fourier transform algorithm appropriate for image gradient computation?

I've a matrix of size mXn and a filter [-1 0 1] on which I need to perform convolution. I'm able to do this in O(n^2) steps, but on further googling fast fourier transform keeps on popping up everywhere. I would like to know if FFT is appropriate for this problem. The matrix has random integers only. But if I were to have floating values, will it make a difference? Is FFT meant for a problem like this?

If your filter has only two nonzero elements, computing the convolution by definition will only take O(n*m) steps (which is the size of your data). FFT isn't gonna help you in that case: a 2D FFT would take something like O(n*m*(log n+log m)).
To sum up: when you have a simple, localized filter, the best way to perform convolution is computing the sum directly. When you need to compute convolutions or correlations with bigger data (think correlation with another image) or perform complex operations, FFT can help you.
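Computing the sum directly for the [-1 0 1] filter might look like the Python sketch below. It assumes the filter is applied along each row with zero padding at the borders, and it implements true convolution (the kernel is flipped), so each output is the left neighbour minus the right neighbour; correlation would give the opposite sign. The function name is illustrative.

```python
def horizontal_gradient(image):
    """Convolve every row of image with [-1, 0, 1], zero-padded, same size.

    Each output pixel needs one subtraction, so the total work is O(n*m).
    """
    out = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            left = row[x - 1] if x > 0 else 0
            right = row[x + 1] if x + 1 < width else 0
            new_row.append(left - right)
        out.append(new_row)
    return out
```

The same code works unchanged for integer or floating-point pixel values.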

Efficient algorithm for finding largest eigenpair of small general complex matrix

I am looking for an efficient algorithm to find the largest eigenpair of a small, general (non-square, non-sparse, non-symmetric), complex matrix, A, of size m x n. By small I mean m and n are typically between 4 and 64 and usually around 16, but with m not equal to n.
This problem is straightforward to solve with the general LAPACK SVD algorithms, i.e. gesvd or gesdd. However, as I am solving millions of these problems and only require the largest eigenpair, I am looking for a more efficient algorithm. Additionally, in my application the eigenvectors will generally be similar for all cases. This led me to investigate Arnoldi iteration based methods, but I have found neither a good library nor an algorithm that applies to my small general complex matrix. Is there an appropriate algorithm and/or library?

Rayleigh quotient iteration has cubic convergence (for Hermitian matrices such as A* A), but each step needs an LU or QR decomposition of a shifted matrix. You may also want to implement the power method and see how they compare.
http://en.wikipedia.org/wiki/Rayleigh_quotient_iteration
Following @rchilton's comment, you can apply this to A* A.

The idea of looking for the largest eigenpair is analogous to finding a large power of the matrix, as the modes with smaller eigenvalues get damped out during the iteration. The Lanczos algorithm is one of a few such algorithms that rely on the so-called Ritz eigenvectors during the decomposition. From Wikipedia:
The Lanczos algorithm is an iterative algorithm ... that is an adaptation of power methods to find eigenvalues and eigenvectors of a square matrix or the singular value decomposition of a rectangular matrix. It is particularly useful for finding decompositions of very large sparse matrices. In latent semantic indexing, for instance, matrices relating millions of documents to hundreds of thousands of terms must be reduced to singular-value form.
The technique works even if the system is not sparse, but if it is large and dense it has the advantage that it doesn't all have to be stored in memory at the same time.
How does it work?
The power method for finding the largest (in magnitude) eigenvalue of a matrix A can be summarized by noting that if x_{0} is a random vector and x_{n+1}=A x_{n}, then in the large n limit, x_{n} / ||x_{n}|| approaches the normalized eigenvector corresponding to that eigenvalue.
Non-square matrices?
Noting that your system is not a square matrix, I'm pretty sure that the SVD problem can be decomposed into separate linear algebra problems where the Lanczos algorithm would apply. A good place to ask such questions would be over at https://math.stackexchange.com/.
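One way to apply the power method to a non-square complex matrix is to iterate on A* A without ever forming it, which yields the largest singular value and its singular vectors. The Python sketch below is only an illustration under that assumption. The optional start vector `v0`, the tolerance `tol`, and the function name are hypothetical; passing the result of a previous, similar problem as `v0` is a guess at how the "similar eigenvectors" property could be exploited.

```python
import math


def _norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))


def dominant_singular_triplet(a, v0=None, max_iter=500, tol=1e-12):
    """Power iteration on A* A for an m x n matrix a (list of rows).

    Returns (sigma, u, v) with A v ~= sigma * u for the largest sigma.
    Assumes the start vector is not orthogonal to the dominant direction.
    """
    m, n = len(a), len(a[0])
    v = list(v0) if v0 is not None else [1.0] * n
    scale = _norm(v)
    v = [c / scale for c in v]
    sigma = 0.0
    for _ in range(max_iter):
        # u = A v, and ||u|| estimates the largest singular value.
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        new_sigma = _norm(u)
        if new_sigma == 0.0:
            raise ValueError("start vector lies in the null space of A")
        # w = A* u (conjugate transpose), then renormalize to get the next v.
        w = [sum(a[i][j].conjugate() * u[i] for i in range(m))
             for j in range(n)]
        scale = _norm(w)
        v = [c / scale for c in w]
        converged = abs(new_sigma - sigma) <= tol * new_sigma
        sigma = new_sigma
        if converged:
            break
    u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = _norm(u)
    return sigma, [c / sigma for c in u], v
```

Each iteration costs two matrix-vector products, O(m*n), so for a 16 x 16 matrix a handful of iterations from a good start vector is far less work than a full SVD. Convergence slows down when the two largest singular values are close together.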
