Efficient Algorithms for Computing a matrix times its transpose [closed] - algorithm

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
For a class, my teacher posed the question of the algorithmic cost of multiplying a matrix by its transpose. With the standard three-loop matrix multiplication algorithm, the cost is O(n^3), and I wonder if there is a way to manipulate or take advantage of the matrix * matrix-transpose structure to get a faster algorithm. I understand that when you multiply a matrix by its transpose you only have to calculate about half of the result, because the product is symmetric, but I can't think of how to manipulate an algorithm so that it takes less than O(n^3).
I know there are algorithms like Coppersmith–Winograd and Strassen that are faster general matrix multiplication algorithms, but could anyone give any hints or insights on how to take computational advantage of the transpose?
Thanks

As of right now, there are no known asymptotic barrier-breaking properties of this particular multiplication.
The obvious optimization is to take advantage of the symmetry of the product: the [i][j]th entry equals the [j][i]th entry, so only about half of the entries need to be computed.
For implementation-specific optimizations, there is a significant amount of caching you can exploit. A very significant amount of time in the multiplication of large matrices is spent transferring data between memory and the CPU. CPU designers therefore implemented a smart caching system whereby recently used memory is stored in a small, fast memory section called the cache; in addition, nearby memory is cached along with it, because much of the memory I/O comes from reading and writing arrays, which are stored sequentially.
Since the transpose of a matrix is simply the same matrix with the indices swapped, caching a value in the matrix can have more than twice the impact: the same cached row serves as both a row of the matrix and a column of its transpose.
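The symmetry optimization can be sketched like this (a minimal pure-Python illustration; still O(n²m) arithmetic overall, but roughly half the dot products, and each dot product reads two rows of the same array, which is cache-friendly):

```python
def mat_times_transpose(A):
    """Compute C = A @ A.T exploiting symmetry: only the upper
    triangle is computed directly; the lower triangle is mirrored."""
    n = len(A)
    m = len(A[0])
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # only j >= i
            s = 0.0
            for k in range(m):
                s += A[i][k] * A[j][k]  # row i dot row j (row of A = column of A.T)
            C[i][j] = s
            C[j][i] = s                 # symmetry: C[j][i] == C[i][j]
    return C
```

Note that both operands are traversed row-wise, so there is no strided column access at all, which is the cache win mentioned above.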

Related

asymptotic bounding and control structures [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
So far in my study of algorithms, I have assumed that asymptotic bounds are directly related to patterns in control structures.
So if an algorithm has n^2 time complexity, I was thinking that this automatically means it must use nested loops. But I see that this is not always correct (and the same goes for other time complexities, not just quadratic).
How should I approach this relationship between time complexity and control structure?
Thank you
Rice's theorem is a significant obstacle to making general statements about analyzing running time.
In practice there's a repertoire of techniques that get applied. A lot of algorithms have nested loop structure that's easy to analyze. When the bounds of one of those loops is data dependent, you might need to do an amortized analysis. Divide and conquer algorithms can often be analyzed with the Master Theorem or Akra–Bazzi.
In some cases, though, the running time analysis can be very subtle. Take union-find, for example: getting the inverse Ackermann running time bound requires pages of proof. And then for things like the Collatz conjecture we have no idea how to even get a finite bound.
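As a concrete illustration of why nested control structures need not mean quadratic time, here is a sketch of the classic two-pointer pattern: syntactically a loop within a loop, but O(n) overall by an amortized argument, since the inner pointer only ever moves forward.

```python
def count_subarrays_sum_at_most(a, limit):
    """Count contiguous subarrays of non-negative `a` whose sum is <= limit.
    Looks like a nested loop, but runs in O(n): `left` advances at most
    n times over the entire execution (amortized analysis)."""
    left = 0
    total = 0
    count = 0
    for right, x in enumerate(a):
        total += x
        while total > limit:   # inner loop...
            total -= a[left]
            left += 1          # ...but `left` never moves backward
        count += right - left + 1  # subarrays ending at `right`
    return count
```

A naive reading of the control structure would call this O(n^2); the amortized argument mentioned above is what gives the tight O(n) bound.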

Practical uses of sparse matrices [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Lately I've been working with matrices, and I started learning about sparse matrices, but I don't understand why they even exist. What are some practical uses of sparse matrices? If they mainly hold 0s, do they have any useful applications?
Sparse matrices are just matrices in which most of the entries are zero (typically well over half).
They can be represented in a very concise way (wiki), which makes matrix operations (e.g. multiplication, transposition, ...) fast and memory-efficient. Google Maps and other applications would be impractical without efficient sparse matrix algorithms.
If you want to dig deeper, I recommend this website and the professor who developed some of those algorithms. Apparently, it is an ongoing research topic of high interest.
I feel you're asking the question the other way around.
Sparse matrices are matrices with a high density of zeros.
Sparse matrices are very frequently encountered in many fields of scientific computing, and at very large sizes. The question is not whether they are useful; they simply arise.
Now, the interesting and useful thing is how we represent them.
We tend to compress sparse matrices so that only the nonzero values are stored, together with their locations in the matrix.
This is useful in the sense that it saves a lot of storage, and matrix operations can be restricted to the nonzero values.
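As a sketch of what "store only the nonzeros and their locations" looks like, here is the standard CSR (compressed sparse row) layout in plain Python (the names follow the usual CSR convention, not anything specific from the question):

```python
def to_csr(dense):
    """Convert a dense 2D list to CSR form: a flat list of nonzero
    values, their column indices, and row pointers marking where
    each row's values begin in the flat list."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # keep only nonzeros
                col_idx.append(j)  # remember where each one lives
        row_ptr.append(len(values))  # row i spans row_ptr[i]:row_ptr[i+1]
    return values, col_idx, row_ptr
```

For a matrix with nnz nonzeros, this takes O(nnz + n) storage instead of O(n*m), and a matrix-vector product only touches the nnz stored entries.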
A field where sparse matrices are very common is solving discrete partial differential equations.
In a nutshell: you have a grid of voxels, and the discretized differential equation relates the quantity you are solving for only between neighbouring voxels. The resulting matrix equation you need to solve typically has a band structure. In one dimension the structure is even simpler: you have values only on the diagonal and one band off the diagonal, and all the other coefficients are zero.
Consider a 100k X 100k matrix.
If you want to store all of that using double precision in a 2D array you'll need 80Gbytes not counting any memory management overhead. If many of the elements are zero you can save a lot of memory.
For instance, if you have a regular sparse matrix such as a tridiagonal this can be stored in a 3 X 100k array with 2.4Mbytes. Implicitly all elements off the tridiagonal are zero.
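A minimal sketch of using that band storage, assuming the three bands are kept as plain length-n lists (the padding convention for the unused first/last band entries is my own choice for illustration):

```python
def tridiag_matvec(lower, diag, upper, x):
    """y = T @ x for a tridiagonal matrix T stored as three length-n
    bands; lower[0] and upper[n-1] are unused padding. O(n) time and
    3n storage instead of n^2."""
    n = len(diag)
    y = [0.0] * n
    for i in range(n):
        y[i] = diag[i] * x[i]            # main diagonal
        if i > 0:
            y[i] += lower[i] * x[i - 1]  # band below the diagonal
        if i < n - 1:
            y[i] += upper[i] * x[i + 1]  # band above the diagonal
    return y
```

Everything off the three bands is implicitly zero and is never stored or touched.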
As another example, in electronic circuit analysis, matrices are used to solve systems of nonlinear equations with the Newton-Raphson method. Since most circuit elements are connected to very few other ones, the resulting matrix is sparse. This can be represented with a 2D linked list of non-zero elements. Again, anything that isn't in the linked list is implicitly zero.

How can we compare execution time in theory and in practice [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
So from my understanding, we can only evaluate an algorithm with asymptotic analysis, but when we execute an algorithm, all we get back is an amount of time.
My question is: how can we compare those two?
They are comparable but not in the way you want.
If you have an implementation that you evaluate asymptotically to be, say, O(N^2), and you measure it to run in 60 seconds for an input of N=1000, then if you change the input to N=2000 I would expect the run-time to be on the order of 60 * 2^2 = 240 seconds, i.e. 4 minutes (I increased the input by a factor of two, so the runtime increases by a factor of 2 squared).
Now, if you have another algorithm, also O(N^2), you may observe it to run for N=1000 in 10 seconds (the compiler creates faster instructions, or the CPU is better). Now when you move to N=2000, I would expect the runtime to be around 40 seconds (same logic). If you actually measure it, you might still see some deviation from the expected value because of system load or optimizations, but it becomes less significant as N grows.
So you can't really say which algorithm will be faster based on asymptotic complexity alone. The asymptotic complexity guarantees that there will be an input sufficiently large where the lower complexity is going to be faster, but there's no promise what "sufficiently large" means.
Another example is search. You can do linear search in O(N) or binary search in O(logN). If your input is small (say, fewer than 128 ints), the compiler and processor can make linear search faster than binary search. However, grow N to, say, 1 million items and binary search will be much faster than linear.
As a rule, for large inputs optimize complexity first and for small inputs optimize run-time first. As always if you care about performance do benchmarks.
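A rough way to connect theory and measurement in practice is a doubling experiment like the one described above: time the same implementation at N and 2N and look at the ratio. A hedged sketch (wall-clock timings are noisy, so treat the ratio as indicative only; `quadratic_work` is just a stand-in workload):

```python
import time

def doubling_ratio(f, n):
    """Time f at input sizes n and 2n and return t(2n)/t(n).
    A ratio near 4 suggests O(n^2) behavior, near 2 suggests O(n).
    Purely empirical; repeat runs before trusting the number."""
    t1 = time.perf_counter(); f(n);     t1 = time.perf_counter() - t1
    t2 = time.perf_counter(); f(2 * n); t2 = time.perf_counter() - t2
    return t2 / t1

def quadratic_work(n):
    """A deliberately O(n^2) workload for demonstration."""
    s = 0
    for i in range(n):
        for j in range(n):
            s += i ^ j
    return s
```

For a real benchmark you would repeat each measurement and take the minimum or median, exactly for the "system load" reasons mentioned above.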

Other resources beyond time and space in computational complexity [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
In general, in computational complexity, we talk about time and space complexity. That is, we think about how much time or space that is necessary for solving some problem.
I would like to know if there is another kind of resource (beyond time and space) that we could use as a reference for discussing computational complexity.
People have considered the number of references to external memory (https://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf) and the use of cache memory (https://en.wikipedia.org/wiki/Cache-oblivious_algorithm). Where the computation is split between two or more nodes, the complexity of communication between those nodes is of interest (https://en.wikipedia.org/wiki/Communication_complexity), and there are some neat proofs in this area.
There are also links between these measures. Most obviously, using almost any resource takes time, so anything that takes no more than T units of time is likely to take no more than O(T) units of any other resource. There is a paper "An Overview of the Theory of Computational Complexity" by Hartmanis and Hopcroft, which puts computational complexity on a firm mathematical footing. This defines a very general notion of computational complexity measures and (Theorem 4) proves that (their summary) "a function which is "easy" to compute in one measure is "easy" to compute in other measures". However this result (like most of the rest of the paper) is in mathematically abstract terms which don't necessarily have any practical consequence in the real world. The connection between the two complexities used here is loose enough that it is entirely possible that polynomial complexity in one measure could be exponential complexity (or worse) in the other measure.

Algorithm to Sort Many Arrays with Potentially Similar Features [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
In usual circumstances, sorting arrays of ~1000s of simple items like integers or floats is sufficiently fast that the small differences between implementations just don't matter.
But what if you need to sort N modest-sized arrays that have been generated by some similar process, or simply have some relatedness?
I intentionally leave the specifics of the mysterious array generator and the relationships among the generated arrays vague. It is up to any applicable algorithm to specify as large a domain as possible in which it will work and be most useful.
EDIT: Let's narrow this by letting the arrays be independent samples. There exists an unchanging probability distribution of arrays that will be generated. Implicitly, then, there is a stable probability distribution of elements in the arrays, but it is conditional -- the elements within an array might not be independent. It seems like it would be extremely hard to make use of relationships between elements within the arrays, but I could be wrong. We can narrow further, if needed, by letting the elements in the arrays be independent. In that case the problem is to effectively learn and use the probability distribution of elements in the arrays.
Here is a paper on a self improving sorting algorithm. I am pretty strong with algorithms and machine learning, but this paper is definitely not an easy read for me.
The abstract says this
We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and clustering. The highlights of this work: a sorting algorithm with optimal expected limiting running time ...
In all cases, the algorithm begins with a learning phase during which it adjusts itself to the input distribution (typically in a logarithmic number of rounds), followed by a stationary regime in which the algorithm settles to its optimized incarnation.
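To make the learning-phase / stationary-regime idea concrete, here is a toy sketch (my own simplification for illustration, not the paper's actual algorithm): learn approximate quantile boundaries from sample arrays, then route elements of new arrays into buckets by those boundaries, so that a well-learned distribution yields small, evenly filled buckets.

```python
import bisect

def learn_buckets(training_arrays, k=8):
    """Learning phase (illustrative): estimate k-quantile boundaries
    from sample arrays drawn from the fixed input distribution."""
    pool = sorted(x for arr in training_arrays for x in arr)
    return [pool[(i * len(pool)) // k] for i in range(1, k)]

def bucket_sort(arr, boundaries):
    """Stationary phase: binary-search each element into its learned
    bucket, then sort the small buckets and concatenate. Close to
    linear time when the learned boundaries match the true
    distribution (each bucket stays small)."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for x in arr:
        buckets[bisect.bisect_right(boundaries, x)].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out
```

The paper's construction is far more sophisticated (it learns near-optimal search trees per input position), but the two-phase shape, learn once and then exploit, is the same.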
