Why Gauss-Seidel uses less memory than Gauss elimination

I am studying numerical methods from Steven C. Chapra's book. The book says "Gauss-Seidel uses less memory than Gauss elimination because it does not store the zero values of the matrix". However, the algorithm written in the book handles the same matrix as Gauss elimination does. I don't understand how Gauss-Seidel uses less memory. I searched for this on the internet; people say the same thing, but nobody explains how.
Note: I can share the algorithm from the book if that won't be a copyright problem.

The Gauss elimination method has to store the zeros while computing. This is because, in the course of eliminating the entries below the diagonal, zero entries can become non-zero (so-called fill-in). The Gauss-Seidel method, on the other hand, if written to handle sparse matrices, only needs to operate on the non-zero values.
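To make the fill-in argument concrete, here is a small Python sketch (the arrowhead test matrix and the helper names are illustrative choices, not taken from the book). It runs dense Gaussian elimination without pivoting, storing each multiplier in place of the entry it eliminates, as an in-place LU factorization does, and counts the non-zeros before and after. For a matrix whose only dense row and column are the first ones, eliminating the first column fills in every other entry:

```python
def arrow_matrix(n):
    """n x n 'arrowhead' matrix: dense first row and column, diagonal elsewhere (3n - 2 non-zeros)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[0][i] = 1.0
        M[i][0] = 1.0
        M[i][i] = float(n)  # large diagonal keeps elimination without pivoting safe
    return M


def count_nonzeros(M):
    return sum(1 for row in M for v in row if v != 0.0)


def lu_in_place(M):
    """Gaussian elimination without pivoting; multipliers (L) below the diagonal, U on and above it."""
    n = len(M)
    A = [row[:] for row in M]
    for k in range(n):
        for i in range(k + 1, n):
            if A[i][k] != 0.0:
                f = A[i][k] / A[k][k]
                A[i][k] = f                    # store the multiplier instead of the zero
                for j in range(k + 1, n):
                    A[i][j] -= f * A[k][j]     # zeros in row i can become non-zero here
    return A


A = arrow_matrix(6)
print(count_nonzeros(A), count_nonzeros(lu_in_place(A)))  # 16 36
```

A dense elimination routine therefore has to reserve room for all n*n entries up front, whereas an iterative method that only multiplies by A never creates new non-zeros.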
Put simply, the Gauss-Seidel method works on one equation at a time, solving the i-th equation for the i-th variable using only the terms with non-zero coefficients, so it can easily skip the terms with zero coefficients.
Gauss elimination works on the complete matrix, making all the coefficients below the i-th diagonal entry zero, but in the process the coefficients in the remaining part of the matrix are changed, and zeros there can become non-zero. I think there is no easy way of writing Gauss elimination for sparse matrices.
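A minimal Python sketch of a Gauss-Seidel solver written for sparse storage, assuming each row is kept as a dict that maps a column index to its non-zero coefficient (a storage choice made here for illustration; the book's algorithm may well use a full array). The update loop only ever touches stored entries, so both memory and work per sweep are proportional to the number of non-zeros. Convergence is assumed, which holds for example when A is strictly diagonally dominant or symmetric positive definite.

```python
def gauss_seidel(rows, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b, where rows[i] is a dict {j: A[i][j]} holding only the non-zero entries of row i."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            s = b[i]
            for j, a_ij in rows[i].items():   # zero coefficients are never stored, so never visited
                if j != i:
                    s -= a_ij * x[j]          # x[j] is already updated for j < i (the Gauss-Seidel step)
            new_xi = s / rows[i][i]           # assumes a non-zero diagonal entry
            largest_change = max(largest_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if largest_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge")


def tridiagonal(n):
    """Hypothetical test matrix: 4 on the diagonal, -1 on the neighbouring diagonals."""
    rows = []
    for i in range(n):
        row = {i: 4.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows


def sparse_matvec(rows, x):
    return [sum(a * x[j] for j, a in row.items()) for row in rows]


rows = tridiagonal(50)                       # stores 148 numbers instead of 2500
x_true = [float(i % 7) for i in range(50)]
x = gauss_seidel(rows, sparse_matvec(rows, x_true))
print(max(abs(u - v) for u, v in zip(x, x_true)))
```

The matrix itself is only read, never modified, which is why no fill-in and no extra storage appear.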


Should one calculate QR decomposition before Least Squares to speed up the process?

I am reading the book "Introduction to Linear Algebra" by Gilbert Strang, in the section called "Orthonormal Bases and Gram-Schmidt". The author emphasises several times that with an orthonormal basis it's very easy and fast to calculate the least squares solution, since QᵀQ = I, where Q is a design matrix with orthonormal columns. So the equation becomes x̂ = Qᵀb.
So I got the impression that it's a good idea to always compute the QR decomposition before applying least squares. But when I later worked out the time complexity, it turned out that computing the QR decomposition and then applying least squares is more expensive than the regular x̂ = inv(AᵀA)Aᵀb.
Is it right that there is no point in using the QR decomposition to speed up least squares, or did I get something wrong?
So is numerical stability the only purpose of the QR decomposition for least squares?
There are many ways to do least squares; typically these vary in applicability, accuracy and speed.
Perhaps the Rolls-Royce method is to use SVD. This can be used to solve under-determined (fewer obs than states) and singular systems (where A'*A is not invertible) and is very accurate. It is also the slowest.
QR can only be used to solve non-singular systems (that is, we must have A'*A invertible, i.e. A must have full column rank), and though perhaps not as accurate as SVD, it is also a good deal faster.
The normal equations, i.e.
compute P = A'*A
solve P*x = A'*b
are the fastest (perhaps by a large margin if P can be computed efficiently, for example if A is sparse) but also the least accurate. They too can only be used to solve non-singular systems.
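For comparison, a minimal Python sketch of the normal-equations route. Solving P*x = A'*b with a Cholesky factorization is a choice made here (P is symmetric positive definite when A has full column rank); any solver for P would do. Forming A'*A squares the condition number of the problem, which is where the loss of accuracy comes from.

```python
import math


def normal_equations_lstsq(A, b):
    """Least squares via the normal equations: P = A^T A, c = A^T b, then solve P x = c by Cholesky."""
    m, n = len(A), len(A[0])
    P = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Cholesky factorization P = L L^T (fails if P is not positive definite, i.e. A rank-deficient)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("A^T A is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # forward substitution L y = c, then back substitution L^T x = y
    y = [0.0] * n
    for i in range(n):
        y[i] = (c[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# fit y = x0 + x1 * t to the points (0, 1), (1, 2), (2, 2)
print(normal_equations_lstsq([[1, 0], [1, 1], [1, 2]], [1, 2, 2]))  # approximately [7/6, 1/2]
```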
Inaccuracy should not be taken lightly nor dismissed as some academic fanciness. If you happen to know that the problems you will be solving are nicely behaved, then it might well be fine to use an inaccurate method. Otherwise, the inaccurate routine might well fail (i.e. report that there is no solution when there is one, or, worse, come up with a totally bogus answer).
I'm a bit confused that you seem to be suggesting forming and solving the normal equations after performing the QR decomposition. The usual way to use QR in least squares, if A is nObs x nStates, is:
decompose A as $A = Q \begin{pmatrix} R \\ 0 \end{pmatrix}$ (here R is upper triangular)
transform b into $\tilde{b} = Q' b$
solve $R x = b^{\#}$ for x (here $b^{\#}$ is the first nStates entries of $\tilde{b}$)
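A minimal Python sketch of those three steps using Householder reflections (one common way to compute the decomposition; Givens rotations or modified Gram-Schmidt would also work). Q is never formed explicitly: each reflection is applied to the remaining columns of A and to b as soon as it is generated, which turns b into b~ = Q'*b. It assumes A (nObs x nStates, given as a list of rows) has full column rank; the check below only catches a column that becomes exactly zero.

```python
import math


def householder_lstsq(A, b):
    """Least squares by Householder QR: reduce A to [R; 0], apply the same reflections to b, back-substitute."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]  # its first n rows end up holding R
    bt = [float(v) for v in b]                  # becomes b~ = Q' b
    for k in range(n):
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("A is rank-deficient")
        alpha = -norm if R[k][k] >= 0 else norm  # sign chosen to avoid cancellation in v[k]
        v = [0.0] * m
        v[k] = R[k][k] - alpha
        for i in range(k + 1, m):
            v[i] = R[i][k]
        vtv = sum(vi * vi for vi in v[k:])
        # apply H = I - 2 v v^T / (v^T v) to columns k..n-1 of R and to b~
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                R[i][j] -= s * v[i]
        s = 2.0 * sum(v[i] * bt[i] for i in range(k, m)) / vtv
        for i in range(k, m):
            bt[i] -= s * v[i]
    # solve R x = b#, where b# is the first n entries of b~
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (bt[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x


print(householder_lstsq([[1, 0], [1, 1], [1, 2]], [1, 2, 2]))  # approximately [7/6, 1/2]
```

The remaining m - n entries of b~ are not needed for x; their norm is the norm of the least squares residual.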

How are sparse Ax = b systems solved in practice?

Let A be an n x n sparse matrix, represented by a sequence of m tuples of the form (i,j,a), with indices i,j (between 0 and n-1) and a being a value in the underlying field F.
What algorithms are used, in practice, to solve linear systems of equations of the form Ax = b? Please describe them, don't just link somewhere.
Notes:
I'm interested both in exact solutions over finite fields and in exact and bounded-error solutions over the reals or complex numbers using floating-point representation. I suppose exact or bounded-error solutions for rational numbers are also interesting.
I'm particularly interested in parallelizable solutions.
A is not fixed, i.e. you don't just get different b's for the same A.
The two main algorithms that I have used and parallelised are the Wiedemann algorithm and the Lanczos algorithm (and their block variants for GF(2) computations), both of which are better than structured Gaussian elimination.
The LaMacchia-Odlyzo paper (the one for the Lanczos algorithm) will tell you what you need to know. The algorithms involve repeatedly multiplying your sparse matrix by a sequence of vectors. To do this efficiently, you need to use the right data structure (linked list) to make the matrix-vector multiply time proportional to the number of non-zero values in the matrix (i.e. the sparsity).
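The sketch below illustrates the idea with the plain (non-block) Wiedemann algorithm over GF(p) for a prime p, in Python. Assumptions: A is square and nonsingular modulo p, b is non-zero, and the triples are kept in a flat list rather than a linked list; any structure that visits each stored (i, j, a) once gives the same O(m) multiply. The method projects the Krylov sequence A^i b onto a random vector u, recovers the minimal polynomial of that scalar sequence with Berlekamp-Massey, and assembles x from the same Krylov vectors. Because a random projection can occasionally yield only a factor of the needed polynomial, the result is checked and the attempt retried. The block variants used for GF(2) are considerably more involved and are not shown.

```python
import random


def matvec_mod(triples, v, n, p):
    """y = A v mod p, touching each stored non-zero (i, j, a) exactly once: O(m) work."""
    y = [0] * n
    for i, j, a in triples:
        y[i] = (y[i] + a * v[j]) % p
    return y


def berlekamp_massey(seq, p):
    """Shortest [1, c1, ..., cL] with seq[k] + c1*seq[k-1] + ... + cL*seq[k-L] = 0 (mod p)."""
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for k, s in enumerate(seq):
        d = s % p                                   # discrepancy of the current recurrence
        for i in range(1, L + 1):
            d = (d + C[i] * seq[k - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, p - 2, p) % p
        T = C[:]
        C = C + [0] * max(0, len(B) + m - len(C))
        for i, bi in enumerate(B):
            C[i + m] = (C[i + m] - coef * bi) % p   # C <- C - (d/b) x^m B
        if 2 * L <= k:
            L, B, b, m = k + 1 - L, T, d, 1
            C += [0] * max(0, L + 1 - len(C))
        else:
            m += 1
    return C[:L + 1]


def wiedemann_solve(triples, b, n, p, seed=0, max_tries=10):
    rng = random.Random(seed)
    for _ in range(max_tries):
        u = [rng.randrange(1, p) for _ in range(n)]
        seq, v = [], [bi % p for bi in b]
        for _ in range(2 * n):                      # seq[i] = u . A^i b
            seq.append(sum(ui * vi for ui, vi in zip(u, v)) % p)
            v = matvec_mod(triples, v, n, p)
        C = berlekamp_massey(seq, p)
        L = len(C) - 1
        if L == 0 or C[L] == 0:
            continue
        # f(A) b = 0 with f(t) = t^L + c1 t^(L-1) + ... + cL gives
        # x = -(1/cL) * (A^(L-1) b + c1 A^(L-2) b + ... + c_(L-1) b), evaluated by Horner's rule
        w = [C[0] * bi % p for bi in b]
        for i in range(1, L):
            Aw = matvec_mod(triples, w, n, p)
            w = [(awi + C[i] * bi) % p for awi, bi in zip(Aw, b)]
        scale = (-pow(C[L], p - 2, p)) % p
        x = [scale * wi % p for wi in w]
        if matvec_mod(triples, x, n, p) == [bi % p for bi in b]:
            return x
    raise RuntimeError("no solution found; A may be singular mod p")


P = 1_000_000_007
triples = [(0, 0, 2), (0, 1, 1), (1, 1, 3), (1, 3, 1), (2, 0, 1), (2, 2, 4), (3, 2, 1), (3, 3, 5)]
print(wiedemann_solve(triples, [4, 10, 13, 23], 4, P))  # [1, 2, 3, 4]
```

Each attempt costs about 3n sparse multiplies (2n for the sequence, fewer than n for assembling x), i.e. O(n·m) field operations, and A is only ever multiplied, never modified, so there is no fill-in.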
Parallelisation of these algorithms is trivial, but optimisation will depend upon the architecture of your system. The matrix-vector multiply is parallelised by splitting the matrix into blocks of rows (each processor gets one block); each block of rows is multiplied by the vector separately. Then you combine the results to get the new vector.
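A minimal Python sketch of that row-block split, with a thread pool standing in for the processors (the function names are illustrative). Python threads do not run pure-Python arithmetic in parallel because of the GIL, so this only shows the partitioning and recombination; a real implementation would use processes, MPI, or native threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_row_blocks(triples, n, num_blocks):
    """Partition the (i, j, a) triples into contiguous blocks of rows."""
    block_size = (n + num_blocks - 1) // num_blocks
    blocks = [[] for _ in range(num_blocks)]
    for i, j, a in triples:
        blocks[i // block_size].append((i, j, a))
    return blocks, block_size


def block_matvec(block, lo, hi, v):
    """Multiply one row block by the full vector v; returns rows lo..hi-1 of A v."""
    y = [0.0] * (hi - lo)   # an empty list when the block lies past the last row
    for i, j, a in block:
        y[i - lo] += a * v[j]
    return y


def parallel_matvec(triples, v, n, num_workers=4):
    blocks, bs = split_by_row_blocks(triples, n, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(block_matvec, blk, k * bs, min((k + 1) * bs, n), v)
                   for k, blk in enumerate(blocks)]
        result = []
        for f in futures:            # concatenating the pieces in block order gives A v
            result.extend(f.result())
    return result


triples = [(0, 0, 2.0), (0, 5, 1.0), (1, 1, 3.0), (2, 0, -1.0), (2, 2, 4.0),
           (3, 4, 5.0), (4, 4, 1.0), (5, 1, 2.0), (5, 5, 6.0)]
print(parallel_matvec(triples, [1, 2, 3, 4, 5, 6], 6))  # [8.0, 6.0, 11.0, 25.0, 5.0, 40.0]
```

Every worker needs the whole input vector but writes only its own slice of the output, so no locking is needed during the multiply itself.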
I've done these types of computations extensively. The original authors who broke the RSA-129 factorisation took 6 weeks using structured Gaussian elimination on a 16,384 processor MasPar. On the same machine, I worked with Arjen Lenstra (one of the authors) to solve the matrix in 4 days with block Wiedemann and 1 day with block Lanczos. Unfortunately, I never published the result!

Definition of matrix-vector division operator of Julia

I stumbled upon something that I consider very strange.
As an example consider the code
A = reshape(1:6, 3,2)
A/[1 1]
which gives
3×1 Array{Float64,2}:
2.5
3.5
4.5
As I understand it, in general such a division gives the weighted average of the columns, where each weight is inversely proportional to the corresponding element of the vector.
So my question is: why is it defined this way?
What is the mathematical justification for this definition?
It's the minimum-error solution to |A - v*[1 1]|₂, which, being overconstrained, has no exact solution in general (i.e. no value v for which the norm is precisely zero). The behavior of / and \ is heavily overloaded, solving both under- and overconstrained systems by a variety of techniques and heuristics. Whether this kind of overloading is a good idea or not is debatable, but it's what people have come to expect from these operations in Matlab and Octave, and it's often quite convenient to have so much functionality available in a single operator.
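A small Python sketch of what that least-squares fit computes when the divisor is a single row w (the function name is illustrative; this reproduces the math, not Julia's implementation, which goes through a QR factorization). Each row of A is fitted independently, and setting the derivative of the squared residual to zero gives v = A wᵀ / (w wᵀ). So the weight on column j is w_j / Σ w², proportional to w_j; for [1 1] that is simply the mean of the two columns.

```python
def right_divide_by_row(A, w):
    """Least-squares v minimising the residual of v * w against A (each row of A fitted separately).

    A is an m x k matrix given as a list of rows and w is a length-k row vector;
    the closed form is v = A w^T / (w w^T)."""
    ww = sum(wj * wj for wj in w)
    return [sum(a_ij * wj for a_ij, wj in zip(row, w)) / ww for row in A]


# Julia's reshape(1:6, 3, 2) fills column-major: the columns are [1, 2, 3] and [4, 5, 6]
A = [[1, 4], [2, 5], [3, 6]]
print(right_divide_by_row(A, [1, 1]))  # [2.5, 3.5, 4.5], matching A/[1 1]
```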
Let A be an NxN matrix and b an Nx1 column vector. Then A\b solves Ax=b, and, for a 1xN row vector b, b/A solves xA=b.
As Stefan mentions, this is extended to non-square systems: overdetermined ones get the least squares solution (and underdetermined ones a minimum-norm solution). This is done via the QR or SVD decompositions; see the details of these algorithms for why this is the case. Hint: the OLS estimator can be computed from these matrix decompositions, so it's the same thing.
Now you might ask: how does it actually solve it? That's a complicated question. Essentially, it uses a matrix factorization, but which factorization is used depends on the matrix type. The reason is that Gaussian elimination is O(n^3), so treating the problem generically is usually not a good idea, but whenever you can specialize, you can get speedups. So essentially \ (and /, which transposes and calls \) checks for a bunch of special types and picks a factorization or other algorithm (LU, QR, SVD, Cholesky, etc.) based on the matrix type. The flow chart from MATLAB explains this very well. There are a lot of details here, and even more when the matrix is sparse. IterativeSolvers.jl should also be mentioned, because it provides another set of algorithms for solving Ax=b.
Most applied math problems reduce down to linear algebra, with solving Ax=b being one of the most important and difficult problems, which is why there is a huge amount of research on the subject. In fact, you can probably say that the vast majority of the field of numerical linear algebra is devoted to finding fast methods for solving Ax=b for specific matrix types. \ essentially puts all of the direct (non-iterative) methods into one convenient operator.
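A toy Python sketch of that dispatch idea: inspect the structure of A first, then use the cheapest method that applies. The checks, their order, and all names here are illustrative only; Julia's and MATLAB's real decision trees test many more cases (symmetric, banded, positive definite, sparse, and so on).

```python
def is_upper_triangular(A):
    return all(A[i][j] == 0 for i in range(len(A)) for j in range(i))


def is_lower_triangular(A):
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def back_substitution(U, b):
    """Solve U x = b for upper-triangular U in O(n^2)."""
    n = len(U)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


def forward_substitution(L, b):
    """Solve L x = b for lower-triangular L in O(n^2)."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i])
    return x


def lu_solve(A, b):
    """General case: Gaussian elimination with partial pivoting, O(n^3); no singularity check."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]  # augmented [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # row with the largest pivot
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    return back_substitution([row[:n] for row in M], [row[n] for row in M])


def solve(A, b):
    """Toy polyalgorithm in the spirit of '\\': cheap structural checks first, general LU last."""
    if is_upper_triangular(A):
        return back_substitution(A, b)
    if is_lower_triangular(A):
        return forward_substitution(A, b)
    return lu_solve(A, b)


print(solve([[2.0, 1.0], [0.0, 4.0]], [5.0, 8.0]))  # triangular path: [1.5, 2.0]
```

The structural checks cost O(n^2), which is negligible next to the O(n^3) factorization they can avoid.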

Fast way to find all local maxima in C++

Problem
I have a formula for computing a 1D polynomial, joint function. I want to find all local maxima of that function within a given range.
My approach
My current solution is that I evaluate my function at a certain number of points in the range, then go through these points and record those where the function changes from rising to falling. Of course I can change the number of samples within the interval, but I want to find all maxima with as few samples as possible.
Question
Can you suggest an effective algorithm?
Finding all the maxima of an unknown function is hard. You can never be sure that a maximum you found is really just one maximum or that you have not overlooked a maximum somewhere.
However, if something is known about the function, you can try to exploit that. The simplest case is, of course, when the function is known to be a polynomial of bounded degree. Up to a polynomial of degree five, it is possible to derive all four extrema from a closed formula (the roots of the quartic derivative), see http://en.wikipedia.org/wiki/Quartic_equation#General_formula_for_roots for details. Most likely you don't want to implement that, but for linear, quadratic, and cubic equations the closed formula is feasible and can be used to find the maxima of a quartic function.
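The same closed-form idea one degree lower, as a small Python sketch (for brevity rather than C++; the function name is illustrative): the derivative of a cubic a x³ + b x² + c x + d is a quadratic, so the quadratic formula gives every critical point, and the sign of the second derivative tells maxima from minima. The constant term d does not affect the location of the extrema, so it is omitted.

```python
import math


def cubic_local_maxima(a, b, c, lo, hi):
    """Local maxima in [lo, hi] of f(x) = a x^3 + b x^2 + c x + d (d does not move the extrema)."""
    qa, qb, qc = 3 * a, 2 * b, c          # f'(x) = 3a x^2 + 2b x + c
    if qa == 0:                           # f is at most quadratic, so f' is linear
        crit = [] if qb == 0 else [-qc / qb]
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            crit = []                     # f' never vanishes: f is monotonic
        else:
            r = math.sqrt(disc)
            crit = [(-qb - r) / (2 * qa), (-qb + r) / (2 * qa)]
    # keep critical points inside the range where f''(x) = 6a x + 2b is negative
    return sorted(x for x in crit if lo <= x <= hi and 6 * a * x + 2 * b < 0)


print(cubic_local_maxima(1, 0, -3, -2, 2))  # f = x^3 - 3x has its maximum at x = -1: [-1.0]
```

A double root of f' (zero discriminant) is an inflection point with f'' = 0 there, so it is correctly not reported as a maximum.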
That is only the simplest information that might be known; other interesting information is whether you can give a bound on the second derivative. This would allow you to reduce the sampling density where you find a strong slope.
You may also be able to exploit information from how you intend to use the maxima you find. It can give you clues about how much precision you need. Is it sufficient to know that a point is near a maximum? Or that a point is flat? Is it really a problem if a saddle point is classified as a maximum? Or if a maximum right next to a turning point is overlooked? And how large is the allowable error margin?
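A minimal Python sketch of how such a bound cuts down the sampling, assuming you can evaluate f' and know some M with |f''| ≤ M on the range (the names and the h_min/tol defaults are illustrative). From a point x, f' cannot vanish strictly closer than |f'(x)|/M, so the step can be that large on strong slopes. Near a critical point the step shrinks, and the minimum step h_min is the price paid: a maximum/minimum pair closer together than h_min can still be missed.

```python
import math


def find_maxima(df, M, lo, hi, h_min=1e-6, tol=1e-12):
    """Local maxima of f on [lo, hi], given df = f' and a bound M >= max |f''| on [lo, hi]."""
    maxima = []
    x, d = lo, df(lo)
    while x < hi:
        # |f''| <= M means f' cannot vanish strictly closer than |f'(x)| / M to x,
        # so this step skips no critical point; h_min stops it shrinking forever near a zero
        x_next = min(x + max(abs(d) / M, h_min), hi)
        d_next = df(x_next)
        if d > 0 and d_next <= 0:          # rising -> falling: a maximum lies in (x, x_next]
            a, b = x, x_next
            while b - a > tol:             # bisection keeps df(a) > 0 >= df(b)
                mid = 0.5 * (a + b)
                if df(mid) > 0:
                    a = mid
                else:
                    b = mid
            maxima.append(0.5 * (a + b))
        x, d = x_next, d_next
    return maxima


# f(x) = sin(x): f' = cos, and |f''| = |sin| <= 1
print(find_maxima(math.cos, 1.0, 0.0, 4 * math.pi))  # close to pi/2 and 5*pi/2
```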
If you cannot exploit information like this, you are thrown back to sampling your function in small steps and hoping you don't make too much of an error.
Edit:
You mention in the comments that your function is in fact a kernel density estimation. This gives you at least the following information:
As long as the kernel is limited in extent, your estimated function will be a piecewise function: any point on it will only be influenced by a precisely calculable number of measurement points.
If the kernel is based on a polynomial, the resulting estimated function will be piecewise polynomial, and of the same degree as the kernel!
If the kernel is the uniform kernel, your estimated function will be a step function.
This case needs special handling because there won't be any maxima in the mathematical sense. However, it also makes your job really easy.
If the kernel is the triangular kernel, your estimated function will be a piecewise linear function.
If the kernel is the Epanechnikov kernel, your estimated function will be a piecewise quadratic function.
In all these cases it is next to trivial to produce the piecewise functions and to find their maxima.
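For the Epanechnikov case, a Python sketch of how direct this is, assuming the estimate has the standard form f(x) = (1/(n h)) Σ K((x - x_i)/h) with K(u) = 0.75 (1 - u²) for |u| ≤ 1 (the function name is illustrative). Between consecutive breakpoints x_i ± h the set of contributing samples is fixed, so f is a concave quadratic whose vertex is the mean of those samples. At every breakpoint the slope of f jumps upward, so no maximum can sit on a breakpoint; the local maxima are exactly the vertices that fall strictly inside their own piece. This version is O(n²) for clarity; a sweep over the sorted breakpoints would make it O(n log n).

```python
def epanechnikov_kde_maxima(samples, h):
    """Local maxima of the Epanechnikov kernel density estimate with bandwidth h."""
    breaks = sorted({x - h for x in samples} | {x + h for x in samples})
    maxima = []
    for left, right in zip(breaks, breaks[1:]):
        mid = 0.5 * (left + right)
        active = [x for x in samples if abs(mid - x) < h]  # samples whose kernel covers this piece
        if not active:
            continue                                       # f is identically 0 here
        # sum of 0.75 * (1 - (x - x_i)^2 / h^2) over the active samples is a concave
        # quadratic in x; its derivative vanishes at the mean of the active samples
        vertex = sum(active) / len(active)
        if left < vertex < right:
            maxima.append(vertex)
    return maxima


print(epanechnikov_kde_maxima([0.0, 5.0], 1.0))  # two separate bumps: [0.0, 5.0]
print(epanechnikov_kde_maxima([0.0, 0.5], 1.0))  # overlapping kernels merge: [0.25]
```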
If the kernel is of too high a degree or transcendental, you still know the measurements that your estimation is based on, and you know the kernel properties. This allows you to derive a heuristic on how dense your maxima can get.
At the very least, you know the first and second derivative of the kernel.
In principle, this allows you to calculate the first and second derivative of the estimated function at any point.
In the case of a local kernel, it might be more prudent to calculate the first derivative and an upper bound to the second derivative of the estimated function at any point.
With this information, it should be possible to constrain the search to the regions where there are maxima and avoid oversampling of the slopes.
As you see, there is a lot of useful information that you can derive from the knowledge of your function, and which you can use to your advantage.
The local maxima are among the roots of the first derivative. To isolate those roots in your working interval you can use Sturm's theorem and proceed by dichotomy (bisection). In theory (using exact arithmetic) it gives you all real roots.
An equivalent approach is to express your polynomial in the Bezier/Bernstein basis and look for sign changes of the coefficients (convex hull property). The dichotomic search can be implemented efficiently by recursive subdivision of the Bezier curve.
There are several classical root-finding algorithms for polynomials, such as Laguerre's method, that usually find the complex roots as well.
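A Python sketch of that procedure with exact rational arithmetic (fractions.Fraction), assuming f has degree at least 2 and f' has no repeated roots; the function names are illustrative. The Sturm sequence is p0 = p, p1 = p', p(k+1) = -rem(p(k-1), p(k)), and the number of distinct real roots in (a, b] equals V(a) - V(b), where V counts sign changes along the sequence. Bisection continues until every interval holds exactly one root, and the maxima are the roots of f' where f'' < 0.

```python
from fractions import Fraction

# Polynomials are lists of Fractions, lowest degree first: p[i] multiplies x**i.


def trim(p):
    while p and p[-1] == 0:
        p.pop()
    return p


def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])


def remainder(a, b):
    a = list(a)
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        trim(a)                  # exact arithmetic: the leading term cancels to exactly 0
    return a


def evaluate(p, x):
    result = Fraction(0)
    for c in reversed(p):        # Horner's rule
        result = result * x + c
    return result


def sturm_sequence(p):
    seq = [p, derivative(p)]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def sign_variations(seq, x):
    signs = [v > 0 for v in (evaluate(q, x) for q in seq) if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def isolate_roots(p, a, b, width=Fraction(1, 10**6)):
    """Intervals (lo, hi] of width <= `width`, each holding one distinct real root of p in (a, b]."""
    seq = sturm_sequence(p)
    found, stack = [], [(Fraction(a), Fraction(b))]
    while stack:
        lo, hi = stack.pop()
        count = sign_variations(seq, lo) - sign_variations(seq, hi)
        if count == 0:
            continue
        if count == 1 and hi - lo <= width:
            found.append((lo, hi))
            continue
        mid = (lo + hi) / 2
        stack.extend([(mid, hi), (lo, mid)])
    return sorted(found)


def local_maxima(f, a, b):
    """Isolate the roots of f' with Sturm's theorem and keep those where f'' < 0."""
    df = derivative(f)
    d2f = derivative(df)
    return [(lo, hi) for lo, hi in isolate_roots(df, a, b) if evaluate(d2f, (lo + hi) / 2) < 0]


f = [Fraction(c) for c in (0, 0, 2, 0, -1)]  # f(x) = -x^4 + 2x^2, maxima at x = -1 and x = 1
print(local_maxima(f, -2, 2))                # two tiny intervals, around -1 and around 1
```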
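A Python sketch of the Bernstein variant on [0, 1] (for another interval, substitute x = lo + (hi - lo) t first, which is not shown). The power-basis coefficients a_i convert to Bernstein coefficients b_k = Σ_{i≤k} C(k,i)/C(n,i) a_i. No sign change among them means no root (convex hull property); exactly one sign change means exactly one root, since the number of roots in the open interval is at most the number of sign changes and has the same parity. Otherwise the curve is split at t = 1/2 with de Casteljau's algorithm and both halves are examined. The sketch assumes no root falls exactly on a subdivision point, and the names are illustrative.

```python
from fractions import Fraction
from math import comb


def to_bernstein(a):
    """Power-basis coefficients a[i] (of x**i) -> Bernstein coefficients on [0, 1]."""
    n = len(a) - 1
    return [sum(Fraction(comb(k, i), comb(n, i)) * a[i] for i in range(k + 1)) for k in range(n + 1)]


def split_half(b):
    """de Casteljau subdivision at t = 1/2: Bernstein coefficients of the left and right halves."""
    left, right, cur = [b[0]], [b[-1]], list(b)
    while len(cur) > 1:
        cur = [(u + v) / 2 for u, v in zip(cur, cur[1:])]
        left.append(cur[0])
        right.append(cur[-1])
    return left, right[::-1]


def sign_changes(b):
    signs = [c > 0 for c in b if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def isolate(b, lo=Fraction(0), hi=Fraction(1), min_width=Fraction(1, 2**30)):
    v = sign_changes(b)
    if v == 0:
        return []                     # all control points on one side of 0: no root here
    if v == 1 or hi - lo <= min_width:
        return [(lo, hi)]             # exactly one root (v == 1), or stop refining
    mid = (lo + hi) / 2
    left, right = split_half(b)
    return isolate(left, lo, mid, min_width) + isolate(right, mid, hi, min_width)


# p(x) = (3x - 1)(3x - 2)(5x - 4) = 45x^3 - 81x^2 + 46x - 8, roots 1/3, 2/3, 4/5
print(isolate(to_bernstein([-8, 46, -81, 45])))
```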
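A compact Python sketch of Laguerre's method with deflation (the names, tolerances, and start point are illustrative choices). Each step uses G = p'/p and H = G² - p''/p and moves by n / (G ± sqrt((n - 1)(nH - G²))), with the sign chosen to make the denominator larger in magnitude. Working in complex arithmetic lets it converge to complex roots too. Each root found is polished on the original polynomial and then divided out by synthetic division.

```python
import cmath


def eval_with_derivatives(coeffs, x):
    """Horner evaluation of p, p', p'' at x; coeffs[0] is the highest-degree coefficient."""
    p, dp, d2p = coeffs[0], 0, 0
    for c in coeffs[1:]:
        d2p = d2p * x + 2 * dp
        dp = dp * x + p
        p = p * x + c
    return p, dp, d2p


def laguerre(coeffs, x=0j, tol=1e-14, max_iter=200):
    n = len(coeffs) - 1
    for _ in range(max_iter):
        p, dp, d2p = eval_with_derivatives(coeffs, x)
        if p == 0:
            return x
        G = dp / p
        H = G * G - d2p / p
        sq = cmath.sqrt((n - 1) * (n * H - G * G))
        denom = G + sq if abs(G + sq) >= abs(G - sq) else G - sq
        step = n / denom if denom != 0 else 1 + abs(x)  # crude kick if the formula degenerates
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            return x
    return x


def deflate(coeffs, r):
    """Divide p(x) by (x - r) with synthetic division, dropping the (tiny) remainder."""
    out = [coeffs[0]]
    for c in coeffs[1:-1]:
        out.append(c + out[-1] * r)
    return out


def all_roots(coeffs):
    roots, work = [], list(coeffs)
    while len(work) > 1:
        r = laguerre(work)
        r = laguerre(coeffs, r)   # polish on the original polynomial to limit deflation error
        roots.append(r)
        work = deflate(work, r)
    return roots


print(all_roots([1, -6, 11, -6]))  # x^3 - 6x^2 + 11x - 6: roots close to 1, 2, 3
```

For the maxima question you would apply it to the coefficients of f', keep the (nearly) real roots inside the range, and check the sign of f'' there.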

Fourier-Motzkin elimination

I have implemented Fourier-Motzkin elimination in C using a matrix.
I am wondering whether the following modification to the original algorithm is allowed.
In the original version of the algorithm, one takes each row with a positive coefficient in front of x_r and combines it with each row with a negative coefficient to create new inequalities, so the matrix can grow in size.
See pages 32-33 of http://fileadmin.cs.lth.se/cs/Education/EDAF15/F07.pdf
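For reference, a Python sketch of the original pairing step only (it does not evaluate the proposed modification). Inequalities are assumed to be of the form Σ_j a_j x_j ≤ rhs and stored as (coefficients, rhs) pairs; the function name is illustrative. Each positive row and each negative row is scaled by a positive factor so that x_r gets coefficients +1 and -1, and the two are added; positive scaling is what keeps the ≤ direction valid. The result has |P|·|N| + |Z| rows (P, N, Z: rows with positive, negative, and zero coefficient on x_r), which is why the matrix can grow.

```python
from fractions import Fraction


def fm_eliminate(rows, r):
    """One Fourier-Motzkin step: eliminate x_r from inequalities sum_j a[j] * x_j <= rhs."""
    pos = [(a, c) for a, c in rows if a[r] > 0]
    neg = [(a, c) for a, c in rows if a[r] < 0]
    out = [(a, c) for a, c in rows if a[r] == 0]   # rows without x_r are kept unchanged
    for ap, cp in pos:
        for an, cn in neg:
            # positive scale factors give x_r the coefficients +1 and -1, so it cancels
            sp, sn = Fraction(1) / ap[r], Fraction(1) / -an[r]
            out.append(([sp * u + sn * w for u, w in zip(ap, an)], sp * cp + sn * cn))
    return out


# x + y <= 4, -x <= 0, x - y <= 1; eliminating x leaves y <= 4 and -y <= 1
print(fm_eliminate([([1, 1], 4), ([-1, 0], 0), ([1, -1], 1)], 0))
```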
But is it allowed to choose a single inequality with a negative coefficient and use Gaussian elimination to eliminate x_r?
I tried solving some small systems and it seems to give me the correct answer, but I don't know whether this method is correct.
With this method my matrix won't grow in size; this way I would be doing ordinary Gaussian elimination.
