Representation of binary matrix with space complexity of O(n) - algorithm

I have n x n binary matrices (i.e. matrices whose elements are 0 or 1). Using a two-dimensional array (that is, storing the value of each element) has a space complexity of O(n^2).
Is there any way to store them such that the space complexity is O(n)? All operations, like summation, subtraction, etc., are welcome.
The matrices are not sparse, so using a list of non-zero elements is out of the question.

No, you cannot store an arbitrary n x n binary matrix in O(n) space.
The proof is just the pigeonhole principle.
Suppose you devise a way to store an arbitrary n x n binary matrix.
There are 2^(n^2) possible binary matrices of that size.
If you use k bits for the storage, there are 2^k possible contents of your storage.
Any O(n) scheme uses at most k = c*n bits for some constant c, and c*n < n^2 as soon as n > c. Now, if k < n^2, we have 2^k < 2^(n^2), and by the pigeonhole principle there exist two different matrices (say, A and B) which are stored the same way (say, both are stored as X).
So, when you have X stored, you cannot tell whether the matrix you actually intended to store was A or B (or maybe some other matrix).
Thus you cannot uniquely decode your storage back into the stored matrix, which defeats the whole purpose of storing it.
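To make the pigeonhole argument concrete, here is a small Python sketch that brute-forces every binary n x n matrix for a small n and looks for two distinct matrices that a given encoder maps to the same stored value. The names `find_collision` and `row_parities` are hypothetical, and the row-parity encoder is just one arbitrary n-bit scheme picked for illustration; by the argument above, any encoder that produces fewer than n^2 bits would collide in the same way.

```python
from itertools import product


def find_collision(encode, n):
    """Enumerate all 2**(n*n) binary n x n matrices and return two distinct
    matrices that `encode` maps to the same code, or None if there are none."""
    seen = {}
    for bits in product((0, 1), repeat=n * n):
        matrix = tuple(tuple(bits[r * n:(r + 1) * n]) for r in range(n))
        code = encode(matrix)
        if code in seen:
            return seen[code], matrix
        seen[code] = matrix
    return None


def row_parities(matrix):
    """Hypothetical n-bit 'encoding': the parity of each row."""
    return tuple(sum(row) % 2 for row in matrix)
```

For example, `find_collision(row_parities, 3)` returns two different 3 x 3 matrices with identical codes, while a lossless encoding such as `lambda m: m` (which keeps all n^2 bits) yields `None`.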

First proof: an n*n bit matrix has 2^(n*n) possible states, but an n-bit string can only represent 2^n states. So unless n >= n*n (i.e. n = 1), there is no way to encode n*n bits in an n-bit sequence.
Second proof, less abstract but also less complete:
Imagine you have a 16*16 matrix with 256 bits, and somehow manage to store it in 16 bits.
Now, of course, you could take those 16 bits, arrange them in a 4x4 matrix and apply your algorithm again, resulting in 4 bits. Then you store those 4 bits in a 2x2 matrix and compress them into 2 bits.
--> Essentially, such an algorithm would be able to compress any imaginable amount of data into just 2 bits. While this is not an actual proof, it is still quite obvious that such an algorithm cannot exist.

I don't think it can guarantee O(n) space, but you could look into a compression algorithm called LZW (Lempel-Ziv-Welch).
It's quite simple to code and it's easy to understand why and how it works. It should work well for binary arrays, and the bigger your matrix is, the better the compression rate tends to be.
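A minimal Python sketch of LZW, assuming the matrix has been flattened row by row into a string of '0'/'1' characters. The function names are hypothetical, and a real encoder would also pack the integer codes into variable-width bit fields, which is omitted here. How much this saves depends entirely on how repetitive the data is; nothing guarantees O(n) space.

```python
def lzw_compress(bits):
    """LZW-encode a string of '0'/'1' characters into a list of integer codes."""
    dictionary = {"0": 0, "1": 1}
    current = ""
    codes = []
    for symbol in bits:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate  # keep extending the longest known phrase
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn a new phrase
            current = symbol
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {0: "0", 1: "1"}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the phrase being defined in this very step.
            entry = previous + previous[0]
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)
```

A highly regular matrix such as 64 zeros followed by `"01"` repeated 32 times compresses to far fewer codes than input bits, while a random-looking matrix may not shrink at all.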
Anyway, if you know something about the matrix, you can try to represent it in an array from which you can restore it. For example:
if your matrix is 32x32, you can take each row and represent it as a single 32-bit int, so a whole row becomes a single number and you have O(n) words. (Measured in bits this is still n^2; it only works because each row fits into one machine word.)
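The row-as-integer idea can be sketched in Python as follows; `pack_rows` and `get_bit` are hypothetical names. Python integers have arbitrary precision, so one integer per row corresponds to one machine word only when n is at most the word size (32 in the example above); the total number of bits stored is still n^2.

```python
def pack_rows(matrix):
    """Pack each row of a binary matrix into one integer (bit j = column j)."""
    packed = []
    for row in matrix:
        value = 0
        for j, bit in enumerate(row):
            value |= (bit & 1) << j
        packed.append(value)
    return packed


def get_bit(packed, i, j):
    """Read element (i, j) back from the packed representation."""
    return (packed[i] >> j) & 1
```

For instance, `pack_rows([[1, 0, 1], [0, 0, 0], [1, 1, 1]])` gives `[5, 0, 7]`.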


Distribution of pairwise distances between many integers

We have M unique integers between 1 and N. In real life, N is a few millions, and M is between N/10 and N/3. I need to compute a distribution of pairwise distances between the M integers.
The brute-force complexity of the problem is M^2, but the output is just N numbers. So the natural question is whether there is a faster algorithm. Even an algorithm as fast as N * sqrt(M) should be sufficient for our purposes.
The problem appeared as a subset of the following problem. We have a large virtual square symmetric matrix, a few million by a few million elements. Some rows and columns of the matrix are masked out. We need to find how many masked-out elements are in each diagonal of the matrix. One can easily calculate how many masked-out bins intersect each diagonal. But often a masked-out row and column would intersect right on the diagonal, thus masking out only one bin. To avoid double-counting these, we need the pairwise distribution of distances between masked-out columns.
You can do this in O(N log N) using the Fourier transform.
The idea is that you first compute a histogram H(x) of your M integers, where H(x) is the number of times the value x appears in your input (which will be either 0 or 1 if all M are distinct, but this is not essential).
Then what you want to compute is A(d), defined as the number of pairs of integers that are exactly d apart.
This can be computed as A(d) = sum(H(x)*H(x+d) for all x)
This type of function is a correlation (equivalently, a convolution of H with its own reversal), and it can be computed efficiently by taking the Fourier transform of H, multiplying it by its complex conjugate, and then computing the inverse transform. Care needs to be taken to pad appropriately (zero-padding H to at least 2N - 1 entries avoids circular wrap-around); for example, see this question.
If you use Python, this is particularly easy, as you can call scipy.signal.fftconvolve(H, H[::-1]) to do this operation; A(d) for d >= 0 is then at index len(H) - 1 + d of the result.
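Here is a sketch of the same idea using NumPy's `numpy.fft.rfft` and `numpy.fft.irfft` directly, so the padding step is explicit. It assumes the inputs are distinct integers in 1..N; the function name `distance_distribution` is hypothetical. Entry 0 of the result equals M, because every integer is paired with itself.

```python
import numpy as np


def distance_distribution(values, n):
    """Return A with A[d] = number of pairs (x, y) among `values` with y - x == d.

    `values` are distinct integers in 1..n; A has length n + 1 and A[0] == len(values).
    """
    h = np.zeros(n + 1)
    h[np.asarray(values)] = 1.0  # histogram H(x)
    # Zero-pad to at least 2 * (n + 1) - 1 points so the circular correlation
    # computed by the FFT does not wrap negative lags onto positive ones.
    size = 1 << (2 * (n + 1) - 1).bit_length()
    f = np.fft.rfft(h, size)
    # Multiplying by the complex conjugate gives sum_x H(x) * H(x + d).
    corr = np.fft.irfft(f * np.conj(f), size)
    # Round away floating-point noise; counts are integers.
    return np.rint(corr[: n + 1]).astype(np.int64)
```

For `values = [1, 3, 4, 8]` and `n = 10`, the distances 1, 2, 3, 4, 5 and 7 each occur once, which is what `distance_distribution(values, 10)[1:]` reports. For N in the millions the cost is dominated by FFTs of length about 2N, i.e. O(N log N).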

Adding square matrices in O(n) time?

Say we have two square matrices of the same size n, named A and B.
A and B share the property that every entry on their main diagonal has the same value (i.e., A[0,0] = A[1,1] = A[2,2] ... = A[n-1,n-1] and B[0,0] = B[1,1] = B[2,2] ... = B[n-1,n-1]).
Is there a way to represent A and B so that they can be added to each other in O(n) time, rather than O(n^2)?
In general: No.
For an n x n matrix, there are n^2 output values to populate; that alone takes Ω(n^2) time.
In your case: No.
Even if O(n) of the input/output values are dependent, that leaves O(n^2) that are independent. So there is no representation that can reduce the overall runtime below O(n^2).
In order to reduce the runtime, it is necessary (but not necessarily sufficient) to increase the number of dependent values to O(n^2). Obviously, whether or not this is possible is dictated by the particular scenario...
To complement Oli Charlesworth's answer, I'd like to point out that in the specific case of sparse matrices, you can often obtain a runtime of O(n).
For instance, if you happen to know that your matrices are diagonal, you also know that the resulting matrix will be diagonal, and hence you only need to compute n values.
Similarly, band matrices (with bounded bandwidth) can be added in O(n), as can more "random" sparse matrices. In many sparse matrices that arise in practice (for example, from finite element computations or from graph adjacency matrices), the number of non-zero elements per row is more or less constant, and so, using an appropriate representation such as "Compressed row storage" or "Compressed column storage", you will end up using O(n) operations to add your two matrices.
Sublinear randomized algorithms also deserve a special mention: they only promise a final value that is "not too far" from the true solution, up to random errors.
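As an illustration of the "Compressed row storage" point, here is a plain-Python sketch of adding two matrices stored in CSR form. The `(indptr, indices, data)` tuple layout mirrors the usual CSR convention but is an assumption of this sketch, and `csr_add` is a hypothetical name. Each row is a merge of two sorted column lists, so the cost is O(n + nnz(A) + nnz(B)), which is O(n) when the number of non-zeros per row is bounded.

```python
def csr_add(a, b):
    """Add two same-shaped CSR matrices given as (indptr, indices, data).

    Column indices must be sorted within each row. Entries that cancel to
    zero are kept as explicit zeros.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    indptr, indices, data = [0], [], []
    for row in range(len(a_ptr) - 1):
        i, i_end = a_ptr[row], a_ptr[row + 1]
        j, j_end = b_ptr[row], b_ptr[row + 1]
        # Merge the two sorted column lists of this row.
        while i < i_end or j < j_end:
            if j == j_end or (i < i_end and a_idx[i] < b_idx[j]):
                col, val = a_idx[i], a_val[i]
                i += 1
            elif i == i_end or b_idx[j] < a_idx[i]:
                col, val = b_idx[j], b_val[j]
                j += 1
            else:  # the same column appears in both matrices
                col, val = a_idx[i], a_val[i] + b_val[j]
                i += 1
                j += 1
            indices.append(col)
            data.append(val)
        indptr.append(len(indices))
    return indptr, indices, data
```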

How do I fill a 2D array with a constant value, with a better efficiency than n^2?

This is a general question, which could apply to any language like C, C++, Java, etc.
I figured that however you implement it, you can't do better than two nested loops, which gives an efficiency of n^2.
I was asked this at an interview recently, and couldn't think of anything more efficient. All I got from the interviewer was that I could use recursion or convert the 2D array to a linked list to make it more efficient than n^2. Anyone know if this is possible, and if yes, how? At least theoretically, if not practically.
Edit: The actual question gives me the coordinates of two cells, and I have to fill the cells on all possible shortest routes between them with 1.
E.g., if I have a 5x5 matrix, and my two coordinates are (2,0) and (3,3), I'd have to fill:
while leaving the rest of the cells as they were.
It depends on what you mean. If the question is about plain arrays, meaning a sequence of contiguous memory locations, and by initialization you mean putting a value in every memory location of this "matrix", then the answer is no: better than O(n*m) is not possible, and we can prove it:
Let us assume that an algorithm fill(A[n][m], init_val) is correct (i.e. it fills all the memory locations of A) and has complexity g(n,m) which is less than O(n*m) (meaning g(n,m) is not in Ω(n*m)). Then there are arbitrarily large n and m for which g(n,m) < n*m = the number of memory locations. Since filling a memory location requires one operation, the algorithm can fill at most g(n,m) locations [actually half that, because it must also spend at least one operation to "select" a different memory location, unless the hardware provides a combined operation], which is strictly less than n*m. This implies that fill is not correct.
The same applies if filling k memory locations takes constant time, you simply have to choose bigger n and m values.
As others have already suggested, you can use other data structures to avoid the O(n^2) initialization time. amit's suggestion uses a kind of lazy evaluation, which allows you not to initialize the array at all, but to do it only when you access the elements.
Note that this removes the Ω(n^2) cost at the beginning, but requires more complex operations to access the array's elements and also requires more memory.
It is not clear what your interviewer meant: converting an array into a linked list requires Ω(L) time (where L is the length of the array), so simply converting the whole matrix into a linked list would require Ω(n^2) time plus the actual initialization. Using recursion does not help at all:
you simply end up with recurrences such as T(n) = 4T(n/2) + O(1) (splitting the n x n matrix into four quadrants), which solves to Θ(n^2) and so brings no benefit for the asymptotic complexity.
As a general rule, every algorithm has to scan at least all of its input, unless it has some form of knowledge beforehand (e.g. the elements are sorted). In your case the space to scan is Θ(n^2), and thus every algorithm that wants to fill it must be Ω(n^2). Anything with lower complexity either makes some assumption (e.g. the memory contains the initial value by default -> O(1)) or solves a different problem (e.g. uses lazy arrays or other data structures).
You can initialize an array in O(1), but it consumes triple the amount of space and requires extra "work" for each element access in the matrix.
Since in practice a matrix is a 1D array in memory, the same principles still hold.
The page describes in detail how it can be done.
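A common way to get O(1) initialization at triple the space is the three-array trick sketched below in Python: a value array, a "slot" array, and a stack of cells written since the last fill. `LazyMatrix` and its methods are hypothetical names, and the matrix is stored row-major in one flat array, as noted above. Python cannot hand out uninitialized memory, so the backing lists are filled with random garbage here (which itself costs O(n^2)); the sketch only demonstrates the bookkeeping, and a genuine O(1) fill relies on a language where fresh memory need not be touched (e.g. `malloc` in C).

```python
import random


def _garbage(size):
    """Stand-in for uninitialized memory: arbitrary integers."""
    return [random.randint(-10**6, 10**6) for _ in range(size)]


class LazyMatrix:
    """rows x cols matrix whose fill() runs in O(1) time."""

    def __init__(self, rows, cols, value=0):
        size = rows * cols
        self.cols = cols
        self.default = value
        self.values = _garbage(size)   # cell contents, row-major
        self.slot = _garbage(size)     # slot[k]: claimed position of cell k in `written`
        self.written = _garbage(size)  # written[0:count]: cells set since the last fill
        self.count = 0

    def _is_set(self, k):
        # Valid only if slot[k] points into the live stack and the stack points back.
        s = self.slot[k]
        return 0 <= s < self.count and self.written[s] == k

    def fill(self, value):
        """Logically set every cell to `value` in O(1) time."""
        self.default = value
        self.count = 0

    def __getitem__(self, pos):
        k = pos[0] * self.cols + pos[1]
        return self.values[k] if self._is_set(k) else self.default

    def __setitem__(self, pos, value):
        k = pos[0] * self.cols + pos[1]
        if not self._is_set(k):
            self.slot[k] = self.count
            self.written[self.count] = k
            self.count += 1
        self.values[k] = value
```

Correctness never depends on the garbage: a cell counts as written only if its slot points into the live part of the stack and that stack entry points back at it.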
When you fill a 2D array with the same element, if you really fill every element, at least n^2 operations must be performed (given that the 2D array is n*n).
The only way to reduce the running time is a parallel programming approach. For example, given n processors, first assign the first row of the array; this takes n operations. Then, for k = 0 to n-2, each processor Pi copies array[k][i] to array[k+1][i]. This again takes O(n) time, since the n processors work in parallel, although the total work is still n^2.
If you really want to implement this approach, you can look at free parallel programming environments like Open MPI and MPICH.

Reference for lowest order complexity of sparse symmetric matrix premultiplying full vector

In a paper I'm writing I make use of an n x n matrix multiplying a dense vector of dimension n. In its natural form, this matrix has O(n^2) space complexity and the multiplication takes time O(n^2).
However, it is known that the matrix is symmetric, and has zero values along its diagonal. The matrix is also highly sparse: the majority of non-diagonal entries are zero.
Could anyone link me to an algorithm/paper/data structure which uses a sparse symmetric matrix representation to approach O(nlogn) or maybe even O(n), in cases of high sparsity?
I would have a look at the CSparse library by Tim Davis. There's also a corresponding book that describes a whole range of sparse matrix algorithms.
In the sparse case the A*x operation can be made to run in O(|A|) complexity - i.e. linear in the number of non-zero elements in the matrix.
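For the symmetric, zero-diagonal case in the question, a minimal Python sketch stores each off-diagonal non-zero only once, as an `(i, j, value)` triple with `i < j`, and applies it twice during the product. This coordinate-list layout and the name `sym_matvec` are assumptions for illustration; CSparse itself works with compressed-column storage. The cost is O(n + nnz), matching the O(|A|) bound above.

```python
def sym_matvec(n, upper, x):
    """Compute y = A @ x for a symmetric n x n matrix A with zero diagonal.

    `upper` lists each off-diagonal non-zero once as (i, j, value) with i < j;
    the mirrored entry A[j][i] is implied by symmetry.
    """
    y = [0.0] * n
    for i, j, value in upper:
        y[i] += value * x[j]  # contribution of A[i][j]
        y[j] += value * x[i]  # contribution of the mirrored A[j][i]
    return y
```

For A = [[0, 2, 0], [2, 0, 3], [0, 3, 0]] and x = [1, 2, 3], `sym_matvec(3, [(0, 1, 2.0), (1, 2, 3.0)], [1, 2, 3])` gives `[4.0, 11.0, 6.0]`.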
Are you interested in parallel algorithms of this sort?

What is the complexity of matrix addition?

I have found some mentions in another question of matrix addition being a quadratic operation. But I think it is linear.
If I double the size of a matrix, I need to calculate double the additions, not quadruple.
The main diverging point seems to be what is the size of the problem. To me, it's the number of elements in the matrix. Others think it is the number of columns or lines, hence the O(n^2) complexity.
Another problem I have with seeing it as a quadratic operation is that that means adding 3-dimensional matrices is cubic, and adding 4-dimensional matrices is O(n^4), etc, even though all of these problems can be reduced to the problem of adding two vectors, which has an obviously linear solution.
Am I right or wrong? If wrong, why?
As you already noted, it depends on your definition of the problem size: is it the total number of elements, or the width/height of the matrix? Whichever is correct actually depends on the larger problem that the matrix addition is part of.
NB: on some hardware (GPU, vector machines, etc) the addition might run faster than expected (even though complexity is still the same, see discussion below), because the hardware can perform multiple additions in one step. For a bounded problem size (like n < 3) it might even be one step.
It's O(M*N) for a 2-dimensional matrix with M rows and N columns.
Or you can say it's O(L) where L is the total number of elements.
Usually the problem is defined using square matrices "of size N", meaning NxN. By that definition, matrix addition is O(N^2), since you must visit each of the NxN elements exactly once.
By that same definition, multiplying square NxN matrices with the straightforward algorithm is O(N^3), because you need to combine N elements from each of the source matrices to compute each of the NxN elements in the product matrix (asymptotically faster algorithms, such as Strassen's, exist).
Generally, matrix operations that involve the whole matrix have a lower bound of Ω(N^2), simply because you must visit each element at least once.
Think of the general-case implementation:

    for i = 1 to n
        for j = 1 to m
            c[i][j] = a[i][j] + b[i][j]

If we take a simple square matrix (m = n), that is n x n additions.
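A runnable Python version of that loop, given as a sketch with the hypothetical name `matrix_add`. It performs exactly one addition per element, so its cost is linear in the number of elements L = n*m, which is n^2 when the matrix is n x n; both views from the question describe the same amount of work.

```python
def matrix_add(a, b):
    """Element-wise sum of two equally sized matrices given as lists of rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    # One addition per element: n*m additions for an n x m matrix.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```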
