Special case of sparse matrices multiplication - performance

I'm trying to come up with fast algorithm to find result of operation, where
L - is symmetric n x n matrix with real numbers.
A - is sparse n x m matrix, m < n. Each row has one and only one non-zero element, and it's equal to 1. It's also guaranteed that every column has at most two non-zero elements.
I come up with one algorithm, but I feel like there should be something faster than this.
Let's represent every column of A as pair of row numbers with non-zero elements. If a column has only one non-zero element, its row number listed twice. E.g. for the following matrix
Such representation would be
column 0: [0, 2]; column 1: [1, 3]; column 2: [4, 4]
Or we can list it as a single array: A = [0, 2, 1, 3, 4, 4]; Now, can be calculated as:
for (i = 0; i < A.length; i += 2):
if A[i] != A[i + 1]:
# sum of two column vectors, i/2-th column of L'
L'[i/2] = L[A[i]] + L[A[i + 1]]
else:
L'[i/2] = L[A[i]]
To calculate we do it one more time:
for (i = 0; i < A.length; i += 2):
if A[i] != A[i + 1]:
# sum of two row vectors, i/2-th row of L''
L''[i/2] = L'[A[i]] + L'[A[i + 1]]
else:
L''[i/2] = L'[A[i]]
The time complexity of such approach is O(mn + mn), and space complexity (to get final result) is O(nn). I'm wondering if it's possible to improve it to O(mm) in terms of space and/or performance?

The second loop combines at most 2m rows of L', so if m is much smaller than n there will be several rows of L' that are never used.
One way to avoid calculating and storing these unused entries is to change your first loop into a function and only calculate the individual elements of L' as they are needed.
def L'(row,col):
i=col*2
if A[i] != A[i + 1]:
# sum of two column vectors, i/2-th column of L'
return L[row][A[i]] + L[row][A[i + 1]]
else:
return L[row][A[i]]
for (i = 0; i < A.length; i += 2):
if A[i] != A[i + 1]:
for (k=0;k<m;k++):
L''[i/2][k] = L'(A[i],k) + L'(A[i + 1],k)
else:
for (k=0;k<m;k++):
L''[i/2][k] = L'(A[i],k)
This should then have space and complexity O(m*m)

The operation Transpose(A) * L works as follows:
For each column of A we see:
column 1 has `1` in row 1 and 3
column 2 has `1` in row 2 and 4
column 3 has `1` in row 5
The output matrix B = Transpose(A) * L has three rows which are equal to:
Row(B, 1) = Row(A, 1) + Row(A, 3)
Row(B, 2) = Row(A, 2) + Row(A, 4)
Row(B, 3) = Row(A, 5)
If we multiply C = B * A:
Column(C, 1) = Column(B, 1) + Column(B, 3)
Column(C, 2) = Column(B, 2) + Column(B, 4)
Column(C, 3) = Column(B, 5)
If you follow through this in a algorithmic way, you should achieve something very similar to what Peter de Rivaz has suggested.

The time complexity of your algorithm is O(n^2), not O(m*n). The rows and columns of L have length n, and the A array has length 2n.
If a[k] is the column where row k of A has a 1, then you can write:
A[k,i] = δ(a[k],i)
and the product, P = A^T*L*A is:
P[i,j] = Σ(k,l) A^T[i,k]*L[k,l]*A[l,j]
= Σ(k,l) A[k,i]*L[k,l]*A[l,j]
= Σ(k,l) δ(a[k],i)*L[k,l]*δ(a[l],j)
If we turn this around and look at what happens to the elements of L, we see that L[k,l] is added to P[a[k],a[l]], and it's easy to get O(m^2) space complexity using O(n^2) time complexity.
Because a[k] is defined for all k=0..n-1, we know that every element of L must appear somewhere in the product. Because there are O(n^2) distinct elements in L, you can't do better than O(n^2) time complexity.

Related

Counting valid sequences with dynamic programming

I am pretty new to Dynamic Programming, but I am trying to get better. I have an exercise from a book, which asks me the following question (slightly abridged):
You want to construct sequence of length N from numbers from the set {1, 2, 3, 4, 5, 6}. However, you cannot place the number i (i = 1, 2, 3, 4, 5, 6) more than A[i] times consecutively, where A is a given array. Given the sequence length N (1 <= N <= 10^5) and the constraint array A (1 <= A[i] <= 50), how many sequences are possible?
For instance if A = {1, 2, 1, 2, 1, 2} and N = 2, this would mean you can only have one consecutive 1, two consecutive 2's, one consecutive 3, etc. Here, something like "11" is invalid since it has two consecutive 1's, whereas something like "12" or "22" are both valid. It turns out that the actual answer for this case is 33 (there are 36 total two-digit sequences, but "11", "33", and "55" are all invalid, which gives 33).
Somebody told me that one way to solve this problem is to use dynamic programming with three states. More specifically, they say to keep a 3d array dp(i, j, k) with i representing the current position we are at in the sequence, j representing the element put in position i - 1, and k representing the number of times that this element has been repeated in the block. They also told me that for the transitions, we can put in position i every element different from j, and we can only put j in if A[j] > k.
It all makes sense to me in theory, but I've been struggling with implementing this. I have no clue how to begin with the actual implementation other than initializing the matrix dp. Typically, most of the other exercises had some sort of "base case" that were manually set in the matrix, and then a loop was used to fill in the other entries.
I guess I am particularly confused because this is a 3D array.
For a moment let's just not care about the array. Let's implement this recursively. Let dp(i, j, k) be the number of sequences with length i, last element j, and k consecutive occurrences of j at the end of the array.
The question now becomes how do we write the solution of dp(i, j, k) recursively.
Well we know that we are adding a j the kth time, so we have to take each sequence of length i - 1, and has j occurring k - 1 times, and add another j to that sequence. Notice that this is simply dp(i - 1, j, k - 1).
But what if k == 1? If that's the case we can add one occurence of j to every sequence of length i - 1 that doesn't end with j. Essentially we need the sum of all dp(i, x, k), such that A[x] >= k and x != j.
This gives our recurrence relation:
def dp(i, j, k):
# this is the base case, the number of sequences of length 1
# one if k is valid, otherwise zero
if i == 1: return int(k == 1)
if k > 1:
# get all the valid sequences [0...i-1] and add j to them
return dp(i - 1, j, k - 1)
if k == 1:
# get all valid sequences that don't end with j
res = 0
for last in range(len(A)):
if last == j: continue
for n_consec in range(1, A[last] + 1):
res += dp(i - 1, last, n_consec)
return res
We know that our answer will be all valid subsequences of length N, so our final answer is sum(dp(N, j, k) for j in range(len(A)) for k in range(1, A[j] + 1))
Believe it or not this is the basis of dynamic programming. We just broke our main problem down into a set of subproblems. Of course, right now our time is exponential because of the recursion. We have two ways to lower this:
Caching, we can simply keep track of the result of each (i, j, k) and then spit out what we originally computed when it's called again.
Use an array. We can reimplement this idea with bottom-up dp, and have an array dp[i][j][k]. All of our function calls just become array accesses in a for loop. Note that using this method forces us iterate over the array in topological order which may be tricky.
There are 2 kinds of dp approaches: top-down and bottom-up
In bottom up, you fill the terminal cases in dp table and then use for loops to build up from that. Lets consider bottom-up algo to generate Fibonacci sequence. We set dp[0] = 1 and dp[1] = 1 and run a for loop from i = 2 to n.
In top down approach, we start from the "top" view of the problem and go down from there. Consider the recursive function to get n-th Fibonacci number:
def fib(n):
if n <= 1:
return 1
if dp[n] != -1:
return dp[n]
dp[n] = fib(n - 1) + fib(n - 2)
return dp[n]
Here we don't fill the complete table, but only the cases we encounter.
Why I am talking about these 2 types is because when you start learning dp, it is often difficult to come up with bottom-up approaches (like you are trying to). When this happens, first you want to come up with a top-down approach, and then try to get a bottom up solution from that.
So let's create a recursive dp function first:
# let m be size of A
# initialize dp table with all values -1
def solve(i, j, k, n, m):
# first write terminal cases
if k > A[j]:
# this means sequence is invalid. so return 0
return 0
if i >= n:
# this means a valid sequence.
return 1
if dp[i][j][k] != -1:
return dp[i][j][k]
result = 0
for num = 1 to m:
if num == j:
result += solve(i + 1, num, k + 1, n)
else:
result += solve(i + 1, num, 1, n)
dp[i][j][k] = result
return dp[i][j][k]
So we know what terminal cases are. We create a dp table of size dp[n + 1][m][50]. Initialize it with all values 0, not -1.
So we can do bottom-up as:
# initially all values in table are zero. With loop below, we set the valid endings as 1.
# So any state trying to reach valid terminal states will get 1, but invalid states will
# return the values 0
for num = 1 to m:
for occour = 1 to A[num]:
dp[n][num][occour] = 1
# now to build up from bottom, we start by filling n-1 th position
for i = n-1 to 1:
for num = 1 to m:
for occour = 1 to A[num]:
for next_num = 1 to m:
if next_num != num:
dp[i][num][occour] += dp[i + 1][next_num][1]
else:
dp[i][num][occour] += dp[i + 1][num][occour + 1]
The answer will be:
sum = 0
for num = 1 to m:
sum += dp[1][num][1]
I am sure there must be some more elegant dp solution, but I believe this answers your question. Note that I considered that k is the number of times j-th number has been repeated consecutively, correct me if I am wrong with this.
Edit:
With the given constraints the size of the table will be, in the worst case, 10^5 * 6 * 50 = 3e7. This would be > 100MB. It is workable, but can be considered too much space use (I think some kernels doesn't allow that much stack space to a process). One way to reduce it would be to use a hash-map instead of an array with top down approach since top-down doesn't visit all the states. That would be mostly true in this case, for example if A[1] is 2, then all the other states where 1 has occoured more that twice need not be stored. Ofcourse this would not save much space if A[i] has large values, say [50, 50, 50, 50, 50, 50]. Another approach would be to modify our approach a bit. We dont actually need to store the dimension k, i.e. the times j has appeared consecutively:
dp[i][j] = no of ways from i-th position if (i - 1)th position didn't have j and i-th position is j.
Then, we would need to modify our algo to be like:
def solve(i, j):
if i == n:
return 1
if i > n:
return 0
if dp[i][j] != -1
return dp[i][j]
result = 0
# we will first try 1 consecutive j, then 2 consecutive j's then 3 and so on
for count = 1 to A[j]:
for num = 1 to m:
if num != j:
result += solve(i + count, num)
dp[i][j] = result
return dp[i][j]
This approach will reduce our space complexity to O(10^6) ~= 2mb, while time complexity is still the same : O(N * 6 * 50)

Counting Inversions In An Array - Special Case

Inversion Count for an array indicates – how far (or close) the array is from being sorted. If array is already sorted then inversion count is 0. If array is sorted in reverse order that inversion count is the maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j Example:
The sequence 2, 4, 1, 3, 5 has three inversions (2, 1), (4, 1), (4, 3).
Now, there are various algorithms to solve this in O(n log n).
There is a special case where the array only has 3 types of elements - 1, 2 and 3. Now, is it possible to count the inversions in O(n) ?
Eg 1,1,3,2,3,1,3
Yes it is. Just take 3 integers a,b,c where a is number of 1's encountered till now, b is number of 2's encountered till now and c is number of 3's encountered till now. Given this follow the algorithm below ( I assume numbers are given in array arr and the size is n, with 1 based indexing, also following is just a pseudocode )
no_of_inv = 0
a = 0
b = 0
c = 0
for i from 1 to n:
if arr[i] == 1:
no_of_inv = no_of_inv + b + c
a++
else if arr[i] == 2:
no_of_inv = no_of_inv + c
b++
else:
c++
(This algorithm is extremely similar to Sasha's. I just wanted to provide an explanation as well.)
Every inversion (i, j) satisfies 0 ≤ i < j < n. Let's define S[j] to be the number of inversions of the form (i, j); that is, S[j] is the number of times A[i] > A[j] for 0 ≤ i < j. Then the total number of inversions is T = S[0] + S[1] + … + S[n - 1].
Let C[x][j] be the number of times A[i] > x for 0 ≤ i < j. Then S[j] = C[A[j]][j] for all j. If we can compute the 3n values C[x][j] in linear time, then we can compute S in linear time.
Here is some Python code:
>>> import numpy as np
>>> A = np.array([1, 1, 3, 2, 3, 1, 3])
>>> C = {x: np.cumsum(A > x) for x in np.unique(A)}
>>> T = sum(C[A[j]][j] for j in range(len(A)))
>>> print T
4
This could be made more efficient—although not in asmpytotic terms—by not storing all C values at once. The algorithm really only needs a single pass through the array. I have chosen to present it this way because it is most concise.

Given 2 arrays of non-negative numbers, find the minimum sum of products

Given two arrays A and B, each containing n non-negative numbers, remove a>0 elements from the end of A and b>0 elements from the end of B. Evaluate the cost of such an operation as X*Y where X is the sum of the a elements removed from A and Y the sum of the b elements removed from B. Keep doing this until both arrays are empty. The goal is to minimize the total cost.
Using dynamic programming and the fact that an optimal strategy will always take exactly one element from either A or B I can find an O(n^3) solution. Now I'm curious to know if there is an even faster solution to this problem?
EDIT: Stealing an example from #recursive in the comments:
A = [1,9,1] and B = [1, 9, 1]. Possible to do with a cost of 20. (1) *
(1 + 9) + (9 + 1) * (1)
Here's O(n^2). Let CostA(i, j) be the min cost of eliminating A[1..i], B[1..j] in such a way that the first removal takes only one element from B. Let CostB(i, j) be the min cost of eliminating A[1..i], B[1..j] in such a way that the first removal takes only one element from A. We have mutually recursive recurrences
CostA(i, j) = A[i] * B[j] + min(CostA(i - 1, j),
CostA(i - 1, j - 1),
CostB(i - 1, j - 1))
CostB(i, j) = A[i] * B[j] + min(CostB(i, j - 1),
CostA(i - 1, j - 1),
CostB(i - 1, j - 1))
with base cases
CostA(0, 0) = 0
CostA(>0, 0) = infinity
CostA(0, >0) = infinity
CostB(0, 0) = 0
CostB(>0, 0) = infinity
CostB(0, >0) = infinity.
The answer is min(CostA(n, n), CostB(n, n)).

Sum of products of elements of all subarrays of length k

An array of length n is given. Find the sum of products of elements of the sub-array.
Explanation
Array A = [2, 3, 4] of length 3.
Sub-array of length 2 = [2,3], [3,4], [2,4]
Product of elements in [2, 3] = 6
Product of elements in [3, 4] = 12
Product of elements in [2, 4] = 8
Sum for subarray of length 2 = 6+12+8 = 26
Similarly, for length 3, Sum = 24
As, products can be larger for higher lengths of sub-arrays calculate in modulo 1000000007.
What is an efficient way for finding these sums for subarrays of all possible lengths, i.e., 1, 2, 3, ......, n where n is the length of the array.
There is rather simple way:
Construct product of terms (1 + A[i] * x):
P = (1 + A[0] * x) * (1 + A[1] * x) * (1 + A[2] * x)...*(1 + A[n-1] * x)
If we open the brackets, then we'll get polynomial
P = 1 + B[1] * x + B[2] * x^2 + ... + B[n] * x^n
Kth coefficient, B[k], is equal to the sum of products of sets with length K - for example, B[n] = A[0]*A[1]*A[2]*..A[n-1], B[2] = A[0]*A[1] + A[0]*A[2] + ... + A[n-2]*A[n-1] and so on.
So to find sum of products of all possible sets, we have to find value of polynomial P for x = 1, then subtract 1 to remove leading 0th term. If we don't want to take into consideration single-element sets, then subtract B1 = sum of A[i].
Example:
(1+2)(1+3)(1+4) = 60
60 - 1 = 59
59 - (2 + 3 + 4) = 50 = 24 + 26 - as your example shows
We first create a recursive relation. Let f(n, k) be the sum of all products of sub-arrays of length k from an array a of length n. The base cases are simple:
f(0, k) = 0 for all k
f(n, 0) = 1 for all n
The second rule might seem a little counter-intuitive, but 1 is the zero-element of multiplication.
Now we find a recursive relation for f(n+1, k). We want the product of all subarrays of size k. There are two types of subarrays here: the ones including a[n+1] and the ones not including a[n+1]. The sum of the ones not including a[n+1] is exactly f(n, k). The ones including a[n+1] are exactly all subarrays of length k-1 with a[n+1] added, so their summed product is a[n+1] * f(n, k-1).
This completes our recurrence relation:
f(n, k) = 0 if n = 0
= 1 if k = 0
= f(n-1, k) + a[n] * f(n-1, k-1) otherwise
You can use a neat trick to use very limited memory for your dynamic programming, because function f only depends on two earlier values:
int[] compute(int[] a) {
int N = a.length;
int[] f = int[N];
f[0] = 1;
for (int n = 1; n < N; n++) {
for (int k = n; k >= 1; k--) {
f[k] = (f[k] + a[n] * f[k-1]) % 1000000007;
}
}
return f;
}

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of minimum elements in all the possible contiguous sub-sequences of A. I know if N is small we can look for all possible sub sequences but as N is upto 10^5 what can be best way to find this sum?
Example: Let N=3 and A[1,2,3] then ans is 10 as Possible contiguous sub sequences {(1),(2),(3),(1,2),(1,2,3),(2,3)} so Sum of minimum elements = 1 + 2 + 3 + 1 + 1 + 2 = 10
Let's fix one element(a[i]). We want to know the position of the rightmost element smaller than this one located to the left from i(L). We also need to know the position of the leftmost element smaller than this one located to the right from i(R).
If we know L and R, we should add (i - L) * (R - i) * a[i] to the answer.
It is possible to precompute L and R for all i in linear time using a stack. Pseudo code:
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
while !s.isEmpty() && s.top().first > a[i]:
s.pop()
if !s.isEmpty():
L[i] = s.top().second
s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Let's assume that a[i] is a pair <a[i], i>. All elements are distinct now.
The time complexity is O(n).
Here is a full pseudo code(I assume that int can hold any integer value here, you should
choose a feasible type to avoid an overflow in a real code. I also assume that all elements are distinct):
int[] getLeftSmallerElementPositions(int[] a):
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
while !s.isEmpty() && s.top().first > a[i]:
s.pop()
if !s.isEmpty():
L[i] = s.top().second
s.push(pair(a[i], i))
return L
int[] getRightSmallerElementPositions(int[] a):
R = getLeftSmallerElementPositions(reversed(a))
for i <- 0 ... n - 1:
R[i] = n - 1 - R[i]
return reversed(R)
int findSum(int[] a):
L = getLeftSmallerElementPositions(a)
R = getRightSmallerElementPositions(a)
int res = 0
for i <- 0 ... n - 1:
res += (i - L[i]) * (R[i] - i) * a[i]
return res
If the list is sorted, you can consider all subsets for size 1, then 2, then 3, to N. The algorithm is initially somewhat inefficient, but an optimized version is below. Here's some pseudocode.
let A = {1, 2, 3}
let total_sum = 0
for set_size <- 1 to N
total_sum += sum(A[1:N-(set_size-1)])
First, sets with one element:{{1}, {2}, {3}}: sum each of the elements.
Then, sets of two element {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a python question, but sometimes code helps:
A = [1, 2, 3]
# This is [3, 2, 1]
scale = range(len(A), 0, -1)
# Take the element-wise product of the vectors, and sum
sum(a*b for (a,b) in zip(A, scale))
# Or just use the dot product
np.dot(A, scale)

Resources