How to implement a cumulative product table? - algorithm

Given the following problem:
There is a sequence of k integers, named s, for which there are 2 operations:
1) Sum[i,j]: What is the value of s[i]+s[i+1]+...+s[j]?
2) Update[i,val]: Change the value of s[i] to val.
I am sure most people here have heard of using a cumulative frequency table/Fenwick tree to optimize the complexity.
Now, suppose that instead of querying the sum I want to perform the following:
Product[i,j]: What is the value of s[i] * s[i+1] * ... * s[j]?
The new problem seems trivial at first, at least for the first operation Product[i,j].
Assuming I am using a cumulative product table named f:
At first thought, when we call Update[i,val], we should divide the cumulative products f[z], for every z from i to the end of the table, by the old value of s[i] and then multiply them by the new value.
But we face 2 issues if the old value of s[i] is 0:
Division by 0. This is easily tackled by checking whether the old value of s[i] is 0.
The product of any real number with 0 is 0, so every value from f[i] onward is already 0 and the information about the other factors is lost. So we are unable to perform Update[i,val] correctly. This problem is not so trivial, as it affects values other than f[i].
Does anyone have any ideas how I could implement a cumulative product table that supports the 2 operations mentioned above?

Maintain 2 tables:
A cumulative product table, in which all zero entries have been stored as ones instead (to avoid affecting other entries).
A cumulative sum storing the number of zero entries. Each flag is 1 if the corresponding factor is 0 and 0 if it is non-zero.
To compute the product over a range, first calculate the sum of the zero flags in that range. If it is non-zero (i.e. there are one or more zeros in the range), the product is zero. If it is zero, calculate the product as you describe.
It might be more accurate to store your factors as logarithms in some base and compute the product as a sum of log values. You'd then just be computing 2 cumulative sums. In that case you would store zero entries in the log table as 0 (the log of 1).
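Here is a minimal Python sketch of the two-table idea, with a Fenwick tree for both tables. The answer does not specify several details, so these are assumptions: 0-based indices, and Python's fractions.Fraction for the product tree so that dividing out an old value stays exact. The class and method names (ZeroAwareProductTable, update, product) are hypothetical.

```python
from fractions import Fraction


class ZeroAwareProductTable:
    """Hypothetical sketch: a Fenwick tree counting zeros plus a multiplicative
    Fenwick tree in which every zero is stored as a one."""

    def __init__(self, values):
        self.n = len(values)
        self.vals = [1] * self.n                   # start as all ones, then update
        self.zeros = [0] * (self.n + 1)            # 1-based tree of zero counts
        self.prod = [Fraction(1)] * (self.n + 1)   # 1-based tree of products
        for i, v in enumerate(values):
            self.update(i, v)

    def _add_zero(self, i, delta):
        i += 1
        while i <= self.n:
            self.zeros[i] += delta
            i += i & -i

    def _mul(self, i, factor):
        i += 1
        while i <= self.n:
            self.prod[i] *= factor
            i += i & -i

    def _prefix(self, i):
        """(number of zeros, product of non-zero values) over values[0..i-1]."""
        zeros, prod = 0, Fraction(1)
        while i > 0:
            zeros += self.zeros[i]
            prod *= self.prod[i]
            i -= i & -i
        return zeros, prod

    def update(self, i, val):
        old = self.vals[i]
        # Remove the old value: a zero only lives in the zero-count tree.
        if old == 0:
            self._add_zero(i, -1)
        else:
            self._mul(i, Fraction(1, old))
        # Add the new value the same way (zeros are stored as 1 in the product tree).
        if val == 0:
            self._add_zero(i, 1)
        else:
            self._mul(i, val)
        self.vals[i] = val

    def product(self, i, j):
        """Product of values[i..j], inclusive."""
        zeros_j, prod_j = self._prefix(j + 1)
        zeros_i, prod_i = self._prefix(i)
        if zeros_j - zeros_i > 0:
            return 0
        return int(prod_j / prod_i)
```

For example, ZeroAwareProductTable([3, 0, 4, 2, 3]).product(2, 4) gives 24. Exact fractions avoid floating-point error, but the stored numerators and denominators may grow large.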
Here's an example, using a simple cumulative sum (not Fenwick trees, but you could easily use them instead):
row  f  cum_f  isZero  cum_isZero  log(f)  cum_log(f)
-1   1  1      0       0           0       0
0    3  3      0       0           0.477   0.477
1    0  3      1       1           -inf    0.477
2    4  12     0       1           0.602   1.079
3    2  24     0       1           0.301   1.38
4    3  72     0       1           0.477   1.857
row is the index, f is the factor, cum_f is the cumulative product of f treating zeros as ones, isZero is a flag indicating whether f is zero, cum_isZero is the cumulative sum of the isZero flags, log(f) is the base-10 log of f, and cum_log(f) is the cumulative sum of log(f), treating -inf as zero.
To calculate the sum or product of a range from row i to row j (inclusive), subtract row[i-1] from row[j], using row -1 as a "virtual" row.
To calculate the product of f over rows 0-2, first find the sum of isZero over that range: cum_isZero[2] - cum_isZero[-1] = 1 - 0 = 1. That's non-zero, so the product is 0.
To calculate the product of f over rows 2-4, do as above: cum_isZero[4] - cum_isZero[1] = 1 - 1 = 0. That's zero, so we can calculate the product.
Using cum_f: cum_f[4] / cum_f[1] = 72 / 3 = 24 = 4 x 2 x 3
Using cum_log(f): cum_log(f)[4] - cum_log(f)[1] = 1.857 - 0.477 = 1.38
10^1.38 = approx 24
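The worked example can be reproduced with plain prefix arrays. This Python sketch uses the factors of rows 0-4 from the table and stores the virtual row -1 at position 0 of each list. It assumes the log variant is applied only to positive factors, because log10 is undefined for negative numbers. The function names are hypothetical.

```python
import math

factors = [3, 0, 4, 2, 3]  # rows 0..4 of the table above

# Position 0 of each prefix list plays the role of the virtual row -1.
cum_f, cum_is_zero, cum_log = [1], [0], [0.0]
for f in factors:
    cum_f.append(cum_f[-1] * (f if f != 0 else 1))        # zeros treated as ones
    cum_is_zero.append(cum_is_zero[-1] + (1 if f == 0 else 0))
    cum_log.append(cum_log[-1] + (math.log10(f) if f != 0 else 0.0))  # -inf treated as 0


def range_product(i, j):
    """Product of factors[i..j] (inclusive) from the prefix products."""
    if cum_is_zero[j + 1] - cum_is_zero[i] > 0:
        return 0
    return cum_f[j + 1] // cum_f[i]  # exact: cum_f[j+1] is a multiple of cum_f[i]


def range_product_via_logs(i, j):
    """Same query via the log sums; only valid for positive factors."""
    if cum_is_zero[j + 1] - cum_is_zero[i] > 0:
        return 0.0
    return 10 ** (cum_log[j + 1] - cum_log[i])


print(range_product(0, 2))                      # 0
print(range_product(2, 4))                      # 24
print(round(range_product_via_logs(2, 4), 6))   # 24.0
```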

Related

How to optimize search of rows x columns combination in a matrix?

Given a matrix of 1's and 0's, I want to find a combination of rows and columns with as few 0's as possible (ideally none), maximizing n_of_rows * n_of_columns picked.
For example, rows (0,1,2) and columns (0,1,3) contain only one zero (in column 0, row 1), and the remaining 8 values are 1's.
1 1 0 1 0
0 1 1 1 0
1 1 0 1 1
0 0 1 0 0
The practical task is to search over thousands to millions of rows and columns, i.e. to find the maximal biclique in a bipartite graph: rows and columns can be viewed as vertices, and the values as connections.
As far as I have learned, the problem is NP-complete.
Please advise an approach or algorithm that would speed up the task and reduce the CPU and memory requirements.
Not sure you could minimise this.
However, an easy way to work this out would be...
Multiply your matrix by a column of 1's (n rows, 1 column). This gives you the number of ones in each row. Next, multiply a row of 1's (1 row, n columns) in front of your matrix. This gives you the total number of 1's in each column. From there it's a pretty easy comparison...
i.e. original matrix:
1 0 1
0 1 1
0 0 0
Row totals (the matrix times a column of 1's):
1 0 1     1     2
0 1 1  x  1  =  2
0 0 0     1     0
Column totals (a row of 1's times the matrix):
          1 0 1
1 1 1  x  0 1 1  =  1 1 2
          0 0 0
NB: the max sum is 2 (which you would keep track of as you work it out).
Actually given the following assumptions:
1. You don't care how many 0's are in each row or column
2. You don't need to keep track of their order....
Then you only really need to store values to count the total in each row/column as you read the values in and don't actually store the matrix itself.
If you are given the number of rows and columns before reading in the matrix, you can use the following heuristic to reduce computation time...
Keep track of the current max. If the current row cannot reach this potential max, stop counting for that row (but continue counting for the columns). The same applies to the columns.
But you still have a worst-case scenario in which all rows and columns have the same number of 1's and 0's.... :)
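Here is a short Python sketch of the counting approach from this answer. It computes the same totals as multiplying by vectors of 1's, but it reads the matrix one row at a time and keeps only the counters, under the assumptions listed above. The function name and the row-iterator input are assumptions for illustration. Like the answer, it only produces per-row and per-column counts.

```python
def stream_totals(rows, n_cols):
    """Count the 1's per row and per column while reading rows one at a time.

    Equivalent to M x [1, ..., 1]^T (row totals) and [1, ..., 1] x M (column
    totals), but the matrix itself is never stored.
    """
    row_totals = []
    col_totals = [0] * n_cols
    for row in rows:
        total = 0
        for c, value in enumerate(row):
            total += value
            col_totals[c] += value
        row_totals.append(total)
    return row_totals, col_totals


rows = iter([[1, 0, 1], [0, 1, 1], [0, 0, 0]])  # the example matrix
row_totals, col_totals = stream_totals(rows, 3)
print(row_totals, col_totals)                   # [2, 2, 0] [1, 1, 2]
print(max(row_totals + col_totals))             # 2
```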

Calculate index for number combinations

I have a vector that includes a value for every possible combination of two numbers out of a bigger group of n numbers (from 0 to (n-1)), excluding combinations where both numbers are the same.
For instance, if n = 4, combinations will be the ones shown in columns number1 and number2.
number1  number2  vector-index  value
0        1        0             3
0        2        1             98
0        3        2             0
1        0        3             44
1        2        4             6
1        3        5             3
2        0        6             2
2        1        7             43
2        3        8             23
3        0        9             11
3        1        10            54
3        2        11            7
There are always n*(n-1) combinations and therefore that is the number of elements in the vector (12 elements in the example above).
Problem
In order to access the values in the vector, I need an expression that gives me the corresponding index for every combination.
If combinations where number1=number2 were included, the index could be figured out using:
index = number1*n + number2
A related question also includes combinations where number1=number2.
Is there any expression to calculate the index in this case?
First, notice that all the pairs can be grouped into blocks of size (n-1), where n is the number of distinct values. This means that, given a pair (i, j), the block containing it starts at index i(n-1). Within that block the indices are laid out sequentially, skipping over i. If j < i, we just look j steps past the start of the block; otherwise, we look j-1 steps past it. Overall this gives the formula
int index = i * (n - 1) + (j < i? j : j - 1);
Note that the only difference arises when number2 is greater than number1. In that case one value of the number2 sequence (number2 = number1) was skipped, so you need to decrease the count by one, something like this:
index = number1 * (n - 1) + number2 - (number2 > number1 ? 1 : 0)
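Here is a small Python check of the formula against the n = 4 table from the question. The inverse mapping pair_from_index is a hypothetical helper added for illustration. It is not part of either answer.

```python
def pair_index(i, j, n):
    """Position of the ordered pair (i, j), i != j, in the n*(n-1) vector."""
    if i == j:
        raise ValueError("pairs with i == j are not stored")
    # Block i starts at i*(n-1); inside it, j is shifted down by one once j passes i.
    return i * (n - 1) + (j if j < i else j - 1)


def pair_from_index(k, n):
    """Hypothetical inverse of pair_index."""
    i, r = divmod(k, n - 1)
    return i, (r if r < i else r + 1)


n = 4
pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
print([pair_index(i, j, n) for i, j in pairs])  # 0 through 11, matching the vector-index column
```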

Why does this maximum product subarray algorithm work?

The problem is to find the contiguous subarray within an array (containing at least one number) which has the largest product.
For example, given the array [2,3,-2,4],
the contiguous subarray [2,3] has the largest product 6.
Why does the following work? Can anyone provide any insight on how to prove its correctness?
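using System;

public static class Solution
{
    // Returns the largest product of a contiguous, non-empty subarray of nums.
    public static int MaxProduct(int[] nums)
    {
        if (nums == null || nums.Length == 0)
        {
            throw new ArgumentException("Invalid input");
        }
        int max = nums[0];    // largest product of a subarray ending at index i
        int min = nums[0];    // smallest product of a subarray ending at index i
        int result = nums[0]; // largest product found so far
        for (int i = 1; i < nums.Length; i++)
        {
            int prev_max = max;
            int prev_min = min;
            // Either start a new subarray at nums[i] or extend the previous max/min;
            // a negative nums[i] turns the previous min into the new max.
            max = Math.Max(nums[i], Math.Max(prev_max * nums[i], prev_min * nums[i]));
            min = Math.Min(nums[i], Math.Min(prev_max * nums[i], prev_min * nums[i]));
            result = Math.Max(result, max);
        }
        return result;
    }
}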
Start from the logical side to understand how to solve the problem. There are two relevant traits of each subarray to consider:
If it contains a 0, the product of the subarray is 0 as well.
If the subarray contains an odd number of negative values, its product is negative; otherwise it is positive (or 0, counting 0 as positive).
Now we can start off with the algorithm itself:
Rule 1: zeros
Since a 0 zeroes out the product of a subarray, the subarray of the solution mustn't contain a 0, unless the input contains only negative values and 0. This is handled simply, since max and min are both reset to 0 as soon as a 0 is encountered in the array:
max = Math.Max(0 , Math.Max(prev_max * 0 , prev_min * 0));
min = Math.Min(0 , Math.Min(prev_max * 0 , prev_min * 0));
Both evaluate to 0, no matter what the input so far was.
arr:    1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 0
result: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
min:    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
max:    1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 0
//non-zero values don't matter for Rule 1, so I just used 1
Rule 2: negative numbers
With Rule 1, we've already implicitly split the array into subarrays, such that each subarray consists of either a single 0 or multiple non-zero values. Now the task is to find the largest possible product inside such a subarray (I'll refer to it as the array from here on).
If the number of negative values in the array is even, the problem becomes trivial: just multiply all values in the array, and the result is the maximum product of the array. For an odd number of negative values there are two possible cases:
The array contains only a single negative value: in that case, either the subarray of all values before the negative value or the subarray of all values after it has the maximum product.
The array contains 3 or more negative values: in that case, we have to eliminate either the first negative number and all of its predecessors, or the last negative number and all of its successors.
Now let's have a look at the code:
max = Math.Max(nums[i] , Math.Max(prev_max * nums[i] , prev_min * nums[i]));
min = Math.Min(nums[i] , Math.Min(prev_max * nums[i] , prev_min * nums[i]));
Case 1: the evaluation of min is actually irrelevant, since the sign of the product of the array will only flip once, for the negative value. As soon as the negative number is encountered (= nums[i]), max will be nums[i], since both max and min are at least 1 and thus multiplication with nums[i] results in a number <= nums[i]. And for the first number after the negative number nums[i + 1], max will be nums[i + 1] again. Since the so far found maximum is made persistent in result (result = Math.Max(result, max);) after each step, this will automatically result in the correct result for that array.
arr: 2 3 2 -4 4 5
result: 2 6 12 12 12 20
max: 2 6 12 -4 4 20
//Omitted min, since it's irrelevant here.
Case 2: Here min becomes relevant too. Before we encounter the first negative value, min is just the smallest product of a subarray ending at the current position (for positive integers, the current element itself). Once we encounter the first negative element in the array, min turns negative. We continue to build both products (min and max), which swap roles each time a negative value is encountered, and we keep updating result. When the last negative value of the array is encountered, result already holds the product of the subarray that eliminates the last negative value and its successors. After the last negative value, max is the product of the subarray that eliminates the first negative value and its predecessors, and min becomes irrelevant. Now we simply continue to multiply max by the remaining values in the array and update result until the end of the array is reached.
arr:    2 3 -4 3 -2 5 -6 3
result: 2 6 6 6 144 720 720 720
min:    2 3 -24 -72 -6 -30 -4320 ...
max:    2 6 -4 3 144 720 180 540
//min becomes irrelevant after the last negative value
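To check traces like the ones above, this Python sketch reruns the max/min recurrence from the question's C# code and records result, min and max after every element. It is a straight port for illustration and does not come from the answer. Unlike the C# int version, Python integers do not overflow.

```python
def trace_max_product(nums):
    """Port of the max/min recurrence that records each intermediate value."""
    mx = mn = result = nums[0]
    trace = {"result": [result], "min": [mn], "max": [mx]}
    for x in nums[1:]:
        prev_max, prev_min = mx, mn
        # A negative x turns the smallest product into the largest and vice versa.
        mx = max(x, prev_max * x, prev_min * x)
        mn = min(x, prev_max * x, prev_min * x)
        result = max(result, mx)
        trace["result"].append(result)
        trace["min"].append(mn)
        trace["max"].append(mx)
    return trace


for row, values in trace_max_product([2, 3, -4, 3, -2, 5, -6, 3]).items():
    print(row, values)
```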
Putting the pieces together
Since min and max are reset every time we encounter a 0, we can easily reuse them for each subarray that doesn't contain a 0. Thus Rule 1 is applied implicitly without interfering with Rule 2. Since result isn't reset each time a new subarray is inspected, the value will be kept persistent over all runs. Thus this algorithm works.
I hope this is understandable (to be honest, I doubt it, and will try to improve the answer if any questions come up). Sorry for the long answer.
Let's assume the contiguous subarray that produces the maximal product is a[i], a[i+1], ..., a[j]. Since it has the largest product, it is also the suffix of a[0], a[1], ..., a[j] with the largest product.
The idea of your algorithm is the following: for every prefix a[0], ..., a[j], find the suffix with the largest product. Out of these suffix products, take the maximum.
At the beginning, the smallest and largest suffix products are simply nums[0]. Then the algorithm iterates over all the other numbers in the array. The largest suffix product is always built in one of three ways: it's just the last number nums[i]; it's the largest suffix product of the shortened list multiplied by the last number (if nums[i] > 0); or it's the smallest (< 0) suffix product multiplied by the last number (if nums[i] < 0). (*)
Using the helper variable result, you store the maximal such suffix-product you found so far.
(*) This fact is quite easy to prove. If it were otherwise, for instance if some other suffix product of the shortened list produced a bigger number, then together with the last number nums[i] it would create an even bigger suffix, which would be a contradiction.
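One way to gain confidence in the suffix argument is to compare the recurrence with a brute force over all subarrays. This Python sketch does that on random inputs. The helper names, the value range and the random seed are arbitrary choices for illustration.

```python
import random
from math import prod


def max_product(nums):
    """Largest product of a contiguous, non-empty subarray (suffix recurrence)."""
    best_suffix = worst_suffix = result = nums[0]
    for x in nums[1:]:
        # The best suffix ending at x is x alone, or x times the best or worst
        # suffix ending just before it (the worst one matters when x < 0).
        candidates = (x, best_suffix * x, worst_suffix * x)
        best_suffix, worst_suffix = max(candidates), min(candidates)
        result = max(result, best_suffix)
    return result


def max_product_brute_force(nums):
    """Check every subarray nums[i..j] directly: O(n^3), for testing only."""
    return max(prod(nums[i:j + 1]) for i in range(len(nums)) for j in range(i, len(nums)))


rng = random.Random(0)
for _ in range(1000):
    nums = [rng.randint(-5, 5) for _ in range(rng.randint(1, 8))]
    assert max_product(nums) == max_product_brute_force(nums), nums
print("recurrence matches brute force")
```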

Matrix with equal sum of rows and columns

I have an NxM matrix with integer elements greater than or equal to 0.
From any cell I can transfer 1 to another one (-1 to the source cell, +1 to the destination).
Using this operation, I have to make the sums of all rows and all columns equal. The question is how to find the minimal number of such operations needed. Cells may become negative during the process.
For example, for
1 1 2 2
1 0 1 1
0 0 1 1
1 1 1 2
The answer is 3.
P.S.: I've tried to solve it on my own, but only came up with a brute-force solution.
First, find the expected sum per row and per column.[1]
rowSum = totalSum / numRows
colSum = totalSum / numCols
Then, iterate through the rows and the columns and compute the following values:
rowDelta = 0
for each row r
    if sum(r) > rowSum
        rowDelta += sum(r) - rowSum
colDelta = 0
for each col c
    if sum(c) > colSum
        colDelta += sum(c) - colSum
The number of the minimum moves to equilibrate all the rows and columns is:
minMoves = max(rowDelta, colDelta)
This works because you have to transfer from rows that exceed rowSum into rows that don't exceed it, and from columns that exceed colSum into columns that don't exceed it.
If rowDelta is initially lower than colDelta, you will reach a stage where all the rows are equilibrated but the columns are not. In that case, you continue transferring between cells in the same row. The same applies if colDelta is initially lower than rowDelta, which is why we take the maximum of the two as the result.
[1] If totalSum is not divisible by numRows, or not divisible by numCols, the problem has no solution.
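Here is a Python sketch of the rowDelta/colDelta computation. It includes the divisibility check from the footnote and returns None when there is no solution. The function name is hypothetical.

```python
def min_moves(matrix):
    """max(rowDelta, colDelta) as described above, or None if no solution exists."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = sum(sum(row) for row in matrix)
    if total % n_rows or total % n_cols:
        return None  # totalSum must divide evenly by numRows and numCols
    row_sum, col_sum = total // n_rows, total // n_cols
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # Only the excess above the target has to be moved out.
    row_delta = sum(s - row_sum for s in row_sums if s > row_sum)
    col_delta = sum(s - col_sum for s in col_sums if s > col_sum)
    return max(row_delta, col_delta)


example = [
    [1, 1, 2, 2],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 2],
]
print(min_moves(example))  # 3
```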
Let us consider the one-dimensional case: you have an array of numbers and a single allowed operation: take 1 from one element of the array and add it to another element. The goal is to make all elements equal with a minimal number of operations. Here the solution is simple: repeatedly take 1 from any "too big" number and add it to any "too small" number. Let me now describe how this relates to the problem at hand.
You can easily calculate the sum needed for every column and every row: it is the total sum of all elements in the matrix divided by the number of columns or rows, respectively. From there you can calculate which rows and columns need to be reduced and which need to be increased; see here:
 1  1  2  2 | -2
 1  0  1  1 | +1
 0  0  1  1 | +2
 1  1  1  2 | -1
+1 +2 -1 -2
Expected sum of a row: 4
Expected sum of a column: 4
So now we generate two arrays: the displacements of the rows, -2,+1,+2,-1, and the displacements of the columns, +1,+2,-1,-2. For these two arrays we solve the simpler task described above. Clearly, we cannot solve the initial problem in fewer steps than the simpler tasks require (otherwise the balance in the columns or rows would not be 0).
However, I will prove that the initial task can be solved in exactly as many steps as the larger of the two step counts, for the columns and for the rows:
Every step in the simpler task generates two indices i and j: the index to subtract from and the index to add to. Let's assume that in a step of the column task we have indices ci and cj, and in the row task we have indices ri and rj. We map this to a step of the initial task: take 1 from (ci, ri) and add 1 to (cj, rj). At some point we may reach a situation where there are still steps left in, say, the column task but none in the row task. So we get ci and cj, but what do we use for ri and rj? We just choose ri = rj, so that we do not disturb the row balances.
In this solution I make use of the fact that I am allowed to obtain negative numbers in the matrix.
Now let's demonstrate (cells are written as (column, row), 1-based):
Solution for columns:
4->1;3->2;4->2
Solution for rows:
1->3; 1->3; 4->2
Total solution:
(4,1)->(1,3); (3,1)->(2,3); (4,4)->(2,2)
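The pairing argument can be turned into a constructive Python sketch. It solves the two one-dimensional tasks greedily, pairs their steps, and pads the shorter list with "stay in place" steps (ri = rj or ci = cj). It assumes cells are addressed as 0-based (row, column), unlike the 1-based (column, row) notation above, and the function names are hypothetical. The greedy may pick different but equally short moves from the ones listed.

```python
def one_dim_moves(sums, target):
    """Greedy transfers (src, dst) that bring every entry of sums to target."""
    excess = [[i, s - target] for i, s in enumerate(sums) if s > target]
    deficit = [[i, target - s] for i, s in enumerate(sums) if s < target]
    moves = []
    while excess:
        src, dst = excess[0], deficit[0]
        moves.append((src[0], dst[0]))
        src[1] -= 1
        dst[1] -= 1
        if src[1] == 0:
            excess.pop(0)
        if dst[1] == 0:
            deficit.pop(0)
    return moves


def balance_moves(matrix):
    """Pair row moves with column moves; returns ((r, c) from, (r, c) to) steps."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = sum(sum(row) for row in matrix)
    if total % n_rows or total % n_cols:
        return None
    row_moves = one_dim_moves([sum(row) for row in matrix], total // n_rows)
    col_moves = one_dim_moves([sum(col) for col in zip(*matrix)], total // n_cols)
    steps = []
    for k in range(max(len(row_moves), len(col_moves))):
        # Once one task is finished, keep its index fixed so its balance is untouched.
        ri, rj = row_moves[k] if k < len(row_moves) else (0, 0)
        ci, cj = col_moves[k] if k < len(col_moves) else (0, 0)
        steps.append(((ri, ci), (rj, cj)))
    return steps


def apply_steps(matrix, steps):
    result = [row[:] for row in matrix]
    for (r1, c1), (r2, c2) in steps:
        result[r1][c1] -= 1
        result[r2][c2] += 1
    return result


example = [
    [1, 1, 2, 2],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 2],
]
steps = balance_moves(example)
balanced = apply_steps(example, steps)
print(len(steps))                                                   # 3
print([sum(r) for r in balanced], [sum(c) for c in zip(*balanced)])  # [4, 4, 4, 4] [4, 4, 4, 4]
```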
Suppose that r1 is the index of a row with maximal sum, while r2 is the row with minimal sum; c1 is the column with maximal sum and c2 the column with minimal sum.
You need to repeat the following operation:
if the sums of rows r1 and r2 are equal and the sums of columns c1 and c2 are equal, we're done!
Otherwise, Matrix[r1][c1] -= 1 and Matrix[r2][c2] += 1

Monotonically increasing 2-d array

Give an algorithm to find a given element x (give the co-ordinates), in an n by n matrix where the rows and columns are monotonically increasing.
My thoughts:
Reduce problem set size.
In the 1st column, find the largest element <= x. We know x must be in this row or before it (higher up), because every row below starts with a value > x. In the last column of the matrix, find the smallest element >= x. We know x must be in this row or after it (lower). Do the same thing with the first and last rows of the matrix. We have now defined a sub-matrix such that if x is in the matrix at all, it is in this sub-matrix. Now repeat the algorithm on this sub-matrix... Something along these lines.
[YAAQ: Yet another arrays question.]
I think you cannot hope for better than O(N), which is attainable (N is the width of the matrix).
Why you cannot hope for better
Imagine a matrix like this:
0 0 0 0 0 0 ... 0 0 x
0 0 0 0 0 0 ... 0 x 2
0 0 0 0 0 0 ... x 2 2
.....................
0 0 0 0 0 x ... 2 2 2
0 0 0 0 x 2 ... 2 2 2
0 0 0 x 2 2 ... 2 2 2
0 0 x 2 2 2 ... 2 2 2
0 x 2 2 2 2 ... 2 2 2
x 2 2 2 2 2 ... 2 2 2
where x is an unknown number (not necessarily the same number; it might be a different one in every column). Without breaking the monotonicity of the matrix, you can place any of 0, 1, or 2 in each of the x places. So, to find out whether there is a 1 in the matrix, you have to check all the x places, and there are N of them.
How to make it O(n)
Imagine you have to find, for every row, the index of the first column with a number > q (a given number). Start in the upper right corner of the matrix; if the number you see is greater than q, go left; otherwise go down. End when you are in the last row. The points where you went down are the places you are looking for. If any of them holds the number you are searching for, you've found it.
This algorithm is O(n), because in each step you go either left or down. In total it cannot go left more than N times or down more than N times. Therefore it's O(n).
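Here is a Python sketch of the staircase walk. It assumes rows increase left to right and columns increase top to bottom. Instead of collecting the "go down" points, it searches for x directly, using the same left/down logic. The function name is hypothetical.

```python
def staircase_search(matrix, x):
    """Return (row, col) of x in a matrix with sorted rows and columns, or None.

    Each step discards one column (go left) or one row (go down), so at most
    rows + cols steps are taken.
    """
    if not matrix or not matrix[0]:
        return None
    r, c = 0, len(matrix[0]) - 1          # start in the upper right corner
    while r < len(matrix) and c >= 0:
        v = matrix[r][c]
        if v == x:
            return (r, c)
        if v > x:
            c -= 1    # everything further down column c is even larger
        else:
            r += 1    # everything further left in row r is even smaller
    return None


m = [
    [1, 4, 7, 11],
    [2, 5, 8, 12],
    [3, 6, 9, 16],
    [10, 13, 14, 17],
]
print(staircase_search(m, 5))   # (1, 1)
print(staircase_search(m, 15))  # None
```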
Pick a corner element, one that is greatest in its row and smallest in its column (or the other way). Compare with x. Depending on the result of the comparison, you can exclude the row or the column from further search.
The new matrix has sum of dimensions decreased by 1, compared to the original one. Apply the above iteratively. After 2*n steps you end up with a 1x1 matrix.
If "the rows and columns are monotonically increasing" means that the values in each (row,col) increase such that for any row, (rowM,col1) < (rowM,col2) < ... < (rowM,colN) < (rowM+1,col1) ...
Then you can just treat it as a 1-dimensional array sorted from smallest to largest and do a standard binary search: sample the item that is 1/2(rows * cols) from the start, then sample the element that is 1/4(rows * cols) behind it (if the first element sampled is > x) or ahead of it (if the first element sampled is < x), and so forth.
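Under that stricter reading (the matrix is sorted in row-major order), the binary search can index the matrix as a virtual 1-D array without copying it. The Python sketch below does this, and its function name is hypothetical.

```python
def flat_binary_search(matrix, x):
    """Binary search over a row-major sorted matrix viewed as one sorted array."""
    if not matrix or not matrix[0]:
        return None
    cols = len(matrix[0])
    lo, hi = 0, len(matrix) * cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        v = matrix[mid // cols][mid % cols]   # map the 1-D position to (row, col)
        if v == x:
            return divmod(mid, cols)
        if v < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


m2 = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
print(flat_binary_search(m2, 9))  # (1, 1)
print(flat_binary_search(m2, 4))  # None
```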
