Finding the longest common subsequence - algorithm

I have been reading an article about finding the longest common subsequence of two given strings.
I came across an algorithm whose code is as follows:
for(int i=0;i<=n;i++)
    for(int j=0;j<=m;j++){
        if(i==0 || j==0) dd[i][j]=0;
        else if(a[i-1]==b[j-1])
            dd[i][j] = 1 + dd[i-1][j-1];
        else{
            dd[i][j] = Math.max(dd[i-1][j], dd[i][j-1]);
        }
    }
I mostly understand the code, but I could not understand how it works, i.e. what the proof of its correctness is. Why does this work? Please help me understand the algorithm.

There are lots of resources you can find if you google:
Word Aligned
GeeksforGeeks
YouTube
PDF
This is from the 1st link [Word Aligned], which has a good explanation with animation.

Let
str1 = "bcd"
str2 = "dabc"
Here n=3 and m=4.
The outer loop runs over str1 and the inner loop runs over str2.
The 1st iteration of the outer loop (i = 0) only initializes the array.
Here is the first look of the dd array:
0, 0, 0, 0, 0
0, 0, 0, 0, 0
0, 0, 0, 0, 0
0, 0, 0, 0, 0
In the 2nd iteration of the outer loop, it compares the whole of str2 with the 1st character of str1, that is "dabc" with 'b'.
When 'b' is compared with "dabc", there is a match at the 4th iteration of the inner loop, so the array value is set from the corner (upper-left diagonal) value of the current position. The current position is [1][3]. As the corner ([0][2]) value is 0, the current value will be 1:
0, 0, 0, 0, 0
0, 0, 0, 1, 0
0, 0, 0, 0, 0
0, 0, 0, 0, 0
At the end of the iteration the array will be like this :
0, 0, 0, 0, 0
0, 0, 0, 1, 1
0, 0, 0, 0, 0
0, 0, 0, 0, 0
The value of position [1][4] changed because, if a character of str2 doesn't match the current character of str1, the value of the current position is the max of the upper and left positions. Here the max is 1.
Now check for yourself: if str1="b", the maximum match is 1. That is, each iteration of the outer loop calculates the maximum match for the part of str1 processed so far.
In the 3rd iteration of the outer loop, it compares the whole of str2 again with the 2nd character of str1, that is "dabc" with 'c'.
Now there is a match at the 5th iteration of the inner loop, so the array value is set from the corner value of the current position. The current position is [2][4]. As the corner ([1][3]) value is 1, the current value will be 2:
0, 0, 0, 0, 0
0, 0, 0, 1, 1
0, 0, 0, 1, 2
0, 0, 0, 0, 0
Now check for yourself: if str1="bc", the maximum match is 2. That is, each iteration of the outer loop calculates the maximum match for the part of str1 processed so far.
At the end of the iteration the array will be like this :
0, 0, 0, 0, 0
0, 0, 0, 1, 1
0, 0, 0, 1, 2
0, 0, 0, 0, 0
In the 4th iteration of the outer loop, it compares the whole of str2 again with the 3rd character of str1, that is "dabc" with 'd'.
Now there is a match at the 2nd iteration of the inner loop, so the array value is set from the corner value of the current position. The current position is [3][1]. As the corner ([2][0]) value is 0, the current value will be 1:
0, 0, 0, 0, 0
0, 0, 0, 1, 1
0, 0, 0, 1, 2
0, 1, 0, 0, 0
No further matches occur.
At the end of the iteration the array will be like this :
0, 0, 0, 0, 0
0, 0, 0, 1, 1
0, 0, 0, 1, 2
0, 1, 1, 1, 2
So the length of the LCS of these two strings is dd[3][4] = 2.
Hope you understand this. You can also get help from here
Note: Here I described only the matching situation. If a character of str2 doesn't match the current character of str1, then the value of the current position is the max of the upper and left positions. Try it yourself, and you will understand why we do this :)
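The walkthrough above can be reproduced with a short Python sketch. It is a direct transcription of the loop from the question, not a different algorithm; the function name lcs_table is only an illustrative choice. The reason the table works is that dd[i][j] always holds the LCS length of the first i characters of one string and the first j characters of the other: on a match the LCS of the two shorter prefixes is extended by one, otherwise the best answer must drop the last character of one string or the other.

```python
def lcs_table(a, b):
    """Build the table where dd[i][j] is the LCS length of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay 0: an empty prefix has no common subsequence.
    dd = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Match: extend the LCS of the two shorter prefixes.
                dd[i][j] = 1 + dd[i - 1][j - 1]
            else:
                # No match: drop the last character of one string or the other.
                dd[i][j] = max(dd[i - 1][j], dd[i][j - 1])
    return dd


table = lcs_table("bcd", "dabc")
for row in table:
    print(row)
print("LCS length:", table[-1][-1])
```

Running it with str1 = "bcd" and str2 = "dabc" prints the same final array as the walkthrough.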

Related

How to check if a specific point is within a figure in a matrix?

So I need a check function to see if a specific point in a matrix, say arr[3][4], is within a border, or a figure of characters. For clarification, imagine matrix char arr[10][10] below:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 1 0 0 0 1 0 0
0 0 0 1 0 0 0 1 0 0
0 0 0 1 0 0 0 1 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
As you can see, the '1' characters form a square of side length 5. I would like a bool function that returns true for arr[5][5] (0-indexed) because it is within the figure, but false for arr[1][1] because it is not. If relevant, the total size of the matrix will always be a constant 100*100, no matter the size of the '1' figure within. Also, please note that the figure will not always be a perfect polygon like the square in the example.
I could not solve this problem because in my example above, clearly both points (arr[5][5] and arr[1][1]) have the same surrounding squares, and the space is large enough so that I cannot just check if the four directions of up, right, down, and left (yes, diagonals can be ignored here) is a '1' because the '0' inside would be next to other '0's.
EDIT: I also want to clarify according to some shortcomings of answers that the thickness of sides may vary. The shape very well could be:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 1 1 1 0 0
0 0 0 0 1 0 0 1 1 0
0 0 1 1 1 0 1 1 0 0
0 0 1 0 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Therefore, checking whether the number of '1's above and to the left of the point is odd would not work.
So a flood fill would work, but it is quite heavy unless you want to know all the encased points. If you just want to check one point then you could do:
Count the number of ones in the vertical segment between points (x,0) and (x,y)
Count the number of ones in the horizontal segment between points (0,y) and (x,y)
If both are odd then you are inside.
Keep in mind that overlapping shapes or shapes with holes will not work with this algorithm.
So the function would look like this:
int inside(int x, int y)
{
    int x_count = 0;
    for(int i=0;i<x;i++)
        if(matrix[y][i])
            x_count++;
    int y_count = 0;
    for(int i=0;i<y;i++)
        if(matrix[i][x])
            y_count++;
    return x_count%2 && y_count%2;
}
A full test program looks like:
#include <stdio.h>
int matrix1[10][10] = {
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 1, 1, 1, 1, 1, 0, 0},
{0, 0, 0, 1, 0, 0, 0, 1, 0, 0},
{0, 0, 0, 1, 0, 0, 0, 1, 0, 0},
{0, 0, 0, 1, 0, 0, 0, 1, 0, 0},
{0, 0, 0, 1, 1, 1, 1, 1, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
};
int matrix2[10][10] = {
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{0, 1, 1, 1, 1, 1, 1, 1, 0, 0},
{0, 1, 0, 0, 0, 0, 0, 1, 0, 0},
{0, 1, 0, 1, 1, 1, 1, 1, 0, 0},
{0, 1, 0, 1, 0, 0, 0, 0, 0, 0},
{0, 1, 0, 1, 1, 1, 1, 1, 0, 0},
{0, 1, 0, 0, 0, 0, 0, 1, 0, 0},
{0, 1, 0, 0, 0, 0, 0, 1, 0, 0},
{0, 1, 1, 1, 1, 1, 1, 1, 0, 0},
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
};
int inside(int matrix[10][10],int x, int y)
{
    int x_count = 0;
    for(int i=0;i<x;i++)
        if(matrix[y][i])
            x_count++;
    int y_count = 0;
    for(int i=0;i<y;i++)
        if(matrix[i][x])
            y_count++;
    return x_count%2 && y_count%2;
}
int main()
{
    printf("2,2 is %s matrix1\n",inside(matrix1,2,2)?"inside":"outside");
    printf("5,5 is %s matrix1\n",inside(matrix1,5,5)?"inside":"outside");
    printf("8,8 is %s matrix1\n",inside(matrix1,8,8)?"inside":"outside");
    printf("3,3 is %s matrix2\n",inside(matrix2,3,3)?"inside":"outside");
    printf("5,5 is %s matrix2\n",inside(matrix2,5,5)?"inside":"outside");
    printf("7,7 is %s matrix2\n",inside(matrix2,7,7)?"inside":"outside");
    return 0;
}
Try it online https://onlinegdb.com/UkkaA3vWZ
The standard algorithm only needs to scan left to right along the row you wish to check.
First, check if the element is a 1. If it is you are “inside or on the edge”.
Otherwise, scanning from 0 to x:
If you count an odd number of edges, you are inside.
If you count an even number of edges, you are outside.
You must be careful how you count edges. An edge is one where you have a 1 both above and below in the surrounding 8 elements. Otherwise you have not crossed an edge (you have passed a point).
Likewise, if you hit a run of 1s, you must still apply the above and below for both the left and right side of the run.
BTW, the only sure way to check is the flood-fill algorithm explained by Jonathan S. Everything else can be tricked.
Here's a simple algorithm that'll do that:
Iterate over all elements at the edges of the matrix.
Change all 0 elements at the edge of the matrix to 2. (Leave any 1 elements intact.)
Within the entire matrix, whenever a 0 borders a 2, change that 0 to a 2 as well. Repeat this until there are no 0 elements left that are adjacent to a 2.
Any elements that are still 0 now are encased by 1 elements.
This is a flood fill starting at the edges of the matrix. It gives you all "encased" elements at once.
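The steps above can be sketched in Python as follows. This is only an illustration under some assumptions: the grid is a non-empty list of equal-length rows containing 0 and 1, neighbours are the four orthogonal cells (diagonals ignored, as in the question), and the name enclosed_cells is hypothetical. A queue replaces the "repeat until nothing changes" loop, which gives the same result with each cell visited once.

```python
from collections import deque


def enclosed_cells(grid):
    """Return the set of (row, col) positions of 0 cells encased by 1 cells."""
    rows, cols = len(grid), len(grid[0])
    g = [list(row) for row in grid]  # work on a copy, leave the input intact
    queue = deque()
    # Steps 1 and 2: mark every 0 on the border of the matrix as 2.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and g[r][c] == 0:
                g[r][c] = 2
                queue.append((r, c))
    # Step 3: spread the 2s to every 0 reachable from the border.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and g[nr][nc] == 0:
                g[nr][nc] = 2
                queue.append((nr, nc))
    # Step 4: any cell that is still 0 is encased.
    return {(r, c) for r in range(rows) for c in range(cols) if g[r][c] == 0}


demo = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(enclosed_cells(demo))
```

For a fixed 100*100 matrix the set can be computed once and then queried for as many points as needed.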

Fast approximation of simple cases of relaxed bipartite dimension of graph problem

Given a boolean matrix M, I need to find a set of submatrices A = {A1, ..., An} such that the matrices in A contain all True values in matrix M and only them. Submatrices don't have to be contiguous, i.e. each submatrix is defined by two sets of indices {i1, ..., ik}, {j1, ..., jt} of M. (For example, a submatrix could be something like [{1, 2, 5}, {4, 7, 9, 13}], and it consists of all cells in the intersection of these rows and columns.) Optionally, submatrices can intersect if this results in a better solution. The total number of submatrices n should be minimal.
The size of matrix M can be up to 10^4 x 10^4, so I need an efficient algorithm. I suppose that this problem may not have an efficient exact algorithm, because it reminds me of some NP-hard problems. If this is true, then any good and fast approximation is OK. We can also assume that the number of true values is not very big, i.e. < 1/10 of all values, but to avoid an accidental DoS in prod, a solution that does not rely on this fact is better.
I don't need any code, just a general idea of the algorithm and justification of its properties, if it's not obvious.
Background
We are calculating some expensive distance matrices for logistics applications. Points in these requests often overlap, so we are trying to develop a caching algorithm to avoid recalculating parts of some requests, and to split big requests into smaller ones containing only unknown submatrices. Additionally, some distances in the matrix may not be needed by the algorithm. On the one hand, a small number of big groups calculates faster; on the other hand, if we include a lot of "False" values and our submatrices are unreasonably big, this can slow down the calculation. The exact criterion is intricate and the time complexity of "expensive" matrix requests is hard to estimate. As far as I know, for square matrices it is something like C*n^2.5 with quite a big C. So it's hard to formulate a good optimization criterion, but any ideas are welcome.
About data
A True value in the matrix means that the distance between these two points has never been calculated before. Most of the requests (but not all) are square matrices with the same points on both axes, so most of the M matrices are expected to be almost symmetric. There is also a simple case of several completely new points where all the other distances are cached; I deal with these cases at the preprocessing stage. All the other values can be quite random. If they are too random, we can give up on the cache and calculate the full matrix M. But sometimes there are useful patterns. I think that because of the nature of the data, it is expected to contain more big submatrices than random data would. True values are mostly sporadic, but they form submatrix patterns that we need to find. We cannot rely on this completely, though: if the algorithm gets a matrix that is too random, it should at least be able to detect that, to avoid overly long and complex calculations.
Update
As stated on Wikipedia, this problem is called the bipartite dimension of a graph and is known to be NP-hard. So we can reformulate it into finding fast relaxed approximations for the simple cases of the problem. We can allow some percentage of false values and we can adopt some simple but mostly effective greedy heuristic.
I started working on the algorithm below before you provided the update.
Also, in doing so I realised that while one is looking for blocks of true values, the problem is not one of a block transformation, as you have also now updated.
The algorithm is as follows:
count the trues in each row
for any row with the maximum count of trues, sort the columns in the matrix so that the row's trues all move to the left
sort the matrix rows in descending order of congruent trues on the left (there will now be an upper left rough triangle of congruent trues)
get the biggest rectangle of trues cornered at the upper left
store the row ids and column ids for that rectangle (this is a sub-matrix definition)
change the sub-matrix's trues to falses
repeat from the top until the upper left triangle has no trues
This algorithm will produce a complete cover of the boolean matrix consisting of row-column intersection sub-matrices containing only true values.
I am not sure if allowing some falses in a sub-matrix will help. While it will allow bigger sub-matrices to be found and hence reduce the number of passes of the boolean matrix to find a cover, it will presumably take longer to find the biggest such sub-matrices because there will be more combinations to check. Also, I am not sure how one might stop falsey sub-matrices from overlapping. It might need the maintenance of a separate mask matrix rather than using the boolean matrix as its own mask, in order to ensure disjoint sub-matrices.
Below is a first cut implementation of the above algorithm in python.
I ran it on Windows 10 on an Intel Pentium N3700 @ 1.60GHz with 4GB RAM.
As is, it will do, with randomly generated ~10% trues:
100 rows x 1000 columns < 7 secs
1000 rows x 100 columns < 6 secs
300 rows x 300 columns < 14 secs
3000 rows x 300 columns < 3 mins
300 rows x 3000 columns < 15 mins
1000 rows x 1000 columns < 8 mins
I have not tested it on approximately symmetric matrices, nor have I tested it on matrices with relatively large sub-matrices. It might perform well with relatively large sub-matrices; e.g. in the extreme case where the entire boolean matrix is true, only two passes of the algorithm loop are required.
One area where I think there can be considerable optimisation is the row sorting. The implementation below uses the built-in Python sort with a comparator function. A custom-crafted sort function will probably do much better, possibly especially so if it is a virtual sort similar to the column sorting.
If you can try it on some real data, i.e. a square, approximately symmetric matrix with relatively large sub-matrices, it would be good to know how it goes.
Please advise if you would like me to try some optimisation of the Python. I presume that to handle 10^4 x 10^4 boolean matrices it will need to be a lot faster.
from functools import cmp_to_key
booleanMatrix0 = [
( 0, 0, 0, 0, 1, 1 ),
( 0, 1, 1, 0, 1, 1 ),
( 0, 1, 0, 1, 0, 1 ),
( 1, 1, 1, 0, 0, 0 ),
( 0, 1, 1, 1, 0, 0 ),
( 1, 1, 0, 1, 0, 0 ),
( 0, 0, 0, 0, 0, 0 )
]
booleanMatrix1 = [
( 0, )
]
booleanMatrix2 = [
( 1, )
]
booleanMatrix3 = [
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 )
]
booleanMatrix4 = [
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 )
]
booleanMatrix14 = [
( 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0 ),
( 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1 ),
( 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0 ),
( 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0 ),
( 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1 ),
( 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1 ),
( 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1 ),
( 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1 ),
( 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1 ),
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0 ),
( 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 )
]
booleanMatrix15 = [
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 ),
]
booleanMatrix16 = [
( 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1 ),
]
import random
booleanMatrix17 = [
]
for r in range(11):
    row = []
    for c in range(21):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix17.append(tuple(row))
booleanMatrix18 = [
]
for r in range(21):
    row = []
    for c in range(11):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix18.append(tuple(row))
booleanMatrix5 = [
]
for r in range(50):
    row = []
    for c in range(200):
        row.append(random.randrange(2))
    booleanMatrix5.append(tuple(row))
booleanMatrix6 = [
]
for r in range(200):
    row = []
    for c in range(50):
        row.append(random.randrange(2))
    booleanMatrix6.append(tuple(row))
booleanMatrix7 = [
]
for r in range(100):
    row = []
    for c in range(100):
        row.append(random.randrange(2))
    booleanMatrix7.append(tuple(row))
booleanMatrix8 = [
]
for r in range(100):
    row = []
    for c in range(1000):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix8.append(tuple(row))
booleanMatrix9 = [
]
for r in range(1000):
    row = []
    for c in range(100):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix9.append(tuple(row))
booleanMatrix10 = [
]
for r in range(317):
    row = []
    for c in range(316):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix10.append(tuple(row))
booleanMatrix11 = [
]
for r in range(3162):
    row = []
    for c in range(316):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix11.append(tuple(row))
booleanMatrix12 = [
]
for r in range(316):
    row = []
    for c in range(3162):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix12.append(tuple(row))
booleanMatrix13 = [
]
for r in range(1000):
    row = []
    for c in range(1000):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix13.append(tuple(row))
booleanMatrices = [ booleanMatrix0, booleanMatrix1, booleanMatrix2, booleanMatrix3, booleanMatrix4, booleanMatrix14, booleanMatrix15, booleanMatrix16, booleanMatrix17, booleanMatrix18, booleanMatrix6, booleanMatrix5, booleanMatrix7, booleanMatrix8, booleanMatrix9, booleanMatrix10, booleanMatrix11, booleanMatrix12, booleanMatrix13 ]
def printMatrix(matrix, colOrder):
    for r in range(rows):
        row = ""
        for c in range(cols):
            row += str(matrix[r][0][colOrder[c]])
        print(row)
    print()
def rowUp(matrix):
    rowCount = []
    maxRow = [ 0, 0 ]
    for r in range(rows):
        rowCount.append([ r, sum(matrix[r][0]) ])
        if rowCount[-1][1] > maxRow[1]:
            maxRow = rowCount[-1]
    return rowCount, maxRow
def colSort(matrix):
    # For a row with the highest number of trues, sort the true columns to the left
    newColOrder = []
    otherCols = []
    for c in range(cols):
        if matrix[maxRow[0]][0][colOrder[c]]:
            newColOrder.append(colOrder[c])
        else:
            otherCols.append(colOrder[c])
    newColOrder += otherCols
    return newColOrder
def sorter(a, b):
    # Sort rows according to leading trues
    length = len(a)
    c = 0
    while c < length:
        if a[0][colOrder[c]] == 1 and b[0][colOrder[c]] == 0:
            return -1
        if b[0][colOrder[c]] == 1 and a[0][colOrder[c]] == 0:
            return 1
        c += 1
    return 0
def allTrues(rdx, cdx, matrix):
    count = 0
    for r in range(rdx+1):
        for c in range(cdx+1):
            if matrix[r][0][colOrder[c]]:
                count += 1
            else:
                return
    return rdx, cdx, count
def getBiggestField(matrix):
    # Starting at (0, 0) find biggest rectangular field of 1s
    biggestField = (None, None, 0)
    cStop = cols
    for r in range(rows):
        for c in range(cStop):
            rtn = allTrues(r, c, matrix)
            if rtn:
                if rtn[2] > biggestField[2]:
                    biggestField = rtn
            else:
                cStop = c
                break
        if cStop == 0:
            break
    return biggestField
def mask(matrix):
    maskMatrix = []
    for r in range(rows):
        row = []
        for c in range(cols):
            row.append(matrix[r][0][c])
        maskMatrix.append([ row, matrix[r][1] ])
    maskRows = []
    for r in range(biggestField[0]+1):
        maskRows.append(maskMatrix[r][1])
        for c in range(biggestField[1]+1):
            maskMatrix[r][0][colOrder[c]] = 0
    maskCols= []
    for c in range(biggestField[1]+1):
        maskCols.append(colOrder[c])
    return maskMatrix, maskRows, maskCols
# Add a row id to each row to keep track of rearranged rows
rowIdedMatrices = []
for matrix in booleanMatrices:
    rowIdedMatrix = []
    for r in range(len(matrix)):
        rowIdedMatrix.append((matrix[r], r))
    rowIdedMatrices.append(rowIdedMatrix)
import time
for matrix in rowIdedMatrices:
    rows = len(matrix)
    cols = len(matrix[0][0])
    colOrder = []
    for c in range(cols):
        colOrder.append(c)
    subMatrices = []
    startTime = time.thread_time()
    loopStart = time.thread_time()
    loop = 1
    rowCount, maxRow = rowUp(matrix)
    ones = 0
    for row in rowCount:
        ones += row[1]
    print( "_________________________\n", "Rows", rows, "Columns", cols, "Ones", str(int(ones * 10000 / rows / cols) / 100) +"%")
    colOrder = colSort(matrix)
    matrix.sort(key=cmp_to_key(sorter))
    biggestField = getBiggestField(matrix)
    if biggestField[2] > 0:
        maskMatrix, maskRows, maskCols = mask(matrix)
        subMatrices.append(( maskRows, maskCols ))
    while biggestField[2] > 0:
        loop += 1
        rowCount, maxRow = rowUp(maskMatrix)
        colOrder = colSort(maskMatrix)
        maskMatrix.sort(key=cmp_to_key(sorter))
        biggestField = getBiggestField(maskMatrix)
        if biggestField[2] > 0:
            maskMatrix, maskRows, maskCols = mask(maskMatrix)
            subMatrices.append(( maskRows, maskCols) )
        if loop % 100 == 0:
            print(loop, time.thread_time() - loopStart)
            loopStart = time.thread_time()
    endTime = time.thread_time()
    print("Sub-matrices:", len(subMatrices), endTime - startTime)
    for sm in subMatrices:
        print(sm)
    print()
    input("Next matrix")
LOOP over true values
    Can you grow the submatrix containing the true value in any direction
    ( i.e. can you go from
        t
    to
        tt
        tt
    )
    Keep growing for as long as possible
    Set all cells in M that are in the new submatrix to false
Repeat until every cell in M is false.
Here is a simple example of how it works
The top picture shows the large Matrix M containing a few true values
The bottom rows show the first few iterations, with the blue submatrix growing as it finds more adjacent cells with true values. In this case I have stopped because it cannot grow any further without including false cells. If a few cells in a submatrix can be false, then you could continue a bit further.
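A minimal Python sketch of this growing loop might look as follows. It rests on assumptions the answer does not spell out: submatrices are contiguous rectangles, a rectangle may only grow by a whole row or column of true cells, and cells are visited in row-major order. The names grow_rectangle and greedy_cover are hypothetical.

```python
def grow_rectangle(m, r, c):
    """Grow a contiguous all-true rectangle around the true cell (r, c)."""
    rows, cols = len(m), len(m[0])
    top, bottom, left, right = r, r, c, c
    grew = True
    while grew:
        grew = False
        # Try each direction; extend only if the whole new edge is true.
        if top > 0 and all(m[top - 1][j] for j in range(left, right + 1)):
            top -= 1
            grew = True
        if bottom + 1 < rows and all(m[bottom + 1][j] for j in range(left, right + 1)):
            bottom += 1
            grew = True
        if left > 0 and all(m[i][left - 1] for i in range(top, bottom + 1)):
            left -= 1
            grew = True
        if right + 1 < cols and all(m[i][right + 1] for i in range(top, bottom + 1)):
            right += 1
            grew = True
    return top, bottom, left, right


def greedy_cover(matrix):
    """Cover all true cells with rectangles (top, bottom, left, right), inclusive."""
    m = [list(row) for row in matrix]  # work on a copy
    rectangles = []
    for r in range(len(m)):
        for c in range(len(m[0])):
            if m[r][c]:
                top, bottom, left, right = grow_rectangle(m, r, c)
                # Clear the covered cells so they are not covered again.
                for i in range(top, bottom + 1):
                    for j in range(left, right + 1):
                        m[i][j] = 0
                rectangles.append((top, bottom, left, right))
    return rectangles


print(greedy_cover([[1, 1, 0], [1, 1, 0], [0, 0, 1]]))
```

Because every visited true cell is cleared together with its rectangle, the loop ends with every cell false. The sketch makes no claim about how close the number of rectangles is to the minimum.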
Let's say M is an s by t matrix. The trivial (but possibly useful) solution is just to take all the non-empty columns (or rows) as your submatrices. This will result in at most min(s,t) submatrices.
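As a rough illustration of this trivial solution, the sketch below builds one submatrix per non-empty column, each defined by the rows holding a true value in that column plus that single column. The function name column_cover and the (row ids, column ids) output format are assumptions made for the example; the matrix is assumed to be non-empty.

```python
def column_cover(matrix):
    """Return one (row_ids, col_ids) submatrix per column that contains a true."""
    cover = []
    for c in range(len(matrix[0])):
        rows_with_true = [r for r, row in enumerate(matrix) if row[c]]
        if rows_with_true:
            # All cells in this submatrix are true by construction.
            cover.append((rows_with_true, [c]))
    return cover


print(column_cover([[1, 0, 1], [0, 0, 1]]))
```

Applying the same function to the transposed matrix gives the row-based variant, and taking whichever result is shorter stays within the min(s,t) bound.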

How can you efficiently flip a large range of indices's values from 1 to 0 or vice versa

You're given an array arr of size N. Suppose there's a contiguous interval arr[a....b] where you want to flip all the 1s to 0s and vice versa. Now suppose that there is a large number (millions or billions) of these intervals (they could have different starting and end points) that you need to process. Is there an efficient algorithm to get this done?
Note that a and b are inclusive. N can be any finite size essentially. The purpose of the question was just to practice algorithms.
Consider arr = [0,0,0,0,0,0,0]
Consider that we want to flip the following inclusive intervals: [1,3], [0,4]
After processing [1,3], we have arr = [0,1,1,1,0,0,0], and after processing [0,4], we have arr = [1,0,0,0,1,0,0], which is the final array.
The obvious efficient way to do that is to not do that. Instead first collect at what indices the flipping changes, and then do one pass to apply the collected flipping information.
Python implementation of a naive solution, the efficient solution, and testing:
def naive(arr, intervals):
    for a, b in intervals:
        for i in range(a, b+1):
            arr[i] ^= 1
def efficient(arr, intervals):
    # One extra slot so that flips[b+1] is valid even when b is the last index
    flips = [0] * (len(arr) + 1)
    for a, b in intervals:
        flips[a] ^= 1
        flips[b+1] ^= 1
    xor = 0
    for i in range(len(arr)):
        xor ^= flips[i]
        arr[i] ^= xor
def test():
    import random
    n = 30
    arr = random.choices([0, 1], k=n)
    intervals = []
    while len(intervals) < 100:
        a = random.randrange(n)
        b = random.randrange(n)
        if a <= b:
            intervals.append((a, b))
    print(f'{arr = }')
    expect = arr * 1
    naive(expect, intervals)
    print(f'{expect = }')
    result = arr * 1
    efficient(result, intervals)
    print(f'{result = }')
    print(f'{(result == expect) = }')
test()
Demo output:
arr = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
expect = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
result = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
(result == expect) = True
Cast to an int array and use bitwise NOT if you are using C or C++. This is also a SIMD task, so it's parallelizable if you wish.
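The word-at-a-time idea can be illustrated in Python by packing the array into a single integer and flipping a whole interval with one XOR against a mask. This is a sketch of the idea only, not the C or SIMD code the answer has in mind; the helper names flip_range and to_list are hypothetical, and bit i of the integer is assumed to stand for arr[i].

```python
def flip_range(bits, a, b):
    """Flip bits a..b (inclusive) of the integer bits."""
    mask = ((1 << (b - a + 1)) - 1) << a  # b-a+1 ones, shifted up to position a
    return bits ^ mask


def to_list(bits, n):
    """Unpack the lowest n bits into a list, index 0 first."""
    return [(bits >> i) & 1 for i in range(n)]


bits = 0  # arr = [0,0,0,0,0,0,0]
for a, b in [(1, 3), (0, 4)]:
    bits = flip_range(bits, a, b)
print(to_list(bits, 7))
```

With the intervals [1,3] and [0,4] from the question, this prints the same final array, [1, 0, 0, 0, 1, 0, 0].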

How to sort array of 0 and 1 of length n in O(n) time and O(1) space? And can we generalize this to array of 0, 1, 2, ...?

I would like to sort an array of 0 and 1. I have to sort it in linear time and in constant space. How can I do this without explicitly counting the number of 0 and 1?
I did something like this:
sort(array):
    Q0 = Queue()
    Q1 = Queue()
    for i in (0, n-1):
        if array[i] == 0:
            Q0.push(array[i])
        if array[i] == 1:
            Q1.push(array[i])
    j = 0
    while Q0:
        array[j] = Q0.pop()
        j += 1
    while Q1:
        array[j] = Q1.pop()
        j += 1
I think my solution is correct and has O(n) time but I am not sure of O(1) space. Any help?
Also, can we generalize the sorting to 0, 1, 2 arrays?
Here is (tested/working) Python:
# sort array of length n containing values of only 0 or 1
# in time O(n) and space O(1)
a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
first = 0
last = len(a)-1
print(a)
# note: don't need temp since values are only 0 and 1
while(last>first):
    if a[first] == 1:
        # move the 1 to the end; whatever was at the end is re-examined next
        a[first] = a[last]
        a[last] = 1
        last -= 1
    else:
        first += 1
print(a)
The idea is to swap all the ones to the end of the array and the zeros to the beginning of the array by keeping two pointers. i points to the first index that has a 1.
Here is the pseudo-code:
i = 1
for (j = 1 to n)
    if(a[j] == 0)
        swap(a[i], a[j])
        i++
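The pseudo-code above translates to Python roughly as shown below, using 0-based indices. For the 0, 1, 2 generalization asked about in the question, the second function uses a different, named technique that none of the answers above propose: the three-way partition usually called the Dutch national flag algorithm. Both function names are illustrative, and both run in O(n) time with O(1) extra space.

```python
def sort_zeros_ones(a):
    """In-place partition: i is the first index that may still hold a 1."""
    i = 0
    for j in range(len(a)):
        if a[j] == 0:
            a[i], a[j] = a[j], a[i]
            i += 1
    return a


def sort_zeros_ones_twos(a):
    """Three-way partition (Dutch national flag) for values 0, 1 and 2."""
    low, mid, high = 0, 0, len(a) - 1
    while mid <= high:
        if a[mid] == 0:
            # 0 goes to the front region.
            a[low], a[mid] = a[mid], a[low]
            low += 1
            mid += 1
        elif a[mid] == 1:
            # 1 is already in the middle region.
            mid += 1
        else:
            # 2 goes to the back; the swapped-in value is examined next.
            a[mid], a[high] = a[high], a[mid]
            high -= 1
    return a


print(sort_zeros_ones([1, 0, 1, 0, 0]))
print(sort_zeros_ones_twos([2, 0, 1, 2, 1, 0]))
```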

How many ways are there to see if a number is even, and which one is the fastest and clearest?

Given any number, what's the best way to determine whether it is even? How many methods can you think of, and which is the fastest and which is the clearest?
bool isEven = ((number & 0x01) == 0)
The question said "any number", so one could either discard floats or handle them in another manner, perhaps by first scaling them up to an integral value (watching out for overflow), i.e. change 2.1 to 21 (multiply by 10 and convert to int) and then test. It may be reasonable to assume, however, that by mentioning "any number" the person who posed the question is actually referring to integral values.
bool isEven = number % 2 == 0;
isEven(n) = ((-1) ^ n) == 1
where ^ is the exponentiation/pow function of your language.
I didn't say it was fast or clear, but it has novelty value.
The answer depends on the position being applied for. If you're applying for an Enterprise Architect position, then the following may be suitable:
First, you should create a proper Service-Oriented Architecture, as certainly the even-odd service won't be the only reusable component in your enterprise. An SOA consists of a service, interface, and service consumers. The service is a function which can be invoked over the network. It exposes an interface contract and is typically registered with a Directory Service.
You can then create a Simple Object Access Protocol (SOAP) HTTP Web Service to expose your service.
Next, you should prevent clients from directly calling your Web Service. If you allow this, then you will end up with a mess of point-to-point communication, which is very hard to maintain. Clients should access the Web Service through an Enterprise Service Bus (ESB).
In addition to providing a standard plug-able architecture, additional components like service orchestration can occur on the bus.
Generally, writing a bespoke even/odd service should be avoided. You should write a Request for Proposal (RFP), and get several vendors to show you their even/odd service. The vendor's product should be able to plug into your ESB, and also provide you with a Service Level Agreement (SLA).
This is even easier in ruby:
isEven = number.even?
Yes. The fastest way is to check the 1 bit, because it is set for all odd numbers and unset for all even numbers.
Bitwise ANDs are pretty fast.
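As a small illustration, here is the lowest-bit check next to the modulo check in Python. The function names are only for the example, and the sketch says nothing about relative speed; it only shows that the two tests agree, including for negative integers.

```python
def is_even_bit(n):
    """Even if the lowest bit is 0."""
    return (n & 1) == 0


def is_even_mod(n):
    """Even if the remainder of division by 2 is 0."""
    return n % 2 == 0


for n in (-3, -2, 0, 7, 310):
    print(n, is_even_bit(n), is_even_mod(n))
```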
If your type 'a' is an integral type, then we can define,
even :: Integral a => a -> Bool
even n = n `rem` 2 == 0
according to the Haskell Prelude.
For floating points, of course within a reasonable bound (modf returns the fractional part and stores the integral part through its pointer argument):
fracpart = modf(n/2.0, &intpart)
return fracpart == 0.0
With some other random math functions:
return gcd(n,2) == 2
return lcm(n,2) == n
return cos(n*pi) == 1.0
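Two of these novelty checks can be tried out in Python with the standard math module, as in the sketch below. The function names are illustrative. The modf variant is only reliable while n / 2.0 is exactly representable as a float, which is the "reasonable bound" mentioned above.

```python
import math


def is_even_gcd(n):
    """Even if 2 divides n, i.e. gcd(n, 2) is 2."""
    return math.gcd(n, 2) == 2


def is_even_modf(n):
    """Even if n / 2 has no fractional part (small magnitudes only)."""
    frac, _whole = math.modf(n / 2.0)
    return frac == 0.0


for n in (-4, -1, 0, 5, 442):
    print(n, is_even_gcd(n), is_even_modf(n))
```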
If int is 32 bits then you could do this:
bool is_even = ((number << 31) >> 31) == 0;
Using bit shifts, you shift the right-most bit to the left-most position and then back again, thus making all other bits 0. The number you're left with is then either 0 or 1 (or -1 for a signed type where the right shift is arithmetic). This method is somewhat similar to the "number & 1" method, where you again make all bits 0 except the first one.
Another approach, similar to this one is this:
bool is_even = (number << 31) == 0;
or
bool is_odd = (number << 31) < 0;
If the number is even (the right-most bit is 0), then shifting it 31 positions will make the whole number 0. If the bit is 1, i.e. the number is odd, then the resulting number will be negative (every integer with left-most bit 1 is negative, unless the number is of an unsigned type, where this won't work). To fix the signed/unsigned bug, you can just test:
bool is_odd = (number << 31) != 0;
Actually I think (n % 2 == 0) is enough, which is easy to understand and most compilers will convert it to bit operations as well.
I compiled this program with gcc -O2 flag:
#include <stdio.h>
int main()
{
    volatile int x = 310;
    printf("%d\n", x % 2);
    return 0;
}
and the generated assembly code is
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl $310, 28(%esp)
movl 28(%esp), %eax
movl $.LC0, (%esp)
movl %eax, %edx
shrl $31, %edx
addl %edx, %eax
andl $1, %eax
subl %edx, %eax
movl %eax, 4(%esp)
call printf
xorl %eax, %eax
leave
ret
in which we can see that the % 2 operation has already been converted to an andl instruction.
Similar to DeadHead's comment, but more efficient:
#include <limits.h>
bool isEven(int num)
{
bool arr[UINT_MAX] = { 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
// ...and so on
};
return arr[num];
}
As fast as an array index, which may or may not be faster than bitwise computations (it's difficult to test because I don't want to write the full version of this function). For what it's worth, that function above only has enough filled in to find even numbers up to 442, but would have to go to 4294967295 to work on my system.
With reservations for limited stack space. ;) (Is this perhaps a candidate for tail calls?)
public static bool IsEven(int num) {
if (num < 0)
return !IsEven(-num - 1);
if (num == 0)
return true;
return IsEven(-num);
}
a % 2.
It's clear
It's fast on every decent compiler.
Everyone who cries "But! But! What if the compiler doesn't optimize it" should find a normal compiler, shut up and read about premature optimization, then read it again, and again.
If it's low level, check whether the last bit (the LSB) is 0 or 1 :)
0 = Even
1 = Odd
Otherwise, +1 @sipwiz: "bool isEven = number % 2 == 0;"
Assuming that you are dealing with an integer, the following will work:
if ((testnumber & -2)==testnumber) then testnumber is even.
Basically, -2 in hex is FFFE (for 16 bits). If the number is even, then ANDing it with -2 will leave it unchanged.
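As a rough C version of that test (the function name is made up here, and two's complement representation is assumed, so that -2 has every bit set except the lowest):

```c
#include <stdbool.h>

/* ANDing with -2 clears the lowest bit and keeps all the others.
   An even number already has a 0 there, so it comes back unchanged;
   an odd number loses its lowest bit and no longer equals itself.
   Assumes two's complement. */
bool is_even_mask(int testnumber)
{
    return (testnumber & -2) == testnumber;
}
```

For example, -3 & -2 gives -4, which differs from -3, so -3 is reported as odd.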
** Tom **
You can either use integer division, dividing by two and inspecting the remainder, or use a modulus operator, taking the number mod two and inspecting the remainder. The "fastest" way depends on the language, compiler, and other factors, but I doubt there are many platforms for which there is a significant difference.
Recursion!
function is_even (n number) returns boolean is
if n = 0 then
return true
elsif n = 1 then
return false
elsif n < 0 then
return is_even(n * -1)
else
return is_even(n - 2)
end if
end
Continuing the spirit of "how many ways are there...":
function is_even (n positive_integer) returns boolean is
i := 0
j := 0
loop
if n = i then
return (j = 0)
end if;
i := i + 1
j := 1 - j
end loop
end
In response to Chris Lutz, an array lookup is significantly slower than a BITWISE_AND operation. In an array lookup you're doing a memory lookup which will always be slower than a bitwise operation because of memory latency. This of course doesn't even factor in the problem of putting all possible int values into your array which has a memory complexity of O(2^n) where n is your bus size (8,16,32,64).
The odd/even property is only defined in integers. So any answer dealing with floating point is invalid. The abstract representation of this problem is Int -> bool (to use Haskell notation).
Another useless novelty solution:
if (2 * (n/2) == n)
return true;
else
return false;
Only with integers, and it depends on how the language handles integer division.
Integer division gives exactly n/2 if n is even, or n/2 - .5 if n is odd (taking n as non-negative).
So 2*(n/2) is n if n is even, or n - 1 if n is odd.
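A sketch of the same idea in C, where integer division truncates toward zero (the function name is illustrative). For a negative odd n the doubled quotient is n + 1 rather than n - 1, but it still differs from n, so the test holds for both signs:

```c
#include <stdbool.h>

/* Integer division in C truncates toward zero, so 2 * (n / 2) drops
   the remainder. The result equals n only when there was no remainder,
   i.e. when n is even. */
bool is_even_div(int n)
{
    return 2 * (n / 2) == n;
}
```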
Here's a recursive way to do it in Python:
def is_even(n: int) -> bool:
if n == 0:
return True
else:
return is_odd(n-1)
def is_odd(n: int) -> bool:
if n == 0:
return False
else:
return is_even(n-1)
Of course, you can add in logic to check if n is negative as well.
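One way the negative check could look, sketched in C rather than Python. The names are hypothetical, and like the original this is a novelty: the recursion depth grows with the size of n, so it is only suitable for small inputs. Negative values are folded onto their magnitude first, since n and -n have the same parity:

```c
#include <stdbool.h>

bool is_odd_rec(unsigned int n);

/* 0 is even; otherwise n is even exactly when n - 1 is odd. */
bool is_even_rec(unsigned int n)
{
    return n == 0 ? true : is_odd_rec(n - 1);
}

/* 0 is not odd; otherwise n is odd exactly when n - 1 is even. */
bool is_odd_rec(unsigned int n)
{
    return n == 0 ? false : is_even_rec(n - 1);
}

/* Entry point that accepts negative numbers. The negation is done in
   unsigned arithmetic so that it is well defined for every int. */
bool is_even_any(int n)
{
    unsigned int magnitude = n < 0 ? 0u - (unsigned int)n : (unsigned int)n;
    return is_even_rec(magnitude);
}
```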

Resources