Reduce by key performance using a linear index to row index mapping

I have been checking the code sum_rows.cu from the thrust examples and I could not understand what exactly is happening in linear_index_to_row_index.
Can someone please explain to me, with an example, what converting a linear index to a row index means?
Reference: https://github.com/thrust/thrust/blob/master/examples/sum_rows.cu

In thrust, a common storage format is a vector container (e.g. device_vector and host_vector). In typical usage, these containers are a one-dimensional storage format. The concepts of "row" and "column" don't really apply to a 1D vector; we simply talk about the index of an element. In this case, it is a linear (1D) index.
In order to store a 2 dimensional item, such as a matrix where the concept of "row" and "column" indexing of elements is applicable, a common approach to using the 1D containers in thrust is to "flatten" or "linearize" the storage:
Matrix A:
          column:
           0  1
   row: 0  1  2
        1  3  4

Vector A:
   index:   0  1  2  3
   element: 1  2  3  4
At this point, we can conveniently use thrust operations. But what if we want to do operations on specific rows or columns of the original matrix? That information (row, column) is "lost" when we linearize or flatten it into a 1D vector. But if we know the dimensions of the original matrix, we can use that to convert a linear index:
0 1 2 3
into a row/column index:
(0,0) (0,1) (1,0) (1,1)
and the reverse (row/column to linear index).
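In code, for a row-major matrix with C columns, both conversions are one line of integer arithmetic (the helper names here are just for illustration):

// Conversions between a linear index and (row, column) for a
// row-major matrix with C columns:
int to_linear(int row, int col, int C) { return row * C + col; }
int to_row(int linear, int C)          { return linear / C; }
int to_col(int linear, int C)          { return linear % C; }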
In the case of the thrust example you linked, the functor linear_index_to_row_index converts a given linear index to its associated row index. This allows the programmer to write a thrust operation that works on specific rows of data, such as summing the "rows" of the original matrix, even though it is now stored in a linear 1D vector.
Specifically, when given the following linear indices:
0 1 2 3
For my example, the functor would return:
0 0 1 1
Because that represents the row of the original matrix that the specific element in the vector belongs to.
If I want to sum all the elements of each row together, producing one sum per row, then I can use the row index generated by that functor to identify the row of each element. At that point, reduce_by_key can easily sum together the elements that have the same row index, producing one result per row.
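Putting this together, here is a condensed sketch of the pattern used in the linked sum_rows.cu, adapted to the 2x2 example above (the real example uses a larger random matrix, but the functor and the reduce_by_key call follow the same idea):

#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/reduce.h>

// Maps a linear (1D) index to the row it belongs to, given C columns.
struct linear_index_to_row_index
{
    int C;
    linear_index_to_row_index(int C) : C(C) {}
    __host__ __device__ int operator()(int i) const { return i / C; }
};

int main()
{
    const int R = 2, C = 2;                    // the 2x2 matrix from above
    thrust::device_vector<float> A(R * C);
    A[0] = 1; A[1] = 2; A[2] = 3; A[3] = 4;    // flattened, row-major

    thrust::device_vector<int>   row_indices(R);
    thrust::device_vector<float> row_sums(R);

    // The transform iterator generates the keys 0 0 1 1 on the fly;
    // reduce_by_key then sums runs of elements with equal keys.
    thrust::reduce_by_key(
        thrust::make_transform_iterator(thrust::counting_iterator<int>(0),
                                        linear_index_to_row_index(C)),
        thrust::make_transform_iterator(thrust::counting_iterator<int>(R * C),
                                        linear_index_to_row_index(C)),
        A.begin(),
        row_indices.begin(),
        row_sums.begin());

    // row_sums now holds {3, 7}: one sum per row.
    return 0;
}

Compiled with nvcc as a .cu file; row_indices ends up holding the unique keys {0, 1}, one per row.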

Related

Best mapping between 2 sequences

I have two sequences of items:
S1 = [ A B C D E F ]
S2 = [ 1 2 3 4 5 6 7 8 ]
And I can determine "similarity" for each pair of items (s1, s2) as a number (for example on scale 0 to 10).
I want to find a mapping between S1/S2 items, such that ordering of each sequence is preserved and sum of "similarity" values between mapped items is maximum. It is not required that all S1/S2 items are part of mapping.
Example:
[ A B C D E F ]
[ 1 2 3 4 5 6 7 8 ]
In the example above, mapping 'A onto 3', 'D onto 4' and 'F onto 6' gives the overall maximum "similarity".
Are there any existing problems (/algorithms) this could be turned into?
Looks like the Smith–Waterman algorithm, which is traditionally used for determining similar regions between two strings of nucleic acid sequences or protein sequences, should be perfect:
Smith–Waterman algorithm aligns two sequences by matches/mismatches (also known as substitutions), insertions, and deletions. Both insertions and deletions are the operations that introduce gaps, which are represented by dashes. The Smith–Waterman algorithm has several steps:
Determine the substitution matrix and the gap penalty scheme. A substitution matrix assigns each pair of items (s1, s2) a score for match or mismatch. Usually matches get positive scores, whereas mismatches get relatively lower scores. A gap penalty function determines the score cost for opening or extending gaps. It is suggested that users choose the appropriate scoring system based on the goals. In addition, it is also a good practice to try different combinations of substitution matrices and gap penalties.
Initialize the scoring matrix. The dimensions of the scoring matrix are 1+length of each sequence respectively. All the elements of the first row and the first column are set to 0. The extra first row and first column make it possible to align one sequence to another at any position, and setting them to 0 makes the terminal gap free from penalty.
Scoring. Score each element from left to right, top to bottom in the matrix, considering the outcomes of substitutions (diagonal scores) or adding gaps (horizontal and vertical scores). If none of the scores are positive, this element gets a 0. Otherwise the highest score is used and the source of that score is recorded.
Traceback. Starting at the element with the highest score, trace back based on the source of each score recursively, until 0 is encountered. The segments that have the highest similarity score based on the given scoring system are generated in this process. To obtain the second best local alignment, apply the traceback process starting at the second highest score outside the trace of the best alignment.
Just choose the substitution matrix to match yours:

"And I can determine "similarity" for each pair of items (s1, s2) as a number (for example on scale 0 to 10)."

and set the gap and no-match penalties to zero:

"I want to find a mapping between S1/S2 items, such that ordering of each sequence is preserved and sum of "similarity" values between mapped items is maximum. It is not required that all S1/S2 items are part of mapping."
More information can be found at: https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm#Scoring_matrix
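For concreteness, here is a minimal sketch of the scoring step only (steps 2 and 3 above), with a pluggable score function and a linear gap penalty; per the suggestion above, pass a gap penalty of 0 for this problem. The function name and the toy score function are illustrative assumptions:

#include <algorithm>
#include <string>
#include <vector>

// Fill the Smith-Waterman scoring matrix H for sequences a and b and return
// the highest local-alignment score. The score function and gapPenalty are
// the scoring scheme from step 1; traceback (step 4) would start from the
// best-scoring cell.
double smithWatermanScore(const std::string& a, const std::string& b,
                          double gapPenalty /* e.g. 0 for this problem */)
{
    auto score = [](char x, char y) { return x == y ? 10.0 : 0.0; }; // toy scores

    size_t n = a.size(), m = b.size();
    std::vector<std::vector<double>> H(n + 1, std::vector<double>(m + 1, 0.0));
    double best = 0.0;

    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j) {
            H[i][j] = std::max({ 0.0,                                  // restart
                                 H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                                 H[i - 1][j] - gapPenalty,             // gap in b
                                 H[i][j - 1] - gapPenalty });          // gap in a
            best = std::max(best, H[i][j]);
        }
    return best;
}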
The problem you described looks like a variation of the Longest Common Subsequence problem.
Use this recurrence relation instead of the original:

ans[i][j] = max(
    ans[i-1][j],
    ans[i][j-1],
    ans[i-1][j-1] + similarity(S1[i], S2[j])
)
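A minimal runnable version of that recurrence (similarity() here is a placeholder for your 0-to-10 scoring; with zero gap penalties, the Smith-Waterman setup above reduces to essentially the same table):

#include <algorithm>
#include <string>
#include <vector>

// Placeholder for the problem's 0-to-10 similarity score between two items.
double similarity(char a, char b) { return a == b ? 10.0 : 0.0; }

// Best total similarity of an order-preserving partial mapping of s1 onto s2.
double bestMapping(const std::string& s1, const std::string& s2)
{
    size_t n = s1.size(), m = s2.size();
    // ans[i][j]: best score using the first i items of s1 and first j of s2.
    std::vector<std::vector<double>> ans(n + 1, std::vector<double>(m + 1, 0.0));

    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j)
            ans[i][j] = std::max({ ans[i - 1][j],        // skip s1[i]
                                   ans[i][j - 1],        // skip s2[j]
                                   ans[i - 1][j - 1]     // map s1[i] onto s2[j]
                                     + similarity(s1[i - 1], s2[j - 1]) });
    return ans[n][m];
}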

Pairwise matching of tiles

Recently in a coding competition I came across this question.
We have 1000 tiles where each tile is a 3x3 matrix. Each cell in the matrix has an integer value from 0 to 9 which signifies the elevation of the cell. The problem was to find the maximum number of pairs of tiles such that they fit together perfectly. The tiles may be rotated to fit. By fit it means that for tile A and tile B:
A[i] + B[i] = const for i = 0 to 8
The approach I thought of for this problem was to maintain a hash value corresponding to each tile. Then I would find the possible combinations of tiles that would be a possible fit and look them up in the hash table.
Ex. The tile below:
5 3 2
4 8 9
1 4 5
matches with
4 6 7
5 1 0
8 5 4
for const = 9, and with
5 7 8
6 2 1
9 6 5
for const = 10.
For this tile, 'const' would range from 9 (adding 0 to the maximum element) to 10 (adding 9 to the minimum element).
So I would get two possible combinations of tiles, which I would look up in the table.
But this method is greedy and does not give the desired answer, and I was also unable to think of a proper hash function which would account for all possible rotations.
So what would be a good approach for solving this problem?
I am sure there is a brute force way to solve this problem, but I was actually wondering whether a viable solution exists along the lines of the "pairwise equal to k" problem.
For n=1000 I would stick with the O(n^2) brute force solution. However an O(n log n) algorithm is described below.
The lexicographicalish ordering is defined by the following less-than operator:
Given two matrices M1, M2, define M1' as M1 if M1[1] is positive and -M1 if M1[1] is negative, and likewise for M2'. We say that M1 < M2 if M1'[1] < M2'[1], or if M1'[1] == M2'[1] and M1'[2] < M2'[2], or if M1'[1] == M2'[1] and M1'[2] == M2'[2] and M1'[3] < M2'[3], etc.
Subtract the middle element of each matrix from the rest of the elements of the matrix, i.e. A'[5] = A[5] and A'[i] = A[i] - A[5] otherwise. Then A' fits with B' if A'[i] + B'[i] = 0 for i != 5, and the elevation is A'[5] + B'[5].
Create an array of matrices and a dictionary. Rotate each matrix so that the top left corner has minimal absolute value before adding it to the array. If there are multiple corners with the same absolute value then duplicate the matrix and store both rotations in the array.
If some rotation of a matrix fits with itself and i,j are indices of rotations of this matrix, add the key-value pairs (i,j) and (j, i) to the dictionary.
Create an array S of indices 1,2... and sort S using the lexicographicalish ordering.
Instead of needing O(n^2) operations to check all possible pairs of matrices, it is only necessary to check the pairs of matrices at adjacent indices S_i and S_(i+1) in the sorted order. If a pair of matrices fits, use the dictionary to check that the two matrices are not rotations of the same original matrix before calculating the elevation of the pair.
Not sure if this is the most efficient way of doing this, but it sure works.
What I would do is:
Go over all tiles and check the maximum and minimum value of each tile and save it in a different array.
Check all possible pairs.
If min(A) + max(B) == min(B) + max(A) then check if some rotation of B fits perfectly on A. If it does, add 1 to your count.
Else, it does not fit so you can skip the checking for this pair.
Note: The reason for saving both the maximum and minimum of each tile is that it can save unnecessary calculation and rotation checking, as in O(1) we can tell when a pair cannot fit; see the sketch below.
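A minimal sketch of this O(n^2) approach with the min/max pruning (the tile layout and helper names are assumptions for illustration):

#include <algorithm>
#include <array>
#include <vector>

using Tile = std::array<int, 9>;   // 3x3 tile stored row-major

// Rotate a tile 90 degrees clockwise (index mapping for row-major 3x3).
Tile rotate90(const Tile& t)
{
    return { t[6], t[3], t[0],
             t[7], t[4], t[1],
             t[8], t[5], t[2] };
}

// True if A[i] + B[i] is the same constant for all cells, for some rotation of B.
bool fits(const Tile& a, Tile b)
{
    for (int r = 0; r < 4; ++r) {
        int c = a[0] + b[0];
        bool ok = true;
        for (int i = 1; i < 9 && ok; ++i)
            ok = (a[i] + b[i] == c);
        if (ok) return true;
        b = rotate90(b);
    }
    return false;
}

int countFittingPairs(const std::vector<Tile>& tiles)
{
    int count = 0;
    for (size_t i = 0; i < tiles.size(); ++i) {
        auto [mnA, mxA] = std::minmax_element(tiles[i].begin(), tiles[i].end());
        for (size_t j = i + 1; j < tiles.size(); ++j) {
            auto [mnB, mxB] = std::minmax_element(tiles[j].begin(), tiles[j].end());
            // O(1) pruning: a constant sum forces min(A)+max(B) == max(A)+min(B).
            if (*mnA + *mxB != *mxA + *mnB) continue;
            if (fits(tiles[i], tiles[j])) ++count;
        }
    }
    return count;
}

(In practice the min/max of each tile would be precomputed once per tile, as the answer suggests, rather than recomputed in the inner loop.)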

Matrix reordering to block diagonal form

Given a sparse matrix, how do I reorder the rows and columns such that it is in a block-diagonal-like form via row and column permutations?
Row and column permutations are not necessarily coupled, unlike in reverse Cuthill-McKee ordering:
http://www.mathworks.com/help/matlab/ref/symrcm.html?refresh=true In short, you can independently perform any row or column permutation.
The overall goal is to cluster all the non-zero elements towards the diagonal.
Here is one approach.
First make a graph whose vertices are rows and columns. Every non-zero value is an edge between its row and its column.
You can then use a standard graph theory algorithm to detect the connected components of this graph. The single-element components represent all-zero rows and columns. Number the others. Those components may have unequal numbers of rows and columns. You can distribute some zero rows and columns to them to make them square.
Your square components will be your blocks, and from the numbering of those components you know what order to put them in. Now just reorder rows and columns to achieve this structure and, voila! (The remaining zero rows/columns will result in a bunch of 0 blocks at the bottom right of the diagonal.)
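A minimal sketch of this construction using union-find, where rows are vertices 0..R-1 and columns are vertices R..R+C-1 (names are illustrative):

#include <numeric>
#include <utility>
#include <vector>

struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = parent[find(b)]; }
};

// entries: (row, col) positions of the nonzeros of an R x C sparse matrix.
// Returns a component id per vertex; rows and columns sharing an id belong
// to the same diagonal block after permutation.
std::vector<int> blockComponents(int R, int C,
                                 const std::vector<std::pair<int,int>>& entries)
{
    DSU dsu(R + C);
    for (auto [r, c] : entries)
        dsu.unite(r, R + c);          // each nonzero connects row r to column c
    std::vector<int> id(R + C);
    for (int v = 0; v < R + C; ++v)
        id[v] = dsu.find(v);
    return id;
}

Sorting rows by their component id and, independently, columns by theirs then yields the block structure described above.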
Just an idea: make a new matrix Ab from the original block matrix A that captures the block-sparsity structure of A. E.g.:
A = [B 0 0; 0 0 C; 0 D 0]; % with matrices 0 (zero elements), B,C and D
Ab = [1 0 0; 0 0 2; 0 3 0]; % with identifiers 1, 2 and 3 (1-->B, 2-->C, 3-->D)
Then Ab is a simple sparse matrix (size 3x3 in the example). You can then use the reverse Cuthill-McKee ordering to get the permutations you want, and apply these permutations to Ab.
p = symrcm(Ab);
Abperm = Ab(p,p);
Then use the identifiers to create the ordered block matrix Aperm from Abperm and you'll have the desired result, I believe.
You'll need to be clever in assigning the identifiers to the individual blocks and so on, but this should be possible.

Create Ancestor Matrix from given Binary Tree

The question is, given an Ancestor Matrix as a bitmap of 1s and 0s, to construct the corresponding Binary Tree. Can anyone give me an idea of how to do it? I found a solution on Stack Overflow, but the line a[root->data][temp[i]]=1 seems wrong: there is no guarantee that the nodes will contain data 1 to n. A node may contain, say, 2000, in which case there will be no a[2000][some_column], since there are only 7 nodes, hence 7 rows and columns in the matrix.
Two ways:
Normalize your node values such that they are all from 1 to n. If you have nodes 1, 2, 5000 for example, make them 1, 2, 3. You can do this by sorting or hashing your labels and keeping something like normalized[i] = normalized value of node i. normalized can be a map / hash table if you have very large labels or even text labels.
You might be able to use a sparse matrix for this, implementable with a hash table or a set: keep a hash table of hash tables. H[x] stores another hash table that stores your y values. So if in a naive matrix solution you had a[2000][5000] = 1, you would use H.get(2000) => returns a hash table H' of values stored on the 2000th row => H'.get(5000) => returns the value you want.
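A minimal sketch of that second approach (the type and method names are just for illustration):

#include <unordered_map>
#include <unordered_set>

// Sparse 0/1 ancestor matrix backed by nested hash containers, so arbitrary
// node labels (e.g. 2000) need no dense n x n array.
struct SparseAncestorMatrix {
    // rows[x] holds the set of labels y such that x is an ancestor of y.
    std::unordered_map<int, std::unordered_set<int>> rows;

    void set(int x, int y) { rows[x].insert(y); }

    bool get(int x, int y) const {
        auto it = rows.find(x);
        return it != rows.end() && it->second.count(y) > 0;
    }
};

// Usage: m.set(2000, 5000); m.get(2000, 5000) is then true, with no
// 2000x5000 storage ever allocated.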

Is there a search algorithm for huge two-dimensional arrays?

This is not a real-life question, it is just theory-crafting.
I have a big array which consists of elements like [1,140,245,123443], all integers or floats with low selectivity: the number of unique values is ten times less than the size of the array. B*-tree indexing is not good in this case. I also tried to implement bitmap indexing, but in Ruby, binary operations are not so fast.
Are there any good algorithms for searching two-dimensional arrays of fixed size vectors?
And, the main question is, how do I convert the vector into a value, where the conversion function has to be monotonic, so I can apply range queries such as:
(v[0]<10, v[2]>100, v[3]=32, 0.67*10^-8<v[4]<1.2154241410*10^-6)
The only idea I have is to create separate sorted indexes for each component of the vector, binary search each, then merge... but it is a bad idea because in the worst case it will require O(N*N) operations.
Assuming that each "column" is vaguely evenly distributed in a known range, you could keep track of a series of buckets for each column, and a list of rows that satisfy the bucket. The number of buckets for each column can be the same, or different, it's totally arbitrary. More buckets is faster, but takes slightly more memory.
my table:
range:  {1 to 10}  {1 to 4m}     {-2m to 2m}
row1:   {7         3427438335    420645075}
row2:   {5         3862506151    -1555396554}
row3:   {1         2793453667    -1743457796}

buckets for column 1:
bucket {1-3}  : row3
bucket {4-6}  : row2
bucket {7-10} : row1

buckets for column 2:
bucket {1-2m}  :
bucket {2m-4m} : row1, row2, row3

buckets for column 3:
bucket {-2m to -1m} : row2, row3
bucket {-1m to 0}   :
bucket {0 to 1m}    :
bucket {1m to 2m}   : row1
Then, given a series of criteria {v[0]<=5, v[2]>3*10^9}, we pull out the buckets that match those criteria:
column 1:
v[0]<=5 matches buckets {1-3} and {4-6}, which is rows 2 and 3.
column 2:
v[2]>3*10^9 matches bucket {2m-4m}, which is rows 1, 2 and 3.
column 3:
no criterion is given, so all buckets match, which is rows 1, 2 and 3.
Now we know that the row(s) we're looking for meet all three criteria, so we list all the rows that are in the buckets that matched all the criteria, in this case, rows 2 and 3. At this point, the number of rows remaining will be small even for massive amounts of data, depending on the granularity of your buckets. You simply check each of the rows that is left at this point to see if they match. In this sample we see that row 2 matches, but row 3 doesn't.
This algorithm is technically O(n), but in practice, if you have large numbers of small buckets, this algorithm can be very fast.
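A minimal sketch of such a bucketed column index (bucket sizing and names are illustrative assumptions):

#include <algorithm>
#include <vector>

struct ColumnIndex {
    double lo, hi;                          // known value range of the column
    int nbuckets;
    std::vector<std::vector<int>> buckets;  // bucket -> row ids

    ColumnIndex(double lo, double hi, int nbuckets)
        : lo(lo), hi(hi), nbuckets(nbuckets), buckets(nbuckets) {}

    int bucketOf(double v) const {
        int b = static_cast<int>((v - lo) / (hi - lo) * nbuckets);
        return std::clamp(b, 0, nbuckets - 1);
    }

    void insert(int row, double v) { buckets[bucketOf(v)].push_back(row); }

    // Candidate rows whose value may lie in [qlo, qhi]; the survivors must
    // still be verified against the actual data, as described above.
    std::vector<int> candidates(double qlo, double qhi) const {
        std::vector<int> out;
        for (int b = bucketOf(qlo); b <= bucketOf(qhi); ++b)
            out.insert(out.end(), buckets[b].begin(), buckets[b].end());
        return out;
    }
};

One such index per column; a query intersects the candidate lists from each constrained column, then checks the few remaining rows exactly.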
Using an index :)
The basic idea is to turn the 2-dimensional array into a 1-dimensional sorted array (while keeping the original positions) and apply binary search on the latter.
This method works for any n-dimensional array and is widely used by databases, which can be seen as n-dimensional arrays with variable lengths.
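A minimal sketch of that idea for one column (names are illustrative): keep (value, original row) pairs sorted once, then answer a range query on that column with two binary searches, intersecting the surviving row sets across columns:

#include <algorithm>
#include <climits>
#include <utility>
#include <vector>

struct SortedColumnIndex {
    std::vector<std::pair<double, int>> sorted;  // (value, original row)

    explicit SortedColumnIndex(const std::vector<double>& column) {
        for (int r = 0; r < static_cast<int>(column.size()); ++r)
            sorted.push_back({column[r], r});
        std::sort(sorted.begin(), sorted.end());   // built once, reused per query
    }

    // Rows r with qlo <= column[r] <= qhi, in O(log n + answer size).
    std::vector<int> rangeQuery(double qlo, double qhi) const {
        auto lo = std::lower_bound(sorted.begin(), sorted.end(),
                                   std::make_pair(qlo, INT_MIN));
        auto hi = std::upper_bound(sorted.begin(), sorted.end(),
                                   std::make_pair(qhi, INT_MAX));
        std::vector<int> rows;
        for (auto it = lo; it != hi; ++it)
            rows.push_back(it->second);
        return rows;
    }
};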
