My original goal was to do some juggling with a couple of matrices. It's simple, and I explain it for 2D matrices below:
Given a certain matrix Matrix1 as:
, and a binary matrix Matrix2 as this:
, I want to allocate the elements from Matrix1 to Matrix2 such that I have a final matrix Matrix3, which looks like:
The following one-liner worked for me:
(Matrix3 = zeros(eltype(Matrix1),size(Matrix2)))'[Matrix2'[:]] .= Matrix1'[:]
Now I need to extend it to higher dimensions, i.e. 3D or more. So, suppose Matrix1 has dimensions (4,6,6) and the binary matrix Matrix2 has dimensions (4,12,12). The allocation problem remains the same. How would you then approach it? Can someone kindly help me with it (preferably with a one-liner)? Note that for both matrices the size of the first dimension is the same, 4 in this case. Along the remaining two dimensions, each matrix is individually square.
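(For illustration only, and not necessarily the one-liner the question is after: in NumPy the same allocation can be written as a boolean-mask assignment, which fills the ones of the mask in row-major order. Under the assumption that every slice of Matrix2 along the first dimension contains exactly as many ones as a slice of Matrix1 has elements, the same line also covers the 3D case.)

import numpy as np

# 2-D case: fill the ones of Matrix2, row by row, with the elements of Matrix1.
Matrix1 = np.arange(1, 5).reshape(2, 2)
Matrix2 = np.array([[1, 0, 0],
                    [0, 1, 1],
                    [0, 0, 1]])
Matrix3 = np.zeros(Matrix2.shape, dtype=Matrix1.dtype)
Matrix3[Matrix2.astype(bool)] = Matrix1.ravel()   # boolean indexing assigns in row-major order

# 3-D case, e.g. Matrix1 of shape (4, 6, 6) and Matrix2 of shape (4, 12, 12):
# the very same assignment works, provided every (12, 12) slice of Matrix2
# contains exactly 36 ones, because the C-order fill then respects the slices.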
As I understand it, Nvidia's tensor cores multiply two 4x4 matrices and add the result to a third matrix. Multiplying two 4x4 matrices produces a 4x4 matrix, and adding two 4x4 matrices produces a 4x4 matrix. Still, "Each Tensor Core provides a 4x4x4 matrix processing array".
There are 4 multiply-accumulate operations needed for each row-column dot product. I thought maybe the last x4 comes from the intermediate results before accumulation, but I don't think that quite fits the description on Nvidia's pages.
"The FP16 multiply results in a full precision result that is accumulated in FP32 operations with the other products in a given dot product for a 4x4x4 matrix multiply, as Figure 9 shows."
https://developer.nvidia.com/blog/cuda-9-features-revealed/
4x4x4 matrix multiply? I thought matrices were two-dimensional by definition.
Can someone please explain where the last x4 comes from?
4x4x4 is just the notation for multiplication of one 4x4 matrix with another 4x4 matrix.
If you were to multiply a 4x8 matrix with an 8x4 matrix, you would have 4x8x4. So if A is NxK and B is KxM, it can be referred to as an NxKxM matrix multiply.
I briefly looked this up and found a paper that uses this exact notation (e.g. in Section 4.6 on page 36): https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/153863/eth-6705-01.pdf
"The cube itself represents the 64 element-wise products required to generate the full 4x4 product matrix" (cvw.cac.cornell.edu/GPUarch/tensor_cores). It is the intermediate products before accumulation that make up the last x4.
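A rough NumPy sketch of the counting (this is just the naive schoolbook algorithm, not how Tensor Cores are actually programmed):

import numpy as np

# A 4x4x4 (N x K x M) matrix multiply: every one of the N*M outputs is a
# K-term dot product, so there are N*K*M = 64 multiply-accumulates in total.
N, K, M = 4, 4, 4
A = np.random.rand(N, K).astype(np.float16)
B = np.random.rand(K, M).astype(np.float16)
C = np.zeros((N, M), dtype=np.float32)

multiplies = 0
for i in range(N):
    for j in range(M):
        for k in range(K):
            # FP16 products accumulated into an FP32 result, as in the quote above.
            C[i, j] += np.float32(A[i, k]) * np.float32(B[k, j])
            multiplies += 1

print(multiplies)   # 64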
I'm looking for a way to implement block-diagonal matrices in TensorFlow. Specifically, I have a block-diagonal matrix A with N blocks of size S x S each. Further, I have a vector v of length N*S. I want to calculate A dot v. Is there any efficient way to do it in TensorFlow?
Also, I would prefer an implementation that supports a batch dimension for v (i.e. its real shape is batch_size x (N*S)) and that is memory-efficient, keeping in memory only the block-diagonal parts of A.
Thanks for any help!
You can simply convert your tensor to a sparse tensor, since a block-diagonal matrix is just a special case of one. Then the operations are done in an efficient way. If you already have a dense representation of the tensor, you can cast it using sparse_tensor = tf.contrib.layers.dense_to_sparse(dense_tensor) (note that tf.contrib is TensorFlow 1.x only). Otherwise, you can construct it with the tf.SparseTensor(...) function. To get the indices, you might use tf.strided_slice; see this post for more information.
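For reference, here is a hedged sketch of that sparse-tensor route in current TensorFlow (TF 2.x API; N, S, blocks and the random data are illustrative placeholders, not from the post):

import numpy as np
import tensorflow as tf

N, S = 3, 2                                           # N blocks of size S x S
blocks = np.random.rand(N, S, S).astype(np.float32)   # only the diagonal blocks of A

# Build a SparseTensor holding just the block-diagonal entries.
indices, values = [], []
for n in range(N):
    for i in range(S):
        for j in range(S):
            indices.append([n * S + i, n * S + j])
            values.append(blocks[n, i, j])
A_sparse = tf.SparseTensor(indices=indices, values=values, dense_shape=[N * S, N * S])

# Batched A . v for v of shape (batch_size, N*S): compute (A v^T)^T.
v = tf.random.normal([4, N * S])
Av = tf.transpose(tf.sparse.sparse_dense_matmul(A_sparse, tf.transpose(v)))

(An alternative that avoids building indices at all is to reshape v into (batch, N, S) chunks and contract them against the (N, S, S) blocks with tf.einsum, but the sparse route above follows this answer's suggestion.)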
I want to gradually apply a Matrix4 to an object, in a function that updates every frame.
If I have two matrices, what is the way to find the difference between them? Let's say I would like to know the matrix that represents the first one plus 0.2 of the difference between the two.
You should look into THREE.Quaternion.slerp and THREE.Vector3.lerp methods.
Slerp stands for "spherical linear interpolation" while lerp stands for "linear interpolation".
A matrix then has to be constructed by three.js based on these two, but three.js handles this internally if you set myObject3D.position and myObject3D.quaternion.
If your starting point is a Matrix4, you can decompose it into a quaternion and a position vector, interpolate those, and then compose a matrix from the new results. If you want just the end result as a matrix, you can use makeRotationFromQuaternion( quaternion ) followed by setPosition( position ).
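For reference, here is a small language-agnostic NumPy sketch of the math behind those two methods (the function names are mine, not the Three.js API):

import numpy as np

def slerp(q0, q1, t):
    # Spherical linear interpolation between two quaternions (as length-4 arrays).
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                    # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                 # nearly parallel: plain lerp is stable enough
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def lerp(p0, p1, t):
    # Linear interpolation between two position vectors.
    return (1 - t) * p0 + t * p1

With t = 0.2 this gives the "first one plus 0.2 of the difference" transform from the question; rebuild the matrix from the interpolated quaternion and position as described above.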
Problem: Given $n$ 3-dimensional points $\{p_1, p_2, \ldots, p_n\}$, where $p_i = (x_i, y_i, z_i)$, I have to find the value of the formula
for some given constant integers $P$, $Q$, $R$, $S$.
All coordinates are between 1 and $M$ ($= 100$).
I need an efficient method to evaluate this formula.
Please give me any idea about how to reduce the complexity below $O(n^2)$.
Assuming that all coordinates are between 1 and 100, you could do this via:
Compute a 3D histogram of all points: O(100*100*100) operations.
Use the FFT to compute the correlation of this histogram with itself along each of the 3 axes.
This results in a 3D histogram of 3D pairwise-difference vectors. You can then iterate over this histogram to compute your desired value.
The main point is that correlating a histogram of values with itself (equivalently, convolving it with a reflected copy) computes the histogram of pairwise differences of those values. A histogram of pairwise sums can be computed with a plain convolution in the same way.
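A hedged 1-D illustration of that last point in NumPy (the 3-D version is the same idea with a 3-D FFT; all the variable names here are mine):

import numpy as np

M = 100
values = np.random.randint(1, M + 1, size=1000)       # 1-D "coordinates" in 1..M
hist = np.bincount(values, minlength=M + 1).astype(float)

size = 2 * M + 1                                      # big enough to avoid wrap-around
H = np.fft.rfft(hist, size)
# Correlating the histogram with itself counts ordered pairs by their difference
# (index d holds difference +d, index size-d holds -d, and d=0 includes self-pairs):
diff_hist = np.rint(np.fft.irfft(H * np.conj(H), size)).astype(int)
# Convolving it with itself counts ordered pairs by their sum instead:
sum_hist = np.rint(np.fft.irfft(H * H, size)).astype(int)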
Your problem looks like a particle potential problem (the kind you have in electrodynamics, for instance), where you have to find some "potential" at the location (x_j, y_j, z_j) by summing all elementary contributions from the i-th particles.
The fast algorithm specific to this class of problems is the Fast Multipole Method. Look up this keyword, but I must warn you it is by no means simple to understand or implement; a strong math background is needed.
I am trying to apply the Random Projections method on a very sparse dataset. I found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that give me no meaningful explanation. For example, this document on Johnson-Lindenstrauss.
Unfortunately, from this document I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can give me the plain English version or very simple pseudocode of the algorithm? Or where can I start digging into these equations? Any suggestions?
For example, what I understand from the algorithm by reading this paper concerning Johnson-Lindenstrauss is that:
Assume we have an AxB matrix where A is the number of samples and B is the number of dimensions, e.g. 100x5000. And I want to reduce its dimension to 500, which will produce a 100x500 matrix.
As far as I understand: first, I need to construct a 5000x500 random matrix and fill the entries randomly with +1 and -1 (each with 50% probability).
Edit:
Okay, I think I started to get it. So we have a matrix A which is m x n. We want to reduce it to E, which is m x k.
What we need to do is construct a matrix R of dimension n x k and fill it with 0, +1, or -1, with probabilities 2/3, 1/6, and 1/6 respectively.
After constructing R, we simply do the matrix multiplication A x R to find our reduced matrix E. But we don't need to do a full matrix multiplication: if an element of R is 0, we can skip that calculation entirely; if it is +1, we just add the corresponding column, and if it is -1, we subtract it. So we end up using summation rather than multiplication to find E, and that is what makes this method very fast.
It turned out to be a very neat algorithm, although I felt too stupid to get the idea at first.
You have the idea right. However, as I understand random projection, the rows of your matrix R should have unit length. I believe that's approximately what the normalizing by 1/sqrt(k) is for: to normalize away the fact that they're not unit vectors.
It isn't strictly a projection, but it's nearly one: R's rows aren't orthonormal, but within a much higher-dimensional space they very nearly are. In fact, the dot product of any two of those vectors you choose will be pretty close to 0. This is why it is generally a good approximation of actually finding a proper basis for projection.
The mapping from high-dimensional data A to low-dimensional data E is given in the statement of theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in section 7.1, you don't need to use a full matrix multiplication algorithm.
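Putting the question's edit and this answer together, here is a hedged NumPy sketch of that sparse scheme (the function name is mine; the sqrt(3/k) factor is the usual scaling for these 0/+1/-1 entries and plays the role of the 1/sqrt(k) normalization mentioned above):

import numpy as np

def sparse_random_projection(A, k):
    # A is m x n; R is n x k with entries 0, +1, -1 drawn with probabilities
    # 2/3, 1/6, 1/6. The sqrt(3/k) factor preserves squared lengths in expectation.
    rng = np.random.default_rng()
    n = A.shape[1]
    R = rng.choice([0.0, 1.0, -1.0], size=(n, k), p=[2/3, 1/6, 1/6])
    return A @ (np.sqrt(3.0 / k) * R)

A = np.random.rand(100, 5000)              # 100 samples, 5000 features
E = sparse_random_projection(A, 500)       # reduced to 100 x 500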
If your dataset is sparse, then sparse random projections will not work well.
You have a few options here:
Option A:
Step 1. Apply a structured dense random projection (the so-called fast Hadamard transform is typically used); see the sketch after Step 2. This is a special projection which is very fast to compute but otherwise has the properties of a normal dense random projection.
Step 2. Apply a sparse projection on the "densified" data (sparse random projections are useful for dense data only).
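As a rough sketch of what Step 1 can look like, here is one common form of such a structured projection, a subsampled randomized Hadamard transform (my own illustration, assuming the input length is a power of two; the function names are mine):

import numpy as np

def fwht(x):
    # Iterative fast Walsh-Hadamard transform; len(x) must be a power of two.
    x = np.array(x, dtype=float)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

def structured_dense_projection(x, k):
    # Random sign flip, orthonormal Hadamard transform, then keep k coordinates.
    rng = np.random.default_rng()
    d = len(x)
    signs = rng.choice([1.0, -1.0], size=d)
    y = fwht(signs * x) / np.sqrt(d)
    keep = rng.choice(d, size=k, replace=False)
    return np.sqrt(d / k) * y[keep]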
Option B:
Apply SVD to the sparse data (a minimal sketch follows below). If the data is sparse but has some structure, SVD is better. Random projection preserves the distances between all points; SVD better preserves the distances between dense regions, which in practice is more meaningful. People also use random projections to compute the SVD of huge datasets. Random projections give you efficiency, but not necessarily the best quality of embedding in a low dimension.
If your data has no structure, then use random projections.
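A minimal sketch of the SVD route on sparse data, assuming SciPy is available (sizes and names are illustrative):

from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(100, 5000, density=0.01, format='csr', random_state=0)
k = 50                                # target dimension; svds needs k < min(A.shape)
U, s, Vt = svds(A, k=k)               # top-k singular triplets of the sparse matrix
E = U * s                             # 100 x k embedding of the rows of A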
Option C:
For data points for which SVD has little error, use SVD; for the rest of the points, use random projections.
Option D:
Use a random projection based on the data points themselves.
This makes it very easy to understand what is going on. It looks something like this:
import numpy as np

def data_based_random_projection(X, k, sample_size=10):
    # X holds n data points with d features each; k is the new dimension.
    n, d = X.shape
    rng = np.random.default_rng()
    R = np.zeros((d, k))                          # the k random projection vectors
    for j in range(k):                            # generate k random projection vectors
        randomized_combination = np.zeros(d)      # feature vector of zeros
        sample_point_ids = rng.choice(n, size=sample_size, replace=False)
        for point_id in sample_point_ids:
            random_sign = rng.choice([1.0, -1.0])               # +1/-1 with prob. 1/2
            randomized_combination += random_sign * X[point_id]  # vector operation
        R[:, j] = randomized_combination / np.linalg.norm(randomized_combination)
    # Note: a normal random projection would instead fill R with +/-1 entries
    # (randomly zeroing a fraction if you want it sparse) and normalize by length.
    # Projecting the data points onto these random features is then one product:
    scores = X @ R            # scores[point_id, j] = dot(X[point_id], R[:, j])
    return scores
If you are still looking to solve this problem, write a message here and I can give you more pseudocode.
The way to think about it is that a random projection is just a random pattern, and the dot product between a data point and the pattern (i.e. projecting the data point onto it) gives you the overlap between them. So if two data points overlap with many of the same random patterns, those points are similar. Therefore, random projections preserve similarity while using less space, but they also add random fluctuations to the pairwise similarities. What JLT tells you is that to make the fluctuations about 0.1 (eps), you need about 100*log(n) dimensions.
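(As a rough worked example with the constants suppressed: k ≈ log(n)/eps^2, so with eps = 0.1 and n = 10000 points that is on the order of 100*ln(10000) ≈ 920 dimensions.)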
Good Luck!
RandPro: an R package to perform random projection using the Johnson-Lindenstrauss lemma.