Tensors with Boolean coefficients in Eigen - same problems as with matrices? - c++11

I read here that matrices with Boolean coefficients behave inconsistently. Should I be similarly worried about tensors from Eigen's unsupported module? My reading of that article is that the Array class is fine with Boolean types.
I ran some tests and they turned out to be fine. But in the production version of my software, I use tensors that are very large and I need to be able to rely on the results.
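For concreteness, a minimal sketch of the kind of test meant here, assuming the unsupported Tensor module header (the sizes and values are arbitrary): it evaluates an element-wise logical expression on Tensor<bool> and checks it against the same operation done with plain bools.

#include <iostream>
#include <unsupported/Eigen/CXX11/Tensor>

int main()
{
    // Arbitrary example data; Tensor<bool, 2> holds Boolean coefficients.
    Eigen::Tensor<bool, 2> a(2, 3), b(2, 3);
    a.setConstant(true);
    b.setConstant(false);
    a(1, 2) = false;
    b(1, 2) = true;

    // Element-wise logical AND, evaluated into a new Boolean tensor.
    Eigen::Tensor<bool, 2> both = a && b;

    // Compare every coefficient against the same operation on plain bools.
    bool consistent = true;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 3; ++j)
            consistent = consistent && (both(i, j) == (a(i, j) && b(i, j)));

    std::cout << (consistent ? "consistent" : "inconsistent") << std::endl;
}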

Related

How to make matrix calculations as fast as possible

Purely for my own knowledge and understanding of code and computers, I am trying to create an array/matrix class with multiple matrix functions, which I will then use in any project where I need a matrix or array class. Most significantly, I would like to make a neural network library using this matrix/array class, and I therefore require it to be as fast as possible.
The function I need to be fastest is the matrix product of two matrices; however, I have had little luck making this calculation fast for larger matrices.
My current method for calculating the dot product is:
Note: this code is in Python; however, if Python is not the optimal language, I can use any other.
a = [[1, 2, 3], [4, 5, 6]]
b = [[1], [2], [3]]

def dot(a, b):
    # result has as many rows as a and as many columns as b
    c = [[0 for j in range(len(b[0]))] for i in range(len(a))]
    for i in range(len(c)):
        for j in range(len(c[i])):
            t = 0
            for k in range(len(b)):
                t += a[i][k] * b[k][j]
            c[i][j] = t
    return c

print(dot(a, b))
# [[14], [32]]
I have looked into the Intel MKL (I have an Intel Core i7) and other BLAS implementations like OpenBLAS; however, I have not been able to get any of them working, and no amount of googling has helped. So my question is: what is the fastest way to calculate the dot product of two matrices? (CPU and memory usage do not matter much to me currently, although being more efficient would be nice.)
PS:
I am trying to do all of this without using any external libraries (NumPy, for example, in Python).
***** UPDATE *****
I am using a Mac.
***** UPDATE 2 *****
Thank you everyone for all of your help; however, I am unsure how to implement these methods of calculating the dot product, as my maths skills are not yet advanced enough to understand what any of it means (I have yet to start my GCSEs). I will keep these ideas in mind and experiment with them further.
Thank you again for everyone's help.
If possible, you can use CUDA to utilize the GPU for very fast calculations.
You can use the GPU, as AbdelAziz AbdelLatef suggested in his answer. However, this limits the usage of your library to computers with a GPU.
Parallelize the dot products for big matrices.
Use SIMD instructions.
Use state-of-the-art algorithms.
Some operations on big data sets can be done much faster using more advanced techniques that are too slow for small matrices, usually involving FFT or NTT. Matrix multiplication is a set of dot products, and a dot product is a form of convolution, so an FFT approach should be applicable, although I have never done that for matrices/vectors.
There are also special algorithms just for matrices, like the Strassen algorithm.
For powers you can use exponentiation by squaring; for squaring I think you can simplify even further, since some of the multiplications would be the same.
Python is far from optimal because it is slow. I would do something like this in C++, or even combine it with assembly if there is a need for extreme speed (for example, to use SIMD instructions); IIRC you can still use C++-built libraries from Python (linked as a DLL, object file, etc.). A minimal cache-friendly and threaded C++ sketch follows after this answer.
However, if you need a fast neural network, then use dedicated hardware. There are neural-network processing ICs out there too.
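As a rough illustration of the parallelization and SIMD advice above, here is a minimal C++ sketch (my own, under the assumption of flat row-major std::vector<double> buffers rather than the asker's matrix class). It uses an i-k-j loop order so the innermost loop walks contiguous memory, which compilers can usually auto-vectorize, and splits the output rows across std::thread workers; real BLAS libraries go much further with cache blocking and hand-written SIMD kernels.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// C (n x p) = A (n x m) * B (m x p); all matrices stored row-major in flat vectors.
void matmul_rows(const std::vector<double>& A, const std::vector<double>& B,
                 std::vector<double>& C, std::size_t m, std::size_t p,
                 std::size_t row_begin, std::size_t row_end)
{
    for (std::size_t i = row_begin; i < row_end; ++i)
        for (std::size_t k = 0; k < m; ++k) {
            const double aik = A[i * m + k];
            // The innermost loop walks contiguous memory in both B and C,
            // which is cache-friendly and easy for compilers to auto-vectorize.
            for (std::size_t j = 0; j < p; ++j)
                C[i * p + j] += aik * B[k * p + j];
        }
}

std::vector<double> matmul(const std::vector<double>& A, const std::vector<double>& B,
                           std::size_t n, std::size_t m, std::size_t p)
{
    std::vector<double> C(n * p, 0.0);
    unsigned threads = std::thread::hardware_concurrency();
    if (threads == 0) threads = 1;

    // Split the output rows into one chunk per thread; each chunk writes to
    // disjoint rows of C, so no synchronization is needed.
    std::vector<std::thread> pool;
    const std::size_t chunk = (n + threads - 1) / threads;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back(matmul_rows, std::cref(A), std::cref(B), std::ref(C),
                          m, p, begin, end);
    }
    for (auto& th : pool) th.join();
    return C;
}

int main()
{
    // Same example as the Python code above: a is 2x3, b is 3x1.
    const std::vector<double> a = {1, 2, 3, 4, 5, 6};
    const std::vector<double> b = {1, 2, 3};
    for (double v : matmul(a, b, 2, 3, 1)) std::cout << v << ' ';  // prints: 14 32
    std::cout << '\n';
}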

Sparse matrix design in modern C++

I want to implement a sparse matrix class using modern C++, i.e. 14 or 17. I know there must be some trade-offs between storage and runtime efficiency. Right now I'd prefer to optimize more for storage efficiency, and, if possible, I'd prefer more work to be done at compile time rather than at runtime. For example, std::vector does have a lot of runtime checks, so it may not be optimal. Can someone suggest a container for this? I plan to support the following operations:
Matrix Multiplication, Addition, Subtraction, Inversion and Transpose
Matrix iterators, i.e. column and row iterators
Efficient constructors
etc
Thanks!
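As a concrete baseline for the storage question, here is a minimal compressed sparse row (CSR) sketch; the class name and member layout are illustrative assumptions, not a recommendation of a specific design. CSR keeps only the non-zero values plus two index arrays, which is usually the storage-efficient format other sparse layouts are compared against.

#include <cstddef>
#include <iostream>
#include <vector>

template <typename T>
class CsrMatrix {
public:
    CsrMatrix(std::size_t rows, std::size_t cols,
              std::vector<T> values,              // non-zero values, row by row
              std::vector<std::size_t> col_index, // column of each stored value
              std::vector<std::size_t> row_ptr)   // row_ptr[r]..row_ptr[r+1] indexes row r
        : rows_(rows), cols_(cols),
          values_(std::move(values)),
          col_index_(std::move(col_index)),
          row_ptr_(std::move(row_ptr)) {}

    // Read access; returns T{} for positions that hold no stored value.
    T at(std::size_t r, std::size_t c) const {
        for (std::size_t k = row_ptr_[r]; k < row_ptr_[r + 1]; ++k)
            if (col_index_[k] == c) return values_[k];
        return T{};
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> values_;
    std::vector<std::size_t> col_index_;
    std::vector<std::size_t> row_ptr_;
};

int main() {
    // 3x3 matrix with non-zeros 5 at (0,0), 8 at (1,2), 3 at (2,1).
    CsrMatrix<double> m(3, 3, {5, 8, 3}, {0, 2, 1}, {0, 1, 2, 3});
    std::cout << m.at(0, 0) << ' ' << m.at(1, 1) << ' ' << m.at(1, 2) << '\n'; // 5 0 8
}

Iterating over a row is just walking the contiguous slice of values_ between row_ptr_[r] and row_ptr_[r + 1]; multiplication, addition, transpose, etc. would be implemented by producing new (values, col_index, row_ptr) triples.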

Will non-linear regression algorithms perform better if trained with normally distributed target values?

After finding out about the many transformations that can be applied to the target values (the y column) of a data set, such as Box-Cox transformations, I learned that linear regression models need to be trained with normally distributed target values in order to be efficient (https://stats.stackexchange.com/questions/298/in-linear-regression-when-is-it-appropriate-to-use-the-log-of-an-independent-va).
I'd like to know whether the same applies to non-linear regression algorithms. So far I've seen people on Kaggle use a log transformation to mitigate heteroskedasticity when using XGBoost, but they never mention whether it is also done to obtain normally distributed target values.
I did some research and found in Andrew Ng's lecture notes (http://cs229.stanford.edu/notes/cs229-notes1.pdf), on page 11, that the least-squares cost function, used by many algorithms both linear and non-linear, is derived by assuming a normal distribution of the error. I believe that if the error should be normally distributed, then the target values should be as well.
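As a side note, the derivation in the cited notes can be sketched as follows. Assume

    y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}, \qquad \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2) \text{ i.i.d.}

Then the log-likelihood of the data is

    \ell(\theta) = \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
                 = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2,

so maximizing \ell(\theta) is the same as minimizing the least-squares cost \frac{1}{2}\sum_i (y^{(i)} - \theta^T x^{(i)})^2. The normality assumption here is on the error term \epsilon^{(i)}, not directly on the marginal distribution of the targets y^{(i)}.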
If this is true, then all regression algorithms that use a least-squares cost function should work better with normally distributed target values.
Since XGBoost uses a least-squares cost function for node splitting (http://cilvr.cs.nyu.edu/diglib/lsml/lecture03-trees-boosting.pdf - slide 13), maybe this algorithm would work better if I transformed the target values with a Box-Cox transformation for training and then applied the inverse Box-Cox transformation to the output to obtain the predicted values.
Will this theoretically speaking give better results?
Your conjecture, "I believe if the error should be normally distributed then the target values should be as well", is totally wrong. So your question does not really have an answer, since it is not a valid question.
There are no assumptions on the target variable to be Normal at all.
Getting the target variable transformed does not mean the errors are normally distributed. In fact, that may ruin normality.
I have no idea what this is supposed to mean: "linear regression models need to be trained with normally distributed target values in order to be efficient." Efficient in what way?
Linear regression models are global models. They simply fit a surface to the overall data. The operations are matrix operations, so the time to "train" the model depends only on the size of data. The distribution of the target has nothing to do with model building performance. And, it has nothing to do with model scoring performance either.
Because targets are generally not normally distributed, I would certainly hope that such a distribution is not required for a machine learning algorithm to work effectively.

Calculating the Jacobian using the Eigen library

I want to be able to compute the Jacobian matrix using the Eigen C++ library, but I cannot find any documentation on how to do this.
Previously, I have achieved this using the numdifftools package in Python. The function is:
numdifftools.Jacobian(ForwardsFunction)([input 1, input 2, input 3, ....])
Here, ForwardsFunction is a user-defined function which calculates the output state given the input state ([input 1, input 2, input 3, ...]). The numdifftools.Jacobian() method then automatically calculates the Jacobian for these input values, presumably using some form of numerical or automatic differentiation.
Is there an equivalent function in the Eigen library?
There are some tools in the Eigen library that perform numerical differentiation.
Take a look at:
https://eigen.tuxfamily.org/dox/unsupported/group__NumericalDiff__Module.html
https://eigen.tuxfamily.org/dox/unsupported/classEigen_1_1AutoDiffScalar.html
You might notice that those modules are "unsupported" (not part of the official Eigen library). The reasoning is the following: Eigen is a library for linear algebra, i.e. for manipulating sparse and dense matrices, and numerical differentiation is a little bit on the edge of its scope, so the priority of including it in the core library is lower. Those modules, as far as I know, are used by Eigen's own solvers in a very specific way. I don't have experience using those Eigen numerical-differentiation classes in my own projects, but you might give them a try.
https://eigen.tuxfamily.org/dox/unsupported/classEigen_1_1NumericalDiff.html
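For a rough idea of what the AutoDiff route can look like, here is a minimal sketch; the forward function is a made-up example, and the exact constructor/accessor details should be checked against the Eigen version in use. Each input is seeded with a unit derivative vector, and the propagated derivatives of each output form one row of the Jacobian.

#include <iostream>
#include <Eigen/Dense>
#include <unsupported/Eigen/AutoDiff>

using ADScalar = Eigen::AutoDiffScalar<Eigen::VectorXd>;
using ADVector = Eigen::Matrix<ADScalar, Eigen::Dynamic, 1>;

// Example forward function: f(x) = (x0 * x1, sin(x2))
ADVector forwardsFunction(const ADVector& x)
{
    ADVector y(2);
    y(0) = x(0) * x(1);
    y(1) = sin(x(2));
    return y;
}

int main()
{
    Eigen::VectorXd input(3);
    input << 1.0, 2.0, 3.0;

    // Seed input i with the i-th unit vector so derivatives w.r.t. it are tracked.
    ADVector x(input.size());
    for (int i = 0; i < input.size(); ++i)
        x(i) = ADScalar(input(i), Eigen::VectorXd::Unit(input.size(), i));

    ADVector y = forwardsFunction(x);

    // Each output's derivative vector is one row of the Jacobian.
    Eigen::MatrixXd J(y.size(), input.size());
    for (int i = 0; i < y.size(); ++i)
        J.row(i) = y(i).derivatives().transpose();

    std::cout << J << std::endl;
}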

GPU and determinism

I was thinking of off-loading some math operations to the GPU. As I'm already using D3D11, I'd use a compute shader to do the work. But the thing is, I need the results to be the same for the same input, no matter what GPU the user might have (the only requirement being that it supports compute shader 4.0).
So is floating point math deterministic on GPUs?
If not, do GPUs support integer math?
I haven't used DirectCompute, only OpenCL.
GPUs definitely support integer math, both 32-bit and 64-bit integers. A couple of questions already have this discussion:
Integer Calculations on GPU
Performance of integer and bitwise operations on GPU
Basically, on modern GPUs 32-bit float and integer operations are equivalent in performance.
As for deterministic results, it depends on your code. For example, if you rely on multiple threads performing atomic operations on the same memory, then reading that memory from other threads and performing operations that depend on the value read, the results may not be exactly the same every time.
From personal experience, I needed to generate random numbers but also required consistent results. So basically I had a largish array of seeds, one for each thread, and each one was completely independent. Other random number generators which rely on atomic operations and barriers would not have been.
The other half of having deterministic results is getting the same result on different hardware. With integer operations you should be fairly safe. With floating-point operations in OpenCL, avoiding the fast relaxed math option and the native variants of functions would increase the chances of getting the same results on different hardware.
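A tiny host-side C++ illustration of why accumulation order matters (for example, when atomics decide which thread adds its contribution first): floating-point addition is not associative, so summing the same numbers in a different order can change the result.

// Demonstrates that floating-point addition is not associative, which is why
// any GPU reduction whose accumulation order varies between runs or devices
// can produce slightly different bits for the same input.
#include <cstdio>

int main()
{
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    float left  = (a + b) + c;  // 0 + 1 = 1
    float right = a + (b + c);  // 1e8 + (-1e8 + 1) = 0 (the 1 is lost to rounding)

    std::printf("(a + b) + c = %.1f\n", left);
    std::printf("a + (b + c) = %.1f\n", right);
}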
