Is there a more elegant way to convert an Eigen::MatrixXf into a vector<vector<float>> than copying the values element by element?
Something like this works for a 1D vector...
vector<float> vec(mat.data(), mat.data() + mat.rows() * mat.cols());
I tried various other alternatives that were suggested by the GCC compiler for vector< vector > but nothing worked out...
Eigen::MatrixXf stores its data in one efficient linear block of memory, while a vector of vectors is a very different data type.
For a multidimensional vector you would therefore have to read the matrix block by block (e.g. row by row) and copy those values into the nested vectors.
Another way would be to copy the values into a vector-based class with specific accessors, but that would end up reconstructing a Matrix-like class.
Why do you want to do that? What kind of access are you trying to provide? Maybe you should rather try to get similar access through the Eigen::Matrix interface.
Conversion
Eigen::MatrixXf m(2,3);
std::vector<std::vector<float>> v;
for (int i = 0; i < m.rows(); ++i)
{
    std::vector<float> row(m.cols());
    for (int j = 0; j < m.cols(); ++j)
        row[j] = m(i, j);  // element-wise copy; m.row(i) is not contiguous for the default column-major storage
    v.push_back(std::move(row));
}
I'm a total newbie to OpenCL.
I'm trying to write a reduction kernel that sums along one axis of a multi-dimensional array. I stumbled upon this code, which comes from here: https://tmramalho.github.io/blog/2014/06/16/parallel-programming-with-opencl-and-python-parallel-reduce/
__kernel void reduce(__global float *a, __global float *r, __local float *b) {
    uint gid = get_global_id(0);
    uint wid = get_group_id(0);
    uint lid = get_local_id(0);
    uint gs = get_local_size(0);
    b[lid] = a[gid];
    barrier(CLK_LOCAL_MEM_FENCE);
    for(uint s = gs/2; s > 0; s >>= 1) {
        if(lid < s) {
            b[lid] += b[lid+s];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if(lid == 0) r[wid] = b[lid];
}
I don't understand the for loop part. I get that uint s = gs/2 means that we split the array in half, but then it is a complete mystery. Without understanding it, I can't really implement another version, for taking the maximum of an array for instance, let alone for multi-dimensional arrays.
Furthermore, as far as I understand, the reduce kernel needs to be rerun another time if "N is bigger than the number of cores in a single unit".
Could you give me further explanations on that whole piece of code? Or even guidance on how to implement it for taking the max of an array?
Complete code can be found here: https://github.com/tmramalho/easy-pyopencl/blob/master/008_localreduce.py
Your first question about the meaning of the for loop:
for(uint s = gs/2; s > 0; s >>= 1)
It means that you start with s equal to half the local size gs and keep halving it (the shift s >>= 1 is equivalent to s = s/2) while s > 0, so the last pass runs with s = 1. In each pass, the first s work-items add the element s positions above into their own slot (b[lid] += b[lid+s]); after every pass the partial sums occupy half as many slots, and after log2(gs) passes b[0] holds the sum of the whole work-group. This algorithm depends on your array's size being a power of 2; otherwise you'd have to handle the leftover beyond the largest power of 2 separately, or pad the array with the reduction's neutral element (0 for a sum, -inf for a max) up to the next power of 2.
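To make the halving concrete, here is a plain C++ model of what a single work-group does with its local buffer (only a sketch of the loop's logic, not OpenCL code; swapping the += for a max gives the max-reduction you asked about):

#include <cstdio>
#include <vector>

// Sequential model of one work-group's tree reduction over its local buffer b.
// b.size() plays the role of get_local_size(0) and must be a power of 2.
float workgroup_sum(std::vector<float> b) {
    for (std::size_t s = b.size() / 2; s > 0; s >>= 1) {
        // In the kernel, the s work-items with lid < s do these additions in parallel.
        for (std::size_t lid = 0; lid < s; ++lid)
            b[lid] += b[lid + s];   // for a max-reduction: b[lid] = std::max(b[lid], b[lid + s]);
        // The barrier corresponds to finishing this pass before starting the next one.
    }
    return b[0];                    // in the kernel, work-item 0 writes this to r[wid]
}

int main() {
    std::vector<float> data{3, 1, 4, 1, 5, 9, 2, 6};  // 8 elements, a power of 2
    std::printf("sum = %g\n", workgroup_sum(data));   // prints sum = 31
}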
Regarding your second concern, when N is bigger than the capacity of your GPU, you are right: you have to run your reduction in portions that fit and then merge the results. Each work-group only writes one partial result into r, so those partial results themselves need another reduction pass (another kernel launch, or a final pass on the host).
Finally, when you ask for guidance on how to implement a reduction to get the max of an array, I would suggest the following:
For a simple reduction like max or sum, try using numpy, especially if you are dealing with programming the reduction by axis.
If you think that the GPU would give you an advantage, try first using pyopencl's Multidimensional Array functionality, e.g. max.
If the reduction is more math intensive, try using pyopencl's Parallel Algorithms, e.g. reduction.
I think that the whole point of using pyopencl is to avoid dealing with the underlying GPU's architecture. Otherwise, it is easier to deal with CUDA or HIP directly instead of OpenCL.
I have a flattened (1D) U32-encoded image array which has r, g, and b 8-bit channel values encoded into the first 24 bits of each U32. I would like to expand this array into an array of U8s that each store a separate r, g, or b value (0-255). The issue is that I need this to happen really fast (hundreds of times per second on an old computer), and the method I created is slow.
I am a novice at LabVIEW, so I am not exactly sure what a faster way to do this is.
I have successfully accomplished this by creating a U8 array, iterating through each index of the U32 image array, and assigning the corresponding 3 RGB values to the appropriate indices in the U8 array using a shift register. I attempted to use the In Place Element structure (which would presumably not require copying data between loop iterations like the shift register does), but I did not know how to make it work inside the loop, and when I tried to return the last array from the loop, only the last element was modified.
Here is the first, working method I described above:
In c/c++, it would be pretty simple (something like this):
#include <cstdint>

// Caller owns the returned buffer and must delete[] it.
uint8_t* convert_img(uint32_t img[640*480]) {
    uint8_t *img_u8 = new uint8_t[640*480*3];
    for (int i = 0; i < 640*480; ++i) {
        img_u8[i*3]     =  img[i]        & 0xff;  // r
        img_u8[i*3 + 1] = (img[i] >> 8)  & 0xff;  // g
        img_u8[i*3 + 2] = (img[i] >> 16) & 0xff;  // b
    }
    return img_u8;
}
The working LabVIEW example above only runs at 20 Hz! I think this is super slow for such a simple operation. Does anyone with more experience have a suggestion of how I can make this happen quickly with LabVIEW code?
I would do it like this:
U32 to U8
The steps are:
Flatten To String - endian chooses which order the bytes are in
Unflatten From String - into a 1D U8 array
Decimate 1D Array - creates 4 1D arrays
Reshape Array - turns into 640x480 arrays
Should be plenty fast enough.
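If it helps to see the idea in text code, here is a rough C++ model of what those steps compute (an illustration only, not LabVIEW; note it produces four planar U8 arrays rather than the interleaved RGB layout of the C example in the question, and which plane holds R, G, or B depends on the endianness you pick):

#include <cstdint>
#include <vector>

// Reinterpret the U32 pixels as raw bytes and decimate them into four planar arrays,
// mirroring Flatten To String / Unflatten From String followed by Decimate 1D Array.
void decimate_u32(const std::vector<uint32_t>& img,
                  std::vector<uint8_t>& p0, std::vector<uint8_t>& p1,
                  std::vector<uint8_t>& p2, std::vector<uint8_t>& p3) {
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(img.data());
    const std::size_t n = img.size();
    p0.resize(n); p1.resize(n); p2.resize(n); p3.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        p0[i] = bytes[4*i + 0];   // one byte of each pixel per plane; the unused
        p1[i] = bytes[4*i + 1];   // (junk) byte ends up in its own plane and can
        p2[i] = bytes[4*i + 2];   // simply be discarded
        p3[i] = bytes[4*i + 3];
    }
}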
I expect that the fastest option would be using the Split Numbers primitive to break the U32s into U8s, but you would need to actually test:
Also note that testing performance is not as easy as you might think, although if you're looking at the overall rate, you're probably fine with the basic testing.
I have a data array (double *) in memory which looks like:
[x0,y0,z0,junk,x1,y1,z1,junk,...]
I would like to map it to an Eigen vector and virtually remove the junk values by doing something like:
Eigen::Map<
Eigen::Matrix<double, Eigen::Dynamic, 1, Eigen::ColMajor>,
Eigen::Unaligned,
Eigen::OuterStride<4>
>
But it does not work, because the OuterStride seems to be restricted to 2D matrices.
Is there a trick to do what I want?
Many thanks!
With the head of Eigen, you can map it as a 2D matrix and then view it as a 1D vector:
auto m1 = Matrix<double,3,Dynamic>::Map(ptr, 3, n, OuterStride<4>());
auto v = m1.reshaped(); // new in future Eigen 3.4
But be aware that accesses to such a v involve costly integer division/modulo operations.
If you want a solution compatible with Eigen 3.3, you can do something like this
VectorXd convert(double const* ptr, Index n)
{
    VectorXd res(n*3);
    Matrix3Xd::Map(res.data(), 3, n) = Matrix4Xd::Map(ptr, 4, n).topRows<3>();
    return res;
}
But this of course would copy the data, which you probably intended to avoid.
Alternatively, you should think about whether it is possible to access your data as a 3xN array/matrix instead of a flat vector (really depends on what you are actually doing).
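For example, a minimal sketch of that last suggestion, assuming ptr and n are as in your question (the 3 x n view is an ordinary strided Map, so nothing gets copied):

#include <Eigen/Dense>

// View the packed [x,y,z,junk, x,y,z,junk, ...] buffer as a 3 x n matrix:
// column i holds point i, and OuterStride<4> skips the junk entry between columns.
void example(const double* ptr, Eigen::Index n) {
    Eigen::Map<const Eigen::Matrix<double, 3, Eigen::Dynamic>,
               Eigen::Unaligned, Eigen::OuterStride<4>> pts(ptr, 3, n);

    Eigen::Vector3d centroid = pts.rowwise().mean();  // e.g. the per-axis mean over all points
    (void)centroid;
}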
I have a large genetic dataset (X, Y coordinates), of which I can easily know one dimension (X) at runtime.
I drafted the following code for a matrix class that allows specifying the size of that one dimension while leaving the other dynamic by using std::vector. Each vector is allocated with new and owned by a unique_ptr, and those unique_ptrs are stored in a C-style array, itself allocated with new and owned by a unique_ptr.
class Matrix
{
private:
    typedef std::vector<Genotype> GenVec;
    typedef std::unique_ptr<GenVec> upGenVec;

    std::unique_ptr<upGenVec[]> m;  // fixed-size array of row pointers
    unsigned long size_;            // the known (X) dimension

public:
    // ...

    // construct
    Matrix(unsigned long _size): m(new upGenVec[_size]), size_(_size)
    {
        for (unsigned long i = 0; i < this->size_; ++i)
            this->m[i] = upGenVec(new GenVec);  // each row is an empty, dynamically growing vector
    }
};
My question:
Does it make sense to use this instead of std::vector<std::vector<Genotype>>?
My reasoning behind this implementation is that I only require one dimension to be dynamic, while the other should be fixed. Using nested std::vectors could imply more memory allocation than needed. As I am working with data that would fill an estimated ~50 GB of RAM, I would like to control memory allocation as much as I can.
Or, are there better solutions?
I won't cite any paragraphs from the specification, but I'm pretty sure that std::vector's memory overhead is fixed, i.e. it doesn't depend on the number of elements it contains. So I'd say your solution with the C-style array is actually worse memory-wise, because what you allocate, excluding the actual data, is:
N * pointer_size (first-dimension array)
N * vector_fixed_size (second-dimension vectors)
In the vector<vector<...>> solution, what you allocate is:
1 * vector_fixed_size (first-dimension vector)
N * vector_fixed_size (second-dimension vectors)
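A quick way to see those fixed per-row costs, using a placeholder Genotype type (the exact numbers are implementation-defined, but on a typical 64-bit implementation a std::vector header is 24 bytes and a pointer is 8 bytes):

#include <iostream>
#include <memory>
#include <vector>

struct Genotype { int allele; };  // placeholder for the real type

int main() {
    // unique_ptr design: an 8-byte pointer in the outer array plus a separately
    // heap-allocated vector header per row.
    std::cout << "per-row overhead (unique_ptr design): "
              << sizeof(std::unique_ptr<std::vector<Genotype>>) + sizeof(std::vector<Genotype>)
              << " bytes\n";
    // vector<vector<>> design: just the vector header per row, stored contiguously.
    std::cout << "per-row overhead (vector<vector<>>): "
              << sizeof(std::vector<Genotype>) << " bytes\n";
}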
How can I initialize a 2D vector using an initializer list?
For a normal vector, doing:
vector<int> myvect {1,2,3,4};
would suffice. But what about a 2D one, like:
vector<vector<int>> myvect{ {10,20,30,40},
{50,60,70,80}
};
What is a correct way of doing it?
And how can I iterate through it using a for loop?
for(auto x: myvect)
{
cout<<x[j++]<<endl;
}
This for loop only shows:
10,1!
And by the way, what does this mean?
vector<int> myvect[5] {1,2,3,4};
I saw it here and can't understand it! Link
What is a correct way of doing it?
The way you showed is a possible way. You could also use:
vector<vector<int>> myvect = { {10,20,30,40},
                               {50,60,70,80} };
or:
vector<vector<int>> myvect{ vector<int>{10,20,30,40},
                            vector<int>{50,60,70,80} };
The first one constructs a std::initializer_list<std::vector<int>> where the elements are directly initialized from the inner braced-initializer-lists. The second one explicitly constructs temporary vectors which then are moved into a std::initializer_list<std::vector<int>>. This will probably not make a difference, since that move can be elided.
In any way, the elements of the std::initializer_list<std::vector<int>> are copied back out into myvect (you cannot move out of a std::initializer_list).
And how can i iterate through it using for?
You essentially have a vector of vectors, therefore you need two loops:
for(vector<int> const& innerVec : myvect)
{
for(int element : innerVec)
{
cout << element << ',';
}
cout << endl;
}
I refrained from using auto to explicitly show the resulting types.
And by the way, what does this mean?
This is probably a typo. As it stands, it's illegal. The declaration vector<int> myvect[5]; declares an array of 5 vector<int>. The following list-initialization therefore needs to initialize the array, but the elements of this list are not implicitly convertible to vector<int> (there's a ctor that takes a size_t, but it's explicit).
That has already been pointed out in the comments on that site.
I guess the author wanted to write std::vector<int> vArray = {3, 2, 7, 5, 8};.
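For contrast, a legal initialization of an array of vectors needs inner braced-init-lists, so that each array element is list-initialized as its own vector (purely illustrative):

#include <vector>

// An array of 5 vector<int>, each element initialized from its own braced list.
std::vector<int> vArray[5] = { {3}, {2, 7}, {5}, {}, {8} };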