"Warning : Non-POD class type passed through ellipsis" for simple thrust program - c++11

In spite of reading many answers on the same kind of questions on SO I am not able to figure out solution in my case. I have written the following code to implement a thrust program. Program performs simple copy and display operation.
#include <stdio.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
int main(void)
{
// H has storage for 4 integers
thrust::host_vector<int> H(4);
H[0] = 14;
H[1] = 20;
H[2] = 38;
H[3] = 46;
// H.size() returns the size of vector H
printf("\nSize of vector : %d",H.size());
printf("\nVector Contents : ");
for (int i = 0; i < H.size(); ++i) {
printf("\t%d",H[i]);
}
thrust::device_vector<int> D = H;
printf("\nDevice Vector Contents : ");
for (int i = 0; i < D.size(); i++) {
printf("%d",D[i]); //This is where I get the warning.
}
return 0;
}

Thrust implements certain operations to facilitate using elements of a device_vector in host code, but this apparently isn't one of them.
There are many approaches to addressing this issue. The following code demonstrates 3 possible approaches:
explicitly copy D[i] to a host variable, and thrust has an appropriate method defined for that.
copy the thrust device_vector back to a host_vector before print-out.
use thrust::copy to directly copy the elements of the device_vector to a stream.
Code:
#include <stdio.h>
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
int main(void)
{
// H has storage for 4 integers
thrust::host_vector<int> H(4);
H[0] = 14;
H[1] = 20;
H[2] = 38;
H[3] = 46;
// H.size() returns the size of vector H
printf("\nSize of vector : %d",H.size());
printf("\nVector Contents : ");
for (int i = 0; i < H.size(); ++i) {
printf("\t%d",H[i]);
}
thrust::device_vector<int> D = H;
printf("\nDevice Vector Contents : ");
//method 1
for (int i = 0; i < D.size(); i++) {
int q = D[i];
printf("\t%d",q);
}
printf("\n");
//method 2
thrust::host_vector<int> Hnew = D;
for (int i = 0; i < Hnew.size(); i++) {
printf("\t%d",Hnew[i]);
}
printf("\n");
//method 3
thrust::copy(D.begin(), D.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
return 0;
}
Note that for methods like these, thrust is generating various kinds of device-> host copy operations to facilitate the use of device_vector in host code. This has performance implications, so you might want to use the defined copy operations for large vectors.

Related

How to access map in vector for building trie

#include <string>
#include <iostream>
#include <vector>
#include <map>
using std::map;
using std::vector;
using std::string;
typedef map<char, int> edges;
typedef vector<edges> trie;
trie build_trie(vector<string> & patterns) {
trie t;
// write your code here
for(int j=0;j<patterns.size();j++){
//current NOde = root;
int currNI=0;
for(int i=0;i<patterns[j].size();i++){
char currS = patterns[j][i];
auto it = t[currNI].begin();
//auto mit = it->edges.begin();
if(t[currNI].edges.find(currS)!=t[currNI].edges.end()){
currNI = t[currNI].edges.find(currS)->second;
}else{
t.push_back(edges.insert(currS,t.size()));
t[currNI].edges.insert(currS,t.size());
currNI = t.size();
}
}
}
return t;
}
int main() {
size_t n;
std::cin >> n;
vector<string> patterns;
for (size_t i = 0; i < n; i++) {
string s;
std::cin >> s;
patterns.push_back(s);
}
trie t = build_trie(patterns);
for (size_t i = 0; i < t.size(); ++i) {
for (const auto & j : t[i]) {
std::cout << i << "->" << j.second << ":" << j.first << "\n";
}
}
return 0;
}
Hi I was trying to build trie using vector and map but I am not able to access elements of map.
I have also added image of pseudo code for better clarity.
I have used iterators and many other tricks that are already present on stackoverflow and on other platform but somehow I am not able to access what I want.
Thank You
To answer your questions: in your code, edges is a type, not an object.
t[currNI] is of type edges, thus is a map<char, int>.
You should try t[currNI].find(currS) and t[currNI].end() directly.
NB: after you fix this, there are still other errors in your code.

Matrix Multiplication OpenMP Counter-Intuitive Results

I am currently porting some code over to OpenMP at my place of work. One of the tasks I am doing is figuring out how to speed up matrix multiplication for one of our applications.
The matrices are stored in row-major format, so A[i*cols +j] gives the A_i_j element of the matrix A.
The code looks like this (uncommenting the pragma parallelises the code):
#include <omp.h>
#include <iostream>
#include <iomanip>
#include <stdio.h>
#define NUM_THREADS 8
#define size 500
#define num_iter 10
int main (int argc, char *argv[])
{
// omp_set_num_threads(NUM_THREADS);
int *A = new int [size*size];
int *B = new int [size*size];
int *C = new int [size*size];
for (int i=0; i<size; i++)
{
for (int j=0; j<size; j++)
{
A[i*size+j] = j*1;
B[i*size+j] = i*j+2;
C[i*size+j] = 0;
}
}
double total_time = 0;
double start = 0;
for (int t=0; t<num_iter; t++)
{
start = omp_get_wtime();
int i, k;
// #pragma omp parallel for num_threads(10) private(i, k) collapse(2) schedule(dynamic)
for (int j=0; j<size; j++)
{
for (i=0; i<size; i++)
{
for (k=0; k<size; k++)
{
C[i*size+j] += A[i*size+k] * B[k*size+j];
}
}
}
total_time += omp_get_wtime() - start;
}
std::setprecision(5);
std::cout << total_time/num_iter << std::endl;
delete[] A;
delete[] B;
delete[] C;
return 0;
}
What is confusing me is the following: why is dynamic scheduling faster than static scheduling for this task? Timing the runs and taking an average shows that static scheduling is slower, which to me is a bit counterintuitive since each thread is doing the same amount of work.
Also, am I correctly speeding up my matrix multiplication code?
Parallel matrix multiplication is non-trivial (have you even considered cache-blocking?). Your best bet is likely to be to use a BLAS Library for this, rather than writing it yourself. (Remember, "The best code is the code I do not have to write").
Wikipedia: Basic Linear Algebra Subprograms points to many implementations, a lot of which (including Intel Math Kernel Library) have free licenses.

C++11 app that uses dispatch_apply not working under Mac OS Sierra

I had a completely functioning codebase written in C++11 that used Grand Central Dispatch parallel processing, specifically dispatch_apply to do the basic parallel for loop for some trivial game calculations.
Since upgrading to Sierra, this code still runs, but each block is run in serial -- the cout statement shows that they are being executed in serial order, and CPU usage graph shows no parallel working on.
Queue is defined as:
workQueue = dispatch_queue_create("workQueue", DISPATCH_QUEUE_CONCURRENT);
And the relevant program code is:
case Concurrency::Parallel: {
dispatch_apply(stateMap.size(), workQueue, ^(size_t stateIndex) {
string thisCode = stateCodes[stateIndex];
long thisCount = stateCounts[stateIndex];
GameResult sliceResult = playStateOfCode(thisCode, thisCount);
results[stateIndex] = sliceResult;
if ((stateIndex + 1) % updatePeriod == 0) {
cout << stateIndex << endl;
}
});
break;
}
I strongly suspect that this either a bug, but if this is GCD forcing me to use new C++ methods for this, I'm all ears.
I'm not sure if it is a bug in Sierra or not. But it seems to work if you explicitly associate a global concurrent queue as target:
dispatch_queue_t target =
dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0);
dispatch_queue_t workQueue =
dispatch_queue_create_with_target("workQueue", DISPATCH_QUEUE_CONCURRENT, target);
// ^~~~~~~~~~~ ^~~~~~
Here is a working example
#include <iostream>
#include <fstream>
#include <vector>
#include <cmath>
#include <sstream>
#include <dispatch/dispatch.h>
void load_problem(const std::string, std::vector<std::pair<double,double>>&);
int main() {
// n-factor polynomial - test against a given problem provided as a set of space delimited x y values in 2d.txt
std::vector<std::pair<double,double>> problem;
std::vector<double> test = {14.1333177226503,-0.0368874860476915,
0.0909424058436257,2.19080982673558,1.24632025036125,0.0444549880462031,
1.06824631867947,0.551482840616757, 1.04948148731933};
load_problem("weird.txt",problem); //a list of space delimited doubles representing x, y.
size_t a_count = test.size();
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
__block double diffs = 0.0; //sum of all values..
dispatch_apply(problem.size(), queue, ^(size_t i) {
double g = 0;
for (size_t j=0; j < a_count - 1; j++) {
g += test[j]*pow(problem[i].first,a_count - j - 1);
}
g += test[a_count - 1];
diffs += pow(g - problem[i].second,2);
});
double delta = 1/(1+sqrt(diffs));
std::cout << "test: fit delta: " << delta << std::endl;
}
void load_problem(const std::string file, std::vector<std::pair<double,double>>& repo) {
repo.clear();
std::ifstream ifs(file);
if (ifs.is_open()) {
std::string line;
while(getline(ifs, line)) {
double x= std::nan("");
double y= std::nan("");
std::istringstream istr(line);
istr >> std::skipws >> x >> y;
if (!isnan(x) && !isnan(y)) {
repo.push_back({x, y});
};
}
ifs.close();
}
}

Save state of c++11 random generator without using iostream

What is the best way to store the state of a C++11 random generator without using the iostream interface. I would like to do like the first alternative listed here[1]? However, this approach requires that the object contains the PRNG state and only the PRNG state. In partucular, it fails if the implementation uses the pimpl pattern(at least this is likely to crash the application when reloading the state instead of loading it with bad data), or there are more state variables associated with the PRNG object that does not have to do with the generated sequence.
The size of the object is implementation defined:
g++ (tdm64-1) 4.7.1 gives sizeof(std::mt19937)==2504 but
Ideone http://ideone.com/41vY5j gives 2500
I am missing member functions like
size_t state_size();
const size_t* get_state() const;
void set_state(size_t n_elems,const size_t* state_new);
(1) shall return the size of the random generator state array
(2) shall return a pointer to the state array. The pointer is managed by the PRNG.
(3) shall copy the buffer std::min(n_elems,state_size()) from the buffer pointed to by state_new
This kind of interface allows more flexible state manipulation. Or are there any PRNG:s whose state cannot be represented as an array of unsigned integers?
[1]Faster alternative than using streams to save boost random generator state
I've written a simple (-ish) test for the approach I mentioned in the comments of the OP. It's obviously not battle-tested, but the idea is represented - you should be able to take it from here.
Since the amount of bytes read is so much smaller than if one were to serialize the entire engine, the performance of the two approaches might actually be comparable. Testing this hypothesis, as well as further optimization, are left as an exercise for the reader.
#include <iostream>
#include <random>
#include <chrono>
#include <cstdint>
#include <fstream>
using namespace std;
struct rng_wrap
{
// it would also be advisable to somehow
// store what kind of RNG this is,
// so we don't deserialize an mt19937
// as a linear congruential or something,
// but this example only covers mt19937
uint64_t seed;
uint64_t invoke_count;
mt19937 rng;
typedef mt19937::result_type result_type;
rng_wrap(uint64_t _seed) :
seed(_seed),
invoke_count(0),
rng(_seed)
{}
rng_wrap(istream& in) {
in.read(reinterpret_cast<char*>(&seed), sizeof(seed));
in.read(reinterpret_cast<char*>(&invoke_count), sizeof(invoke_count));
rng = mt19937(seed);
rng.discard(invoke_count);
}
void discard(unsigned long long z) {
rng.discard(z);
invoke_count += z;
}
result_type operator()() {
++invoke_count;
return rng();
}
static constexpr result_type min() {
return mt19937::min();
}
static constexpr result_type max() {
return mt19937::max();
}
};
ostream& operator<<(ostream& out, rng_wrap& wrap)
{
out.write(reinterpret_cast<char*>(&(wrap.seed)), sizeof(wrap.seed));
out.write(reinterpret_cast<char*>(&(wrap.invoke_count)), sizeof(wrap.invoke_count));
return out;
}
istream& operator>>(istream& in, rng_wrap& wrap)
{
wrap = rng_wrap(in);
return in;
}
void test(rng_wrap& rngw, int count, bool quiet=false)
{
uniform_int_distribution<int> integers(0, 9);
uniform_real_distribution<double> doubles(0, 1);
normal_distribution<double> stdnorm(0, 1);
if (quiet) {
for (int i = 0; i < count; ++i)
integers(rngw);
for (int i = 0; i < count; ++i)
doubles(rngw);
for (int i = 0; i < count; ++i)
stdnorm(rngw);
} else {
cout << "Integers:\n";
for (int i = 0; i < count; ++i)
cout << integers(rngw) << " ";
cout << "\n\nDoubles:\n";
for (int i = 0; i < count; ++i)
cout << doubles(rngw) << " ";
cout << "\n\nNormal variates:\n";
for (int i = 0; i < count; ++i)
cout << stdnorm(rngw) << " ";
cout << "\n\n\n";
}
}
int main(int argc, char** argv)
{
rng_wrap rngw(123456790ull);
test(rngw, 10, true); // this is just so we don't start with a "fresh" rng
uint64_t seed1 = rngw.seed;
uint64_t invoke_count1 = rngw.invoke_count;
ofstream outfile("rng", ios::binary);
outfile << rngw;
outfile.close();
cout << "Test 1:\n";
test(rngw, 10); // test 1
ifstream infile("rng", ios::binary);
infile >> rngw;
infile.close();
cout << "Test 2:\n";
test(rngw, 10); // test 2 - should be identical to 1
return 0;
}

C++11: How to Get A Multidimensional Array Through vector and to Assign it to auto?

I am a lazy programmer. I want to use C++ vector to create a multidimensional array. For example, this code create a 3x2 2D array:
int nR = 3;
int nC = 2;
vector<vector<double> > array2D(nR);
for(int c = 0; c < nC; c++)
array2D.resize(nC, 0);
However, I am too lazy to
declare array2D's data type: vector<vector<double> >
C++ auto could solve this problem.
However, I am too lazy to
write loop(s) to allocate the space(s) for each object like array2D.
Writing a function could solve this problem.
However, I am too lazy to
write each function for each N-dimensional array.
write nested N-1 loops for allocating spaces.
wirte each function for each data type.
The C++11 variadic template with function recursion could solve this problem.
Is it possible ...?
This is what you want. (Tested on Microsoft Visual C++ 2013 Update 1)
#include <iostream>
#include <vector>
using namespace std;
template<class elemType> inline vector<elemType> getArrayND(int dim) {
// Allocate space and initialize all elements to 0s.
return vector<elemType>(dim, 0);
}
template<class elemType, class... Dims> inline auto getArrayND(
int dim, Dims... resDims
) -> vector<decltype(getArrayND<elemType>(resDims...))> {
// Allocate space for this dimension.
auto parent = vector<decltype(getArrayND<elemType>(resDims...))>(dim);
// Recursive to next dimension.
for (int i = 0; i < dim; i++) {
parent[i] = getArrayND<elemType>(resDims...);
}
return parent;
}
int main() {
auto test3D = getArrayND<double>(2, 3, 4);
auto test4D = getArrayND<double>(2, 3, 4, 2);
test3D[0][0][1] = 3;
test4D[1][2][3][1] = 5;
cout << test3D[0][0][1] << endl;
cout << test4D[1][2][3][1] << endl;
return 0;
}

Resources