So, this time, I have a matrix called zvalue[10][10], and I would like to draw a pm3d map, with normal x-y grids, and with zvalue[i][j] to be the value of point (i,j).
In gnuplot, I just use:
set pm3d map
splot zvalue matrix using 1:2:3 with pm3d
And in C++ gnuplot pipes, I can use a function gp.file1d() to achieve this.
gp << "splot" << gp.file1d(matrix) << "matrix using 2:1:3" << "with pm3d\n"
But now, I am in C gnuplot pipe, of course I can write the matrix into a file called zvalue.txt, and use the following:
fprintf(gp, "splot \"zvalue.txt\" matrix using 1:2:3 with pm3d\n")
But is there other way? I tried #Christ's suggestion when I deal with normal splot with matrix, that is do something like:
int main(void)
{
FILE *gp = popen(GNUPLOT, "w");
fprintf(gp, "splot '-' matrix using 1:2:3\n");
int i, j;
for (i = 0; i < 10; i++) {
for (j = 0; j < 10; j++)
fprintf(gp, "%d ", i*j);
fprintf(gp, "\n");
}
pclose(gp);
return 0;
}
But when it's pm3d, it does not work well. I also tried with splot point by point, which works well for nomal splot with matrix, but here with pm3d, nothing works.
The following script works fine with pm3d:
#include <stdio.h>
#define GNUPLOT "gnuplot -persist"
int main(void)
{
FILE *gp = popen(GNUPLOT, "w");
fprintf(gp, "set pm3d map\n");
fprintf(gp, "splot '-' matrix using 1:2:3 with pm3d\n");
int i, j;
for (i = 0; i < 10; i++) {
for (j = 0; j < 10; j++)
fprintf(gp, "%d ", i*j);
fprintf(gp, "\n");
}
pclose(gp);
return 0;
}
So if you have a program which makes use of pm3d and doesn't work, it would really help to see exactly that full file (so that one can copy&paste the code to compile and test it).
Related
I am trying to run a c program using cuda the code does some math operations on an array of consecutive numbers (where every thread add elements of a row and check the last array element and return a value of the sum or zero if the conditions are met).
I don't have NVIDIA GPU so I wrote my code on google colab notebook.
The problem that I have encountered was not being able to debug the program. It outputs nothing at all no error messages and no output.
There's something wrong with the code but I cannot know where after reviewing it a few times.
Here's the code:
#include <iostream>
__global__ void matrixadd(int *l,int *result,int digits ,int possible_ids )
{
int sum=0;
int zeroflag=1;
int identicalflag=1;
int id= blockIdx .x * blockDim .x + threadIdx .x;
if(id<possible_ids)
{
if (l[(digits*id)+digits-1]==0) zeroflag=0;/*checking if the first number is zero*/
for(int i=0; i< digits-1;i++)/*edited:for(int i=0; i< digits;i++) */
{
if(l[(digits*id)+i]-l[(digits*id)+i+1]==0)
identicalflag+=1; /* checking if 2 consequitive numbers are identical*/
sum = sum + l[(digits*id)+i]; /* finding the sum*/
}
if (identicalflag!=1)identicalflag=0;
result[id]=sum*zeroflag*identicalflag;
}
}
int main()
{
int digits=6;
int possible_ids=pow(10,digits);
/*populate the array */
int* a ;
a= (int *)malloc((possible_ids * digits) * sizeof(int));
int the_id,temp=possible_ids;
for (int i = 0; i < possible_ids; i++)
{
temp--;
the_id=temp;
for (int j = 0; j < digits; j++)
{
a[i * digits + j] = the_id % 10;
if(the_id !=0) the_id /= 10;
}
}
/*the numbers will appear in reversed order */
/*allocate memory on host and device then invoke the kernel function*/
int *d_a,*d_c,*c;
int size=possible_ids * digits;
c= (int *)malloc(possible_ids * sizeof(int));/*results matrix*/
cudaMalloc((void **)&d_a,size*sizeof(int));
cudaMemcpy(d_a,a,size*sizeof(int),cudaMemcpyHostToDevice);
cudaMalloc((void **)&d_c,possible_ids*sizeof(int));
/*EDITED: cudaMalloc((void **)&d_c,digits*sizeof(int));*/
matrixadd<<<ceil(possible_ids/1024.0),1024>>>(d_a,d_c,digits,possible_ids);
cudaMemcpy(c,d_c,possible_ids*sizeof(int),cudaMemcpyDeviceToHost);
int acc=0;
for (int k=0;k<possible_ids;k++)
{
if (c[k]==7||c[k]==17||c[k]==11||c[k]==15)continue;
acc += c[k];
}
printf("The number of possible ids %d",acc);
}
You are doing invalid indexing into the l array in this line of code: if(l[(digits*id)+i]-l[(digits*id)+i+1]==0)
From comment by Robert Covella
If you are using python code, you can use 'pdb' built-in breakpoint function. put the following line of command at the top of your script.
import pdb
then before the line, you want to debug put the following command
pdb.set_trace()
you will get '(Pdb), then empty box' to insert the command. If you want to continue to the next line put 'n' or you can use 's' to see the detailed work of your current line command.
Suppose you are interested in debugging python code. Enjoy it!
For accessing single point, I am using this line of code and it works
int intensity = gray_image.at<uchar>(Point(100, 100));
However when I use this code to access all the pixels in image, it gives memory error,
for (int i = 0; i < gray_image.rows;i++)
{
for (int j = 0; j < gray_image.cols; j++) {
intensity += gray_image.at<uchar>(Point(i, j));
}
}
When I run above code, it does not give compile time error but gives memory exception. Where am I going wrong?
You can just skip the use of Point and do the following.
for (int i = 0; i < gray_image.rows;i++)
{
for (int j = 0; j < gray_image.cols; j++) {
intensity += gray_image.at<uchar>(i, j);
}
}
You're requesting a pixel (j,i) that doesn't exist. This wouldn't have been an error in a square image (where the number of rows = number of columns), but you're using a rectangular image.
The Mat::at function has multiple prototypes, the two that you're concerned with are:
C++: template<typename T> T& Mat::at(int i, int j)
C++: template<typename T> T& Mat::at(Point pt)
The documentation for Mat::at states that Point pt is defined as the Element position specified as Point(j,i), so you've effectively swapped your rows and columns.
The reason this happens is because the image is stored in a 1D array of pixels, and to get a pixel Point (r,c) is translated to p = r * image.cols + c;
Consider the following code.
const int N = 100;
const float alpha = 0.9;
Eigen::MatrixXf myVec = Eigen::MatrixXf::Random(N,1);
Eigen::MatrixXf symmetricMatrix(N, N);
for(int i=0; i<N; i++)
for(int j=0; j<=i; j++)
symmetricMatrix(i,j) = symmetricMatrix(j,i) = i+j;
symmetricMatrix *= alpha;
symmetricMatrix += ((1-alpha)*myVec*myVec.adjoint());
It essentially implements the exponential averaging.
I know that the last line may be optimized in the following way.
symmetricMatrix_copy.selfadjointView<Eigen::Upper>().rankUpdate(myVec, 1-alpha);
I would like to know whether I can combine the last two lines in an efficient way.
In short, I would like to compute A = alpha*A+(1-alpha)*(x*x').
Most importantly, you should declare myVec as Eigen::VectorXf, if it is guaranteed to be a vector. And make sure you compile with -O3 -march=native -DNDEBUG.
You can try these alternatives (I'm using A and v to save typing), which one is the fastest likely depends on your problem-size and your CPU:
A.noalias() = alpha * A + (1.0f-alpha)*v*v.adjoint();
A.noalias() = alpha * A + (1.0f-alpha)*v.lazyProduct(v.adjoint());
A.noalias() = alpha * A + ((1.0f-alpha)*v).lazyProduct(v.adjoint());
A.noalias() = alpha * A + v.lazyProduct((1.0f-alpha)*v.adjoint());
A.triangularView<Eigen::Upper>() = alpha * A + (1.0f-alpha)*v*v.adjoint();
// or any `lazyProduct` as above.
Unfortunately, .noalias() and .triangularView() can't be combined at the moment.
You can also consider computing this:
A.selfadjointView<Eigen::Upper>().rankUpdate(v, (1.0f-alpha)/alpha);
and every N iterations scale your A matrix by pow(alpha, N)
I've got an error, regarding calling JacobiSVD in my cuda function.
This is the part of the code that causing the error.
Eigen::JacobiSVD<Eigen::Matrix3d> svd( cov_e, Eigen::ComputeThinU | Eigen::ComputeThinV);
And this is the error message.
CUDA_voxel_building.cu(43): error: calling a __host__
function("Eigen::JacobiSVD , (int)2> ::JacobiSVD") from a __global__
function("kernel") is not allowed
I've used the following command to compile it.
nvcc -std=c++11 -D_MWAITXINTRIN_H_INCLUDED -D__STRICT_ANSI__ -ptx CUDA_voxel_building.cu
I'm using code 8.0 with eigen3 on ubuntu 16.04.
It seems like other functions such as eigen value decomposition also gives the same error.
Anyone knows a solution? I'm enclosing my code below.
//nvcc -ptx CUDA_voxel_building.cu
#include </usr/include/eigen3/Eigen/Core>
#include </usr/include/eigen3/Eigen/SVD>
/*
#include </usr/include/eigen3/Eigen/Sparse>
#include </usr/include/eigen3/Eigen/Dense>
#include </usr/include/eigen3/Eigen/Eigenvalues>
*/
__global__ void kernel(double *p, double *breaks,double *ind, double *mu, double *cov, double *e,double *v, int *n, char *isgood, int minpts, int maxgpu){
bool debuginfo = false;
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if(debuginfo)printf("Thread %d got pointer\n",idx);
if( idx < maxgpu){
int s_ind = breaks[idx];
int e_ind = breaks[idx+1];
int diff = e_ind-s_ind;
if(diff >minpts){
int cnt = 0;
Eigen::MatrixXd local_p(3,diff) ;
for(int k = s_ind;k<e_ind;k++){
int temp_ind=ind[k];
//Eigen::Matrix<double, 3, diff> local_p;
local_p(1,cnt) = p[temp_ind*3];
local_p(2,cnt) = p[temp_ind*3+1];
local_p(3,cnt) = p[temp_ind*3+2];
cnt++;
}
Eigen::Matrix3d centered = local_p.rowwise() - local_p.colwise().mean();
Eigen::Matrix3d cov_e = (centered.adjoint() * centered) / double(local_p.rows() - 1);
Eigen::JacobiSVD<Eigen::Matrix3d> svd( cov_e, Eigen::ComputeThinU | Eigen::ComputeThinV);
/* Eigen::Matrix3d Cp = svd.matrixU() * svd.singularValues().asDiagonal() * svd.matrixV().transpose();
mu[idx]=p[ind[s_ind]*3];
mu[idx+1]=p[ind[s_ind+1]*3];
mu[idx+2]=p[ind[s_ind+2]*3];
e[idx]=svd.singularValues()(0);
e[idx+1]=svd.singularValues()(1);
e[idx+2]=svd.singularValues()(2);
n[idx] = diff;
isgood[idx] = 1;
for(int x = 0; x < 3; x++)
{
for(int y = 0; y < 3; y++)
{
v[x+ 3*y +idx*9]=svd.matrixV()(x, y);
cov[x+ 3*y +idx*9]=cov_e(x, y);
//if(debuginfo)printf("%f ",R[x+ 3*y +i*9]);
if(debuginfo)printf("%f ",Rm(x, y));
}
}
*/
} else {
mu[idx]=0;
mu[idx+1]=0;
mu[idx+2]=0;
e[idx]=0;
e[idx+1]=0;
e[idx+2]=0;
n[idx] = 0;
isgood[idx] = 0;
for(int x = 0; x < 3; x++)
{
for(int y = 0; y < 3; y++)
{
v[x+ 3*y +idx*9]=0;
cov[x+ 3*y +idx*9]=0;
}
}
}
}
}
First of all, Ubuntu 16.04 provides Eigen 3.3-beta1, which is not really recommended to be used. I would suggest upgrading to a more recent version. Furthermore, to include Eigen, write (e.g.):
#include <Eigen/Eigenvalues>
and compile with -I /usr/include/eigen3 (if you use the version provided by the OS), or better -I /path/to/local/eigen-version.
Then, as talonmies noted, you can't call host-functions from kernels, (I'm not sure at the moment, why JacobiSVD is not marked as device function), but in your case it would make much more sense to use Eigen::SelfAdjointEigenSolver, anyway. Since the matrix you are decomposing is fixed-size 3x3 you should actually use the optimized computeDirect method:
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> eig; // default constructor
eig.computeDirect(cov_e); // works for 2x2 and 3x3 matrices, does not require loops
It seems the computeDirect even works on the beta version provided by Ubuntu (I'd still recommend to update).
Some unrelated notes:
The following is wrong, since you should start with index 0:
local_p(1,cnt) = p[temp_ind*3];
local_p(2,cnt) = p[temp_ind*3+1];
local_p(3,cnt) = p[temp_ind*3+2];
Also, you can write this in one line:
local_p.col(cnt) = Eigen::Vector3d::Map(p+temp_ind*3);
This line will not fit (unless diff==3):
Eigen::Matrix3d centered = local_p.rowwise() - local_p.colwise().mean();
What you probably mean is (local_p is actually 3xn not nx3)
Eigen::Matrix<double, 3, Eigen::Dynamic> centered = local_p.colwise() - local_p.rowwise().mean();
And when computing cov_e you need to .adjoint() the second factor, not the first.
You can avoid both 'big' matrices local_p and centered, by directly accumulating Eigen::Matrix3d sum2 and Eigen::Vector3d sum with sum2 += v*v.adjoint() and sum +=v and computing
Eigen::Vector3d mu = sum / diff;
Eigen::Matrix3d cov_e = (sum2 - mu*mu.adjoint()*diff)/(diff-1);
I want to parallelize an OpenMP raytracing algorithm that contains two for loops.
Is there anything more I can do than just setting omp_set_num_threads(omp_get_max_threads()) and putting #pragma omp parallel for in front of the first for loop?
So far I've reached a 2.13-times faster algorithm.
Code:
start = omp_get_wtime();
#pragma omp parallel for
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
{
int intersection_object = -1; // none
int reflected_intersection_object = -1; // none
double current_lambda = 0x7fefffffffffffff; // maximum positive double
double current_reflected_lambda = 0x7fefffffffffffff; // maximum positive double
RAY ray, shadow_ray, reflected_ray;
PIXEL pixel;
SPHERE_INTERSECTION intersection, current_intersection, shadow_ray_intersection, reflected_ray_intersection, current_reflected_intersection;
double red, green, blue;
double theta, reflected_theta;
bool bShadow = false;
pixel.i = i;
pixel.j = j;
// 1. compute ray:
compute_ray(&ray, &view_point, &viewport, &pixel, &camera_frame, focal_distance);
// 2. check if ray hits an object:
for (int k = 0; k < NSPHERES; k++)
{
if (sphere_intersection(&ray, &sphere[k], &intersection))
{
// there is an intersection between ray and object
// 1. Izracunanaj normalu...
intersection_normal(&sphere[k], &intersection, &ray);
// 2. ako je lambda presjecista manji od trenutacnog:
if (intersection.lambda_in < current_lambda)
{
current_lambda = intersection.lambda_in;
intersection_object = k;
copy_intersection_struct(¤t_intersection, &intersection);
}
// izracunaj current lambda current_lambda =
// oznaci koji je trenutacni object : intersection_object =
// kopiraj strukturu presjeka : copy_intersection_struct();
}
}
// Compute the color of the pixel:
if (intersection_object > -1)
{
compute_shadow_ray(&shadow_ray, &intersection, &light);
theta = dotproduct(&(shadow_ray.direction), &(intersection.normal));
for (int l = 0; l<NSPHERES; l++)
{
if (l != intersection_object)
{
if (sphere_intersection(&shadow_ray, &sphere[l], &shadow_ray_intersection) && (theta>0.0))
bShadow = true;
}
}
if (bShadow)
{ // if in shadow, add only ambiental light to the surface color
red = shadow(sphere[intersection_object].ka_rgb[CRED], ambi_light_intensity);
green = shadow(sphere[intersection_object].ka_rgb[CGREEN], ambi_light_intensity);
blue = shadow(sphere[intersection_object].ka_rgb[CBLUE], ambi_light_intensity);
}
else
{
// the intersection is not in shadow:
red = blinnphong_shading(¤t_intersection, &light, &view_point,
sphere[intersection_object].kd_rgb[CRED], sphere[intersection_object].ks_rgb[CRED], sphere[intersection_object].ka_rgb[CRED], sphere[intersection_object].shininess,
light_intensity, ambi_light_intensity);
green = blinnphong_shading(¤t_intersection, &light, &view_point,
sphere[intersection_object].kd_rgb[CGREEN], sphere[intersection_object].ks_rgb[CGREEN], sphere[intersection_object].ka_rgb[CGREEN], sphere[intersection_object].shininess,
light_intensity, ambi_light_intensity);
blue = blinnphong_shading(¤t_intersection, &light, &view_point,
sphere[intersection_object].kd_rgb[CBLUE], sphere[intersection_object].ks_rgb[CBLUE], sphere[intersection_object].ka_rgb[CBLUE], sphere[intersection_object].shininess,
light_intensity, ambi_light_intensity);
}
tabelaPixlov[i][j].red = red;
tabelaPixlov[i][j].green = green;
tabelaPixlov[i][j].blue = blue;
glColor3f(tabelaPixlov[i][j].red, tabelaPixlov[i][j].green, tabelaPixlov[i][j].blue);
intersection_object = -1;
bShadow = false;
}
else
{
// draw the pixel with the background color
tabelaPixlov[i][j].red = 0;
tabelaPixlov[i][j].green = 0;
tabelaPixlov[i][j].blue = 0;
intersection_object = -1;
bShadow = false;
}
current_lambda = 0x7fefffffffffffff;
current_reflected_lambda = 0x7fefffffffffffff;
}
}
//glFlush();
stop = omp_get_wtime();
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
{
glColor3f(tabelaPixlov[i][j].red, tabelaPixlov[i][j].green, tabelaPixlov[i][j].blue);
glBegin(GL_POINTS);
glVertex2i(i, j);
glEnd();
}
}
printf("%f\n št niti:%d\n", stop - start, omp_get_max_threads());
glutSwapBuffers();
}
With ray tracing you should use schedule(dynamic). Besides that I would suggest fusing the loop
#pragma omp parallel for schedule(dynamic) {
for(int n=0; n<((viewport.xvmax - viewport.xvmin)*(viewport.yvmax - viewport.yvmin); n++) {
int i = n/(viewport.yvmax - viewport.yvmin);
int j = n%(viewport.yvmax - viewport.yvmin)
//...
}
Also, why are you setting the number of threads? Just use the default which should be set to the number of logical cores. If you have Hyper Threading ray tracing is one of the algorithms that will benefit from Hyper Threading so you don't want to set the number of threads to the number of physical cores.
In addition to using MIMD with OpenMP I would suggest looking into using SIMD for ray tracing. See Ingo Wald's PhD thesis for an example on how to do this http://www.sci.utah.edu/~wald/PhD/. Basically you shoot four (eight) rays in one SSE (AVX) register and then go down the ray tree for each ray in parallel. However, if one ray finishes you hold it and wait until all four are finished (this is similar to what is done on the GPU). There have been many papers written since which have more advanced tricks based on this idea.