How to debug a CUDA program in a Google Colab notebook?

I am trying to run a C program using CUDA. The code does some math on an array of consecutive numbers: every thread adds the elements of one row, checks the last array element, and returns either the sum or, if the conditions are met, zero.
I don't have an NVIDIA GPU, so I wrote my code in a Google Colab notebook.
The problem is that I cannot debug the program. It produces nothing at all: no error messages and no output.
There is something wrong with the code, but I cannot find where after reviewing it a few times.
Here's the code:
#include <iostream>
__global__ void matrixadd(int *l,int *result,int digits ,int possible_ids )
{
int sum=0;
int zeroflag=1;
int identicalflag=1;
int id= blockIdx .x * blockDim .x + threadIdx .x;
if(id<possible_ids)
{
if (l[(digits*id)+digits-1]==0) zeroflag=0;/*checking if the first number is zero*/
for(int i=0; i< digits-1;i++)/*edited:for(int i=0; i< digits;i++) */
{
if(l[(digits*id)+i]-l[(digits*id)+i+1]==0)
identicalflag+=1; /* checking if 2 consecutive numbers are identical*/
sum = sum + l[(digits*id)+i]; /* finding the sum*/
}
if (identicalflag!=1)identicalflag=0;
result[id]=sum*zeroflag*identicalflag;
}
}
int main()
{
int digits=6;
int possible_ids=pow(10,digits);
/*populate the array */
int* a ;
a= (int *)malloc((possible_ids * digits) * sizeof(int));
int the_id,temp=possible_ids;
for (int i = 0; i < possible_ids; i++)
{
temp--;
the_id=temp;
for (int j = 0; j < digits; j++)
{
a[i * digits + j] = the_id % 10;
if(the_id !=0) the_id /= 10;
}
}
/*the numbers will appear in reversed order */
/*allocate memory on host and device then invoke the kernel function*/
int *d_a,*d_c,*c;
int size=possible_ids * digits;
c= (int *)malloc(possible_ids * sizeof(int));/*results matrix*/
cudaMalloc((void **)&d_a,size*sizeof(int));
cudaMemcpy(d_a,a,size*sizeof(int),cudaMemcpyHostToDevice);
cudaMalloc((void **)&d_c,possible_ids*sizeof(int));
/*EDITED: cudaMalloc((void **)&d_c,digits*sizeof(int));*/
matrixadd<<<ceil(possible_ids/1024.0),1024>>>(d_a,d_c,digits,possible_ids);
cudaMemcpy(c,d_c,possible_ids*sizeof(int),cudaMemcpyDeviceToHost);
int acc=0;
for (int k=0;k<possible_ids;k++)
{
if (c[k]==7||c[k]==17||c[k]==11||c[k]==15)continue;
acc += c[k];
}
printf("The number of possible ids %d",acc);
}

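You are doing invalid indexing into the l array in this line of code: if(l[(digits*id)+i]-l[(digits*id)+i+1]==0). With the loop bound i < digits (the version shown in the code comment), the last iteration (i == digits-1) reads l[(digits*id)+digits], which is one element past the end of the thread's row and, for the last row, past the end of the whole array.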
From comment by Robert Covella
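One way to hunt for this kind of indexing bug without a GPU debugger is to run the per-thread logic on the host with bounds-checked access. The following is a minimal sketch, not the original program: row_result and overruns_row are hypothetical helper names, the row is passed as its own std::vector so that at() catches any read past the row, and loop_end stands in for the loop bound being compared.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Host-side copy of the per-thread body of matrixadd for ONE row.
// `row` holds the digits of that row; `loop_end` is the exclusive upper
// bound of the loop over i (hypothetical parameter, used for comparison).
int row_result(const std::vector<int>& row, int loop_end) {
    assert(!row.empty());
    int sum = 0, zeroflag = 1, identicalflag = 1;
    if (row.at(row.size() - 1) == 0) zeroflag = 0;
    for (int i = 0; i < loop_end; i++) {
        // at() throws std::out_of_range instead of silently reading past the row
        if (row.at(i) - row.at(i + 1) == 0) identicalflag += 1;
        sum += row.at(i);
    }
    if (identicalflag != 1) identicalflag = 0;
    return sum * zeroflag * identicalflag;
}

// Returns true when the given loop bound makes the body read outside the row.
bool overruns_row(const std::vector<int>& row, int loop_end) {
    try {
        row_result(row, loop_end);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For a six-digit row, overruns_row(row, 6) reports the overrun caused by the bound i < digits, while overruns_row(row, 5) does not.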

If you are debugging Python code, you can use the built-in pdb debugger. Put the following line at the top of your script:
import pdb
Then, just before the line you want to debug, add:
pdb.set_trace()
Execution stops there and you get a (Pdb) prompt where you can type commands. Enter n to run the next line, or s to step into the call on the current line.
This only applies if the code you want to debug is Python.

Related

What is segmentation fault. How to deal with it [duplicate]

Problem:
A student signed up for n workshops and wants to attend the maximum number of workshops where no two workshops overlap. You must do the following:
Implement structures:
struct Workshop having the following members: the workshop's start time, the workshop's duration, and the workshop's end time.
struct Available_Workshops having the following members: an integer n (the number of workshops the student signed up for) and an array of type Workshop having size n.
Implement functions:
Available_Workshops* initialize (int start_time[], int duration[], int n): creates an Available_Workshops object and initializes its elements using the elements in the start_time[] and duration[] parameters (both are of size n). Here, start_time[i] and duration[i] are the respective start time and duration for the i-th workshop. This function must return a pointer to an Available_Workshops object.
int CalculateMaxWorkshops(Available_Workshops* ptr): returns the maximum number of workshops the student can attend without overlap. The next workshop cannot be attended until the previous workshop ends.
Note: An array of unknown size (n) should be declared as follows: DataType* arrayName = new DataType[n];
Your initialize function must return a pointer to an Available_Workshops object. Your CalculateMaxWorkshops function must return the maximum number of non-overlapping workshops the student can attend.
Sample Input
6
1 3 0 5 5 8
1 1 6 2 4 1
Sample Output
4
Explanation
The first line denotes n, the number of workshops. The next line contains n space-separated integers where the i-th integer is the workshop's start time. The next line contains n space-separated integers where the i-th integer is the workshop's duration. The student can attend the workshops occupying [1, 2], [3, 4], [5, 7] and [8, 9] without overlap, so CalculateMaxWorkshops returns 4 to main (which then prints 4 to stdout).
MY CODE:
#include <iostream>
using namespace std;
class Workshop{
public:
int start_time{},duration{},end_time{};};
class Available_Workshops
{
public:
int n{};
struct Workshop*arr=new struct Workshop[n];
~Available_Workshops()
{
delete [] arr;
}
void arr_sort();
void arr_delete(int i);
};
////////////////////////////////////////////////////////////////////////////////////////////
Available_Workshops * initialize(int start_time[],int duration[],int n)
{
Available_Workshops * x=new Available_Workshops{};
x->n=n;
for(int i=0;i<n;i++)
{
x->arr[i].start_time=start_time[i];
x->arr[i].duration=duration[i];
x->arr[i].end_time=start_time[i]+duration[i];
}
return x;
}
///////////////////////////////////////////////////////////////////////////////////////////
void Available_Workshops:: arr_delete(int i)
{
n-=1;
for(int j=i;j<n;j++)
{
arr[j]=arr[j+1];
}
}
///////////////////////////////////////////////////////////////////////////////////////////
void Available_Workshops:: arr_sort()
{
for(int i=0;i<n;i++)
{
for(int j=i+1;j<n;j++)
{
if(arr[i].start_time>arr[j].start_time)
{
struct Workshop temp=arr[i];
arr[i]=arr[j];
arr[j]=temp;
}
}
}
}
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int CalculateMaxWorkshops(Available_Workshops * x)
{
x->arr_sort();
for(int i=0;i<x->n-1;i++)
{
for(int j=i+1;j<x->n;j++)
{
if(x->arr[i].end_time>x->arr[j].start_time)
{
if(x->arr[i].duration>=x->arr[j].duration)
x->arr_delete(i);
else x->arr_delete(j);
j--;
}
}
}
int y=x->n;
delete x;
return y;
}
int main(int argc, char *argv[]) {
int n; // number of workshops
cin >> n;
// create arrays of unknown size n
int* start_time = new int[n];
int* duration = new int[n];
for(int i=0; i < n; i++){
cin >> start_time[i];
}
for(int i = 0; i < n; i++){
cin >> duration[i];
}
Available_Workshops * ptr;
ptr = initialize(start_time,duration, n);
cout << CalculateMaxWorkshops(ptr) << endl;
return 0;
}
My code is not running; it crashes with a segmentation fault. Please help me find the error.
Your bug can be seen from the class declaration:
class Available_Workshops
{
public:
int n{};
struct Workshop* arr = new struct Workshop[n];
~Available_Workshops()
{
delete[] arr;
}
void arr_sort();
void arr_delete(int i);
};
Member n gets explicitly initialized to 0, so the default member initializer for arr allocates an array of zero elements. Setting x->n = n later in initialize does not resize arr, yet initialize then writes n elements into it, which causes all kinds of undefined behavior.
You really, really want a proper constructor for your class instead of trying to initialize the members inline:
Available_Workshops(int size) :
n(size)
{
arr = new Workshop[n];
}
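Putting the constructor together with initialize, a minimal sketch could look like the following. It keeps the names from the question; the deleted copy operations and the assert are additions for safety in this sketch and are not part of the original code.

```cpp
#include <cassert>

struct Workshop {
    int start_time{}, duration{}, end_time{};
};

class Available_Workshops {
public:
    int n;
    Workshop* arr;
    // Allocate exactly `size` elements, so n and arr can never disagree.
    explicit Available_Workshops(int size) : n(size), arr(new Workshop[size]) {}
    ~Available_Workshops() { delete[] arr; }
    // Copying would delete arr twice, so it is forbidden in this sketch.
    Available_Workshops(const Available_Workshops&) = delete;
    Available_Workshops& operator=(const Available_Workshops&) = delete;
};

Available_Workshops* initialize(int start_time[], int duration[], int n) {
    assert(n >= 0);
    Available_Workshops* x = new Available_Workshops(n);  // arr now has n slots
    for (int i = 0; i < n; i++) {
        x->arr[i].start_time = start_time[i];
        x->arr[i].duration = duration[i];
        x->arr[i].end_time = start_time[i] + duration[i];
    }
    return x;
}
```

The caller still owns the returned pointer and has to delete it, as CalculateMaxWorkshops does in the question.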
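Another thing to watch, although not related to your crash, is the shifting loop inside your arr_delete function.
for (int j = i; j < n; j++)
{
arr[j] = arr[j + 1];
}
When j == n-1 on the last iteration of the loop, it will execute arr[n-1] = arr[n]. This stays in bounds only because arr_delete decrements n just before the loop, so arr[n] is the old last element. If the decrement were ever moved after the loop, accessing arr[n] would be undefined behavior, since the only valid indices in the array would then be [0..n-1].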
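A version of the shift that does not depend on when n is decremented can make the bound explicit. This is a small sketch on a plain int array; remove_at is a hypothetical helper name, not a function from the question.

```cpp
#include <cassert>

// Removes element i from the first n elements of arr by shifting the tail
// one slot to the left. Returns the new element count.
int remove_at(int* arr, int n, int i) {
    assert(0 <= i && i < n);
    for (int j = i; j + 1 < n; j++) {  // j + 1 is at most n - 1, always in bounds
        arr[j] = arr[j + 1];
    }
    return n - 1;
}
```

The caller updates its own count from the return value, so the loop bound and the element count cannot drift apart.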

Why this code is giving segmentation fault?

Alice is playing an arcade game. She wants to climb to the top of the leaderboard and track her ranking. The leaderboard works like this:
-The player with the highest score is ranked number 1 on the leaderboard.
-Players who have equal scores receive the same ranking number, and the next player(s) receive the immediately following ranking number.
For example, the four players on the leaderboard have high scores of 100, 90, 90 and 80. Those players will have ranks 1, 2, 2 and 3 respectively. If Alice's scores are 70, 80 and 105 her rankings after each game are 4th, 3rd and 1st.
#include <bits/stdc++.h>
using namespace std;
struct table{
int rank;
int score;
};
This is a modified binary search function implementation for searching the score.
int search(vector<table> v,int low,int high,int n,int x){
if(low<=high){
int mid =(high+low)/2;
if((v[mid].score==x) || (mid==0 && v[mid].score<x))
return v[mid].rank;
if(mid==n-1 && v[mid].score>x)
return (v[mid].rank + 1);
if(v[mid].score>x && x>v[mid+1].score && mid<n-1)
return v[mid+1].rank;
if(v[mid].score>x)
return search(v,mid+1,high,n,x);
else
return search(v,low,mid-1,n,x);
}
return -1;
}
Main climbingLeaderboard function
vector<int> climbingLeaderboard(vector<int> scores, vector<int> alice) {
vector<table> v;
vector<int> res;
int n = scores.size();
int m = alice.size();
int x=1;
for(int i=0 ; i<n ; i++){
if(scores[i]!=scores[i-1] && i>0)
x++;
v[i].rank = x;
v[i].score = scores[i];
}
int z;
for(int i=0 ; i<m ; i++){
x=alice[i];
z = search(v,0,n-1,n,x);
res.push_back(z);
}
return res;
}
Driver Program
int main(){
int scores_count;
cin >> scores_count;
vector<int> scores; //vector for storing leaderboard scores
int k;
for(int i=0 ; i<scores_count ; i++){
cin >> k;
scores.push_back(k);
}
int game_count; //number of games played by alice
cin >> game_count;
vector<int> alice; //vector for storing Alice's scores
for(int i=0 ; i<game_count ; i++){
cin >> k;
alice.push_back(k);
}
vector<int> result; //vector for storing result rank of each game of Alice
result = climbingLeaderboard(scores,alice);
for(auto i = result.begin() ; i!=result.end() ; i++){
cout << *i << endl;
}
return 0;
}
Problem: In your climbingLeaderboard function, the first loop reads scores[i-1] when i is 0, because scores[i]!=scores[i-1] is evaluated before i>0. That is an out-of-range index for a std::vector access.
Fix: Test i>0 first (i>0 && scores[i]!=scores[i-1]) so the comparison is skipped for the first element, or handle element 0 separately and start the loop from i=1.
Problem 2: You access v by index without instantiating any structures to hold the data (e.g. v[i].rank = x). v is empty at that point, so v[i] is out of bounds.
Fix 2: Create an instance of the structure, write the data to it, then push it back into the vector v. Alternatively, size the vector up front (vector<table> v(n) or v.resize(n)); note that reserve alone does not make v[i] valid.
Problem 3: On closer inspection, your search function also looks broken. You should probably test it in isolation from the rest of the code.
A core dump/segmentation fault is a specific kind of error caused by accessing memory that does not belong to you.
Error in the climbingLeaderboard function: accessing an index outside the array bounds.
In the first iteration (i = 0) the condition reads scores[i-1], i.e. scores[-1], and there is no index -1 in C++. Handle the first element separately and start the loop from i = 1. Also note that v is empty, so elements have to be appended with push_back rather than assigned through v[i]:
if(n>0) v.push_back({x, scores[0]});
for(int i=1; i<n ; i++){
if(scores[i]!=scores[i-1])
x++;
v.push_back({x, scores[i]});
}
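For comparison, here is a minimal sketch of the whole ranking step that avoids both out-of-range accesses. It is not the asker's search function: it swaps in std::lower_bound over the distinct scores, it assumes scores is already sorted in descending order (as in the example 100, 90, 90, 80), and dense_ranks is a hypothetical name.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// For each of Alice's scores, returns her dense rank on the leaderboard.
// Assumption: `scores` is sorted in descending order.
std::vector<int> dense_ranks(const std::vector<int>& scores,
                             const std::vector<int>& alice) {
    // Keep each distinct score once; its position + 1 is its dense rank.
    std::vector<int> distinct;
    for (int s : scores) {
        if (distinct.empty() || distinct.back() != s) distinct.push_back(s);
    }
    std::vector<int> res;
    res.reserve(alice.size());
    for (int x : alice) {
        // First distinct score that is NOT greater than x; everything before
        // it beats x, so the number of those scores plus one is the rank.
        auto it = std::lower_bound(distinct.begin(), distinct.end(), x,
                                   std::greater<int>());
        res.push_back(static_cast<int>(it - distinct.begin()) + 1);
    }
    return res;
}
```

With the leaderboard 100, 90, 90, 80 and Alice's scores 70, 80, 105, this yields ranks 4, 3 and 1, matching the example in the question.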

Using qsort in Cython to get a sorting index/permutation

Overview
There are a few questions similar to this one, but they are all slightly different. To be clear, if values is an array of integers, I want to find perm such that sorted_values (values sorted by some comparison operator) is given by
sorted_values[i] = values[perm[i]]
Step 1
How to do it in C? qsort requires a comparison function that tells it whether one element is greater than another. If we make values a global variable, we can exploit this comparison function to sort an array perm, initially set to 0:N-1 (where N is the length of values), by comparing values[perm[i]] vs values[perm[j]] instead of perm[i] vs perm[j]. Some example C code:
// sort_test.c
#include <stdio.h>
#include <stdlib.h>
int *values;
int cmpfunc (const void * a, const void * b) {
return ( values[*(int*)a] - values[*(int*)b] );
}
int main () {
int i;
int n = 5;
int *perm;
// Assign memory
values = (int *) malloc(n*sizeof(int));
perm = (int *) malloc(n*sizeof(int));
// Set values to random values between 0 and 99
for (i=0; i<n; i++) values[i] = rand() % 100;
// Set perm initially to 0:n-1
for (i=0; i<n; i++) perm[i] = i;
printf("Before sorting the list is: \n");
for (i=0; i<n; i++) printf("%d ", values[i]);
qsort(perm, n, sizeof(int), cmpfunc);
printf("\nThe sorting permutation is: \n");
for (i=0; i<n; i++) printf("%d ", perm[i]);
free(values);
free(perm);
printf("\n");
return(0);
}
Of course, the trick is defining values globally so that cmpfunc can see it.
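As a point of comparison only (this is C++, not the C or Cython asked about), the same permutation can be computed without a global by letting the comparator capture values. The sketch below is an assumption-laden illustration: sorting_permutation is a hypothetical name, it compares with < instead of subtracting (subtraction can overflow for large values), and it uses std::stable_sort so that equal values keep their index order, which qsort does not guarantee.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Returns perm such that values[perm[0]] <= values[perm[1]] <= ...
std::vector<int> sorting_permutation(const std::vector<int>& values) {
    std::vector<int> perm(values.size());
    std::iota(perm.begin(), perm.end(), 0);  // perm = 0, 1, ..., n-1
    // Sort the indices, but compare the values they point at.
    std::stable_sort(perm.begin(), perm.end(),
                     [&values](int a, int b) { return values[a] < values[b]; });
    return perm;
}
```

The sorted array is then recovered as sorted_values[i] = values[perm[i]].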
Step 2
How to do it in Cython? Unfortunately, I cannot get Cython to use the same trick with values declared globally. My best attempt is the following, based on an existing answer; the difference is that that answer only sorts an array and does not need the indexing/permutation.
# sort_test_c.pyx
cimport cython
from libc.stdlib cimport qsort
# Try declaring global variable for the sort function
cpdef long[:] values
cdef int cmpfunc (const void *a , const void *b) nogil:
cdef long a_v = (<long *> a)[0]
cdef long b_v = (<long *> b)[0]
return (values[a_v] - values[b_v]);
def sort(long[:] py_values, long[:] perm, int N):
# Assign to global
values = py_values
# Make sure perm is 0:N-1
for i in range(N):
perm[i] = i
# Perform sort
qsort(&perm[0], N, perm.strides[0], &cmpfunc)
This can be compiled using
cythonize -i sort_test_c.pyx
and tested with the script
# sort_test.py
from sort_test_c import sort
import numpy as np
n = 5
values = np.random.randint(0, 100, n).astype(int)
perm = np.empty(n).astype(int)
sort(values, perm, n)
This, however, complains about our global variable values, i.e.
UnboundLocalError: local variable 'values' referenced before assignment
Exception ignored in: 'sort_test_c.cmpfunc
and the sorting permutation is not correct (unless the values are already ordered, in which case it is only luck, because perm always comes back as the array 0:4). How can I fix this?

Eigen JacobiSVD cuda compile error

I get an error when calling JacobiSVD in my CUDA kernel.
This is the part of the code that causes the error:
Eigen::JacobiSVD<Eigen::Matrix3d> svd( cov_e, Eigen::ComputeThinU | Eigen::ComputeThinV);
And this is the error message.
CUDA_voxel_building.cu(43): error: calling a __host__
function("Eigen::JacobiSVD , (int)2> ::JacobiSVD") from a __global__
function("kernel") is not allowed
I've used the following command to compile it.
nvcc -std=c++11 -D_MWAITXINTRIN_H_INCLUDED -D__STRICT_ANSI__ -ptx CUDA_voxel_building.cu
I'm using CUDA 8.0 with Eigen 3 on Ubuntu 16.04.
Other functions, such as the eigenvalue decomposition, seem to give the same error.
Does anyone know a solution? My code is enclosed below.
//nvcc -ptx CUDA_voxel_building.cu
#include </usr/include/eigen3/Eigen/Core>
#include </usr/include/eigen3/Eigen/SVD>
/*
#include </usr/include/eigen3/Eigen/Sparse>
#include </usr/include/eigen3/Eigen/Dense>
#include </usr/include/eigen3/Eigen/Eigenvalues>
*/
__global__ void kernel(double *p, double *breaks,double *ind, double *mu, double *cov, double *e,double *v, int *n, char *isgood, int minpts, int maxgpu){
bool debuginfo = false;
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if(debuginfo)printf("Thread %d got pointer\n",idx);
if( idx < maxgpu){
int s_ind = breaks[idx];
int e_ind = breaks[idx+1];
int diff = e_ind-s_ind;
if(diff >minpts){
int cnt = 0;
Eigen::MatrixXd local_p(3,diff) ;
for(int k = s_ind;k<e_ind;k++){
int temp_ind=ind[k];
//Eigen::Matrix<double, 3, diff> local_p;
local_p(1,cnt) = p[temp_ind*3];
local_p(2,cnt) = p[temp_ind*3+1];
local_p(3,cnt) = p[temp_ind*3+2];
cnt++;
}
Eigen::Matrix3d centered = local_p.rowwise() - local_p.colwise().mean();
Eigen::Matrix3d cov_e = (centered.adjoint() * centered) / double(local_p.rows() - 1);
Eigen::JacobiSVD<Eigen::Matrix3d> svd( cov_e, Eigen::ComputeThinU | Eigen::ComputeThinV);
/* Eigen::Matrix3d Cp = svd.matrixU() * svd.singularValues().asDiagonal() * svd.matrixV().transpose();
mu[idx]=p[ind[s_ind]*3];
mu[idx+1]=p[ind[s_ind+1]*3];
mu[idx+2]=p[ind[s_ind+2]*3];
e[idx]=svd.singularValues()(0);
e[idx+1]=svd.singularValues()(1);
e[idx+2]=svd.singularValues()(2);
n[idx] = diff;
isgood[idx] = 1;
for(int x = 0; x < 3; x++)
{
for(int y = 0; y < 3; y++)
{
v[x+ 3*y +idx*9]=svd.matrixV()(x, y);
cov[x+ 3*y +idx*9]=cov_e(x, y);
//if(debuginfo)printf("%f ",R[x+ 3*y +i*9]);
if(debuginfo)printf("%f ",Rm(x, y));
}
}
*/
} else {
mu[idx]=0;
mu[idx+1]=0;
mu[idx+2]=0;
e[idx]=0;
e[idx+1]=0;
e[idx+2]=0;
n[idx] = 0;
isgood[idx] = 0;
for(int x = 0; x < 3; x++)
{
for(int y = 0; y < 3; y++)
{
v[x+ 3*y +idx*9]=0;
cov[x+ 3*y +idx*9]=0;
}
}
}
}
}
First of all, Ubuntu 16.04 provides Eigen 3.3-beta1, which is not really recommended for use. I would suggest upgrading to a more recent version. Furthermore, to include Eigen, write (e.g.):
#include <Eigen/Eigenvalues>
and compile with -I /usr/include/eigen3 (if you use the version provided by the OS), or better, -I /path/to/local/eigen-version.
Then, as talonmies noted, you can't call host functions from kernels (I'm not sure at the moment why JacobiSVD is not marked as a device function). In your case it would make much more sense to use Eigen::SelfAdjointEigenSolver anyway. Since the matrix you are decomposing is a fixed-size 3x3, you should actually use the optimized computeDirect method:
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> eig; // default constructor
eig.computeDirect(cov_e); // works for 2x2 and 3x3 matrices, does not require loops
It seems that computeDirect even works on the beta version provided by Ubuntu (I'd still recommend updating).
Some unrelated notes:
The following is wrong, since indices start at 0:
local_p(1,cnt) = p[temp_ind*3];
local_p(2,cnt) = p[temp_ind*3+1];
local_p(3,cnt) = p[temp_ind*3+2];
Also, you can write this in one line:
local_p.col(cnt) = Eigen::Vector3d::Map(p+temp_ind*3);
The sizes in this line will not fit (unless diff==3):
Eigen::Matrix3d centered = local_p.rowwise() - local_p.colwise().mean();
What you probably mean is the following (local_p is actually 3xn, not nx3):
Eigen::Matrix<double, 3, Eigen::Dynamic> centered = local_p.colwise() - local_p.rowwise().mean();
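And when computing cov_e, you need to apply .adjoint() to the second factor, not the first, i.e. centered * centered.adjoint(). Since local_p is 3xn, the divisor should also be the number of points minus one, local_p.cols() - 1, not local_p.rows() - 1.
You can avoid both 'big' matrices local_p and centered by directly accumulating an Eigen::Matrix3d sum2 and an Eigen::Vector3d sum, with sum2 += v*v.adjoint() and sum += v for each point v, and then computing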
Eigen::Vector3d mu = sum / diff;
Eigen::Matrix3d cov_e = (sum2 - mu*mu.adjoint()*diff)/(diff-1);
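The accumulation formula can be checked on the host without Eigen. The following is a minimal sketch under stated assumptions: plain std::array types stand in for Eigen::Vector3d and Eigen::Matrix3d, and Vec3, Mat3, Stats and mean_and_covariance are hypothetical names introduced only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Stats {
    Vec3 mu;   // sample mean
    Mat3 cov;  // sample covariance (divisor n - 1)
};

// Single pass over the points: accumulate sum (of v) and sum2 (of v * v^T),
// then apply mu = sum / n and cov = (sum2 - n * mu * mu^T) / (n - 1).
Stats mean_and_covariance(const std::vector<Vec3>& pts) {
    assert(pts.size() >= 2);  // n - 1 must be non-zero
    Vec3 sum{};               // zero-initialised
    Mat3 sum2{};              // zero-initialised
    for (const Vec3& v : pts) {
        for (int r = 0; r < 3; r++) {
            sum[r] += v[r];
            for (int c = 0; c < 3; c++) sum2[r][c] += v[r] * v[c];
        }
    }
    const double n = static_cast<double>(pts.size());
    Stats s{};
    for (int r = 0; r < 3; r++) s.mu[r] = sum[r] / n;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            s.cov[r][c] = (sum2[r][c] - s.mu[r] * s.mu[c] * n) / (n - 1.0);
    return s;
}
```

This one-pass form avoids storing the points, but it subtracts two potentially large numbers, so for data far from the origin it can lose precision compared with centering the points first.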

OpenACC error libgomp: while loading libgomp-plugin-host_nonshm.so.1: libgomp-plugin-host_nonshm.so.1: cannot

I want to compile a simple OpenACC sample (attached below). It compiles correctly, but when I run it I get an error:
Compiled with: gcc-5 -fopenacc accVetAdd.c -lm
Run with: ./a.out
Error at runtime:
error: libgomp: while loading libgomp-plugin-host_nonshm.so.1: libgomp-plugin-host_nonshm.so.1: cannot open shared object file: No such file or directory
I googled it and found only one page about it. How can I fix this problem?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(int argc, char* argv[])
{
// Size of vectors
int n = 10000;
// Input vectors
double *restrict a;
double *restrict b;
// Output vector
double *restrict c;
// Size, in bytes, of each vector
size_t bytes = n*sizeof(double);
// Allocate memory for each vector
a = (double*)malloc(bytes);
b = (double*)malloc(bytes);
c = (double*)malloc(bytes);
// Initialize content of input vectors, vector a[i] = sin(i)^2 vector b[i] = cos(i)^2
int i;
for (i = 0; i<n; i++) {
a[i] = sin(i)*sin(i);
b[i] = cos(i)*cos(i);
}
// sum component wise and save result into vector c
#pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
for (i = 0; i<n; i++) {
c[i] = a[i] + b[i];
}
// Sum up vector c and print result divided by n, this should equal 1 within error
double sum = 0.0;
for (i = 0; i<n; i++) {
sum += c[i];
}
sum = sum / n;
printf("final result: %f\n", sum);
// Release memory
free(a);
free(b);
free(c);
return 0;
}
libgomp dynamically loads shared object files for the plugins it supports, such as the one implementing the host_nonshm device. If they're installed in a non-standard directory (that is, not in the system's default search path), you need to tell the dynamic linker where to look for these shared object files: either compile with -Wl,-rpath,[...], or set the LD_LIBRARY_PATH environment variable.

Resources