Errors in OpenCL kernel code at runtime - Windows

I am new to Visual Studio and I am using it to write a simple parallel sorting program using OpenCL.
When I run it, I get a line before my output (i.e. before I receive and print the result buffer) saying "5 Errors Generated.".
I assume this is telling me that I have errors in my kernel file; if I deliberately write errors into the kernel file, that number increases.
I would really like to know what those errors are so I can correct my program. Being unfamiliar with VS, I simply cannot find them listed anywhere.
Does anyone know where I can find the errors that are being generated?
Thanks

You need to call clGetProgramBuildInfo asking for CL_PROGRAM_BUILD_LOG in order to get the errors from the runtime compilation of your kernel:
char result[4096];
size_t size;
// Fetch the compiler's build log for this program/device pair
clGetProgramBuildInfo( program, device, CL_PROGRAM_BUILD_LOG, sizeof(result), result, &size);
printf("%s\n", result);
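If the log might exceed the fixed buffer, a more defensive pattern is to query the required size first and allocate exactly that much. A minimal sketch, assuming the same program and device handles as above and that <stdio.h> and <stdlib.h> are included:
size_t logSize = 0;
// First call: ask only for the size of the build log
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
char *log = (char*) malloc(logSize);
// Second call: fetch the full log into the exactly-sized buffer
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
printf("%s\n", log);
free(log);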

Related

Expression must have a constant value error in array via MPI world size

Recently, I started learning about MPI programming and I have tried it on both Linux and Windows. I do not have any problem running the MPI application on Linux; however, I stumbled upon an "expression must have a constant value" error in Visual Studio.
For example, I'm trying to get the world size via MPI_Comm_size(MPI_COMM_WORLD, &world_size); and create an array based on world_size.
Code sample:
#include <mpi.h>

/* inside main(), after MPI_Init(&argc, &argv) */
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int database[world_size]; // error occurs here
However, when I run it on Linux it works perfectly fine: I am able to execute the code while stating the number of processes I wish to have. Am I missing anything? I followed this particular YouTube link that taught me how to install MS-MPI on Visual Studio 2015.
Any help would be greatly appreciated.
Automatic array sizing using non-const values (variable-length arrays) actually works with gcc (https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html). However, it's considered a bad idea because (as you've just experienced) your code won't be portable anymore. You just need to change your code to create the array on the heap, e.g. with new. You might also want gcc to generate an error for such arrays to make sure your code stays portable: Disable variable-length automatic arrays in gcc
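A minimal sketch of the heap-based replacement in C (the C++ equivalent mentioned above would use new[] or std::vector), assuming <stdlib.h> is included:
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Allocate on the heap instead of using a variable-length array
int *database = (int*) malloc(world_size * sizeof(int));
/* ... use database ... */
free(database);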

How do I interpret this error from GDB?

I feel pretty dumb right now, but how do I interpret this message in GDB?
Program received signal SIGSEGV, Segmentation fault.
0x00007fe2eb46073a in clearerr (fp=0x4359790) at clearerr.c:27
27 clearerr.c: No such file or directory.
in clearerr.c
What file is missing that's causing the segfault? Is it clearerr.c or the file that clearerr is trying to access?
What file is missing that's causing the segfault?
We don't know what is causing SIGSEGV, but it's unlikely that any missing file has anything to do with it.
First, this:
clearerr.c: No such file or directory.
simply means that GDB cannot show you the source where the SIGSEGV occurred. That is because clearerr() is part of your libc, and you either didn't install sources for your libc (they may not even be available for your environment), or you didn't tell GDB how to find them.
Second, the actual cause of the SIGSEGV is most likely that the fp you invoked it with has been corrupted or is invalid in some other way.
Here are a few ways this could happen:
char c;
FILE *fp = (FILE*) &c;  // fp is bogus: doesn't point to a FILE at all
clearerr(fp);           // likely will crash

FILE *fp2;              // fp2 contains uninitialized garbage
clearerr(fp2);          // likely will crash

FILE *fp3 = fopen("/tmp/foo", "w");
fclose(fp3);            // fclose releases the FILE; fp3 now dangles
clearerr(fp3);          // accesses dangling memory, likely will crash
There are of course many other ways as well. You'll need to look at the caller of clearerr to see if it's doing something wrong. To find the caller, use GDB's where command.
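For example (frame #0 comes from the output in the question; the caller frame and file name below are hypothetical):
(gdb) where
#0  clearerr (fp=0x4359790) at clearerr.c:27
#1  0x0000000000400b2d in my_caller () at myprog.c:42
Inspect the caller (frame #1 in this sketch) to see where the bad fp came from.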
Note that line 27 is simply the location inside clearerr.c where the crash occurred; the "No such file or directory" message does not mean a missing file caused the segfault, only that GDB cannot display that source file.

Cygwin 64-bit C compiler caching funny (and ending early)

We've been using Cygwin (/usr/bin/x86_64-w64-mingw32-gcc) to generate Windows 64-bit executables, and it had been working fine through yesterday. Today it stopped working in a bizarre way: it "caches" standard output until the program ends. I wrote a six-line example
that does the same thing. Since we use the code in batch, I wouldn't worry, except that when I run a test case, the now-strangely-caching executable opens the output files, ends early, and does not fill them with data. (The same code works fine on Linux, but these users are on Windows.) I know it's not gorgeous code, but it demonstrates my problem: it prints the numbers "1 2 3 4 5 6 7 8 9 10" only after I press Enter.
#include <stdio.h>

int main(void)
{
    char q[256];
    int i;
    for (i = 1; i <= 10; i++)
        printf("%d ", i);
    gets(q);       /* waits for a line of input; see the note on gets() below */
    printf("\n");
    return 0;
}
Does anybody know enough Cygwin to help me out here? What do I try? (I don't know how to get version numbers; I did try.) I found a 64-bit cygwin1.dll in /cygdrive/c/cygwin64/bin, and that didn't help a bit. The 32-bit gcc compilation works fine, but I need 64-bit to work. Any suggestions will be appreciated.
Edit: we found and corrected an unexpected error in the original code that caused the program not to populate the output files. At this point, the remaining problem is that Cygwin won't show the output of the program.
For months, the 64-bit executable has properly generated the expected output, just as the 32-bit version did. Just today, it started exhibiting the "caching" behavior described above. The program sends many hundreds of lines, with many newline characters, through stdout. Now, when the 64-bit executable is created as above, none of these lines are shown until the program completes, and the entire output is printed at once. Can anybody provide insight into this problem?
This is quite normal. printf outputs to stdout which is a FILE* and is normally line buffered when connected to a terminal. This means you will not see any output until you write a newline, or the internal buffer of the stdout FILE* is full (A common buffer size is 4096 bytes).
If you write to a file or pipe, output might be fully buffered, in which case output is flushed when the internal buffer is full and not when you write a newline.
In all cases, the buffers of a FILE* are flushed when you call fflush(..), when you call fclose(..), or when the program ends normally.
Your program will behave the same on Windows/Cygwin as on Linux.
You can add a call to fflush(stdout) to see the output immediately:
for (i = 1; i <= 10; i++) {
    printf("%d ", i);
    fflush(stdout);   /* force each number out of the buffer right away */
}
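Alternatively, here is a minimal sketch of disabling buffering for stdout entirely, so every write appears immediately; setvbuf must be called once near the top of main(), before any output:
setvbuf(stdout, NULL, _IONBF, 0);   /* make stdout unbuffered */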
Also, do not use the gets() function: it has no way to limit how much input it reads, and it was removed from the C standard in C11.
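A safer replacement, sketched here with the same 256-byte q buffer from the question:
if (fgets(q, sizeof(q), stdin) == NULL) {
    /* EOF or read error */
}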
If your real program "ends early" and does not write the data to the text files it's supposed to, it may be crashing due to a bug of yours before it finishes, in which case the buffered output will not be flushed. Or, less likely, you call the _exit() function, which terminates the program without flushing FILE* buffers (in contrast to the exit() function).

CUDA kernel launch failure

I am trying to call two kernels in a loop, as shown below:
for (t = 0; t <= time_total; t++)
{
    // kernel calls
    kernel1<<<noOfBlocks,noOfThreadsPerBlock>>>(** SOME PARAMETERS **);
    checkCudaError(cudaThreadSynchronize());
    kernel2<<<noOfBlocks,noOfThreadsPerBlock>>>(** SOME PARAMETERS **);
    checkCudaError(cudaThreadSynchronize());
}
And the structure of the second kernel is:
var[index+0] = **SOME CALCULATION**
var[index+1] = **SOME CALCULATION**
var[index+2] = **SOME CALCULATION**
Now when I execute this code, checkCudaError does not report anything and the code runs, giving some output, but Visual Studio shows the following exception:
First-chance exception at 0x7640c41f in **.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0039f9c4..
First-chance exception at 0x7640c41f in **.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0039f9c4..
And when I check in Nsight, it says kernel 2 has the following error:
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Now the problem is that, in the var array written by kernel 2, some rows are correct, some are copies of other rows' values, and some are garbage.
Also, when I do this:
var[index+0]=3
var[index+1]=3
var[index+2]=3
all the values of var are set to 3.
A few side notes:
cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize().
The fact that Nsight reports an error on the 2nd kernel launch, but your error checking code does not, leads me to believe your error checking code is broken; a sketch of a more thorough check follows.
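Launch-configuration errors such as out-of-resources are returned by cudaGetLastError() immediately after the launch, while errors raised during execution surface on the next synchronize. Since the question's checkCudaError is not shown, here is a minimal sketch of a checker (an assumed stand-in, not the original):
#include <cstdio>
#include <cstdlib>

// Abort with a readable message if a CUDA runtime call fails
#define checkCudaError(call)                                          \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// After each launch:
kernel2<<<noOfBlocks,noOfThreadsPerBlock>>>(** SOME PARAMETERS **);
checkCudaError(cudaGetLastError());        // catches launch failures (e.g. out of resources)
checkCudaError(cudaDeviceSynchronize());   // catches errors raised while the kernel runs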
Now, regarding your issue, out of resources is frequently due to a kernel requesting too many registers (registers per thread times the number of threads per block requested). Try re-compiling your code with -Xptxas -v to get verbose output on register usage, and then recompiling with -maxrregcount 20 (or something like that) to try to work around this for test purposes; example invocations follow.
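For example (the source file name app.cu is hypothetical):
nvcc -Xptxas -v app.cu -o app          # prints registers per thread and shared memory per kernel
nvcc -maxrregcount 20 app.cu -o app    # caps register usage at 20 registers per thread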
If this "fixes" your problem, you may then want to consider the following:
See if there is a way you can re-order or restructure your code to reduce the register pressure.
If not, then adjust your maxrregcount value upwards to approximately the highest value that will allow your code to compile and run with the launch configurations (number of threads per block) that you care about. You may also want to benchmark your code at different levels of this setting, as it can affect occupancy. Usually, if you have it set to the highest value that will compile and run, you are limiting yourself to one thread block per SM at execution time. This may be OK, or there may be a lower setting that is better, allowing two thread blocks to be resident per SM and possibly giving higher performance. Only benchmarking your code will tell.

Why do some static analysis tools not report potential buffer overflows?

I have an example of a strcpy call that seems to pose a buffer overflow risk, but PVS-Studio doesn't raise a warning. In my example, strcpy is used to copy a command line argument into a buffer without checking the size of the argument. This could result in a buffer overflow if the argument exceeds the size of the buffer.
Code example:
#include <string.h>

char carg1[13];
int main(int argc, char* argv[])
{
    // Copy the 1st command line arg into the fixed-size buffer
    strcpy(carg1, argv[1]);
    …
}
The size of argv[1] isn't checked before it is copied into carg1. Shouldn't this raise a warning?
It's theoretically impossible to build a perfect static analysis tool (this follows from results like the undecidability of the halting problem). As a result, all static analysis tools are at best heuristics that can try to detect certain classes of errors, and even then can't necessarily detect all of those errors.
So yes, the code you've got above looks like it has a potential buffer overflow. I honestly don't know why this particular tool can't detect the error, but my guess is that the internal heuristics the analyzer uses are for some reason failing to detect it.
Hope this helps!
There are 3 facts:
1) If you use the Visual C++ compiler, then you will receive compiler warning C4996:
1>robust.cpp(529): warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1> C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\include\string.h(110) : see declaration of 'strcpy'
2) PVS-Studio initially worked with Visual Studio only.
3) PVS-Studio policy is to implement diagnostic rules which are not duplicate compiler warnings.
So it seems logical that PVS-Studio doesn't check a case that the Microsoft compiler has already been checking for a long time (since VS2005).
Update:
PVS-Studio eventually implemented such a diagnostic rule:
https://www.viva64.com/en/w/V755/print/
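For completeness, a bounds-checked version of the copy from the question; a minimal sketch, assuming the same 13-byte carg1 buffer:
if (argc > 1 && strlen(argv[1]) < sizeof(carg1)) {
    strcpy(carg1, argv[1]);   /* fits, including the terminating '\0' */
} else {
    /* argument missing or too long: handle the error instead of overflowing */
}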
