Nsight 2.2 sometimes works sometimes doesn't - debugging

I have a problem with the Parallel Nsight 2.2 debugger. It is very strange and hard to describe: it works sometimes and sometimes it doesn't.
What I observed is that it works when the dynamic array (this array has no effect on the CUDA kernels or on any other function like cudaMemcpy etc.) has 3 elements. And this is important... if I set the size to 4 or more, the program just crashes: no errors, nothing, it just goes down.
The interesting thing is that if I run it via the normal debugger, the whole program works correctly and gives the right results. Also interesting: when I declare this array statically,
unsigned topology[4];
and assign the same values, the Nsight debugger works, but very slowly.
So first of all I commented out all the CUDA source code (kernels and all CUDA functions), but it was still the same: it crashed. So I started commenting out more host code, and I found the loop (in host code) that does this creepy thing. When the program reaches the loop (shown below) under Nsight debugging, it crashes, BUT when I add a statement in this loop to print the iteration number on screen, it runs, the loop finishes, the whole program finishes, and then the debugger tells me:
Debug Assertion Failed!
Program:
File:f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c
Line: 1322
Expression: _CrtIsValidHeapPointer(pUserData)
... I don't even have a disk F: ... so wtf???
Anyway, under the normal debugger it runs fine and gives the right results.
This is the loop in question and the dynamic array *topology:
unsigned *topology;
unsigned numberOfLayersInput = 5;
topology = new unsigned [numberOfLayersInput];
topology[0] = 784;
topology[1] = 1000;
topology[2] = 800;
topology[3] = 300;
topology[4] = 10;
kernelTopology_ *topologyOfKernels;
topologyOfKernels = new kernelTopology_ [numberOfLayersInput - 1];
for (int i = 0, numberOfThreads; i < numberOfLayersInput; i++)
{
    cout << i << endl; // this is the added line!
    numberOfThreads = fixedTopology[i];
    topologyOfKernels[i].size = numberOfThreads;

    if (numberOfThreads > THREADS_PER_BLOCK)
        topologyOfKernels[i].BLOCK_SIZE = THREADS_PER_BLOCK;
    else
        topologyOfKernels[i].BLOCK_SIZE = numberOfThreads;

    if (numberOfThreads <= THREADS_PER_BLOCK)
        topologyOfKernels[i].GRID_SIZE = 1;
    else if (fixedTopology[i] % topologyOfKernels[i].BLOCK_SIZE == 0)
        topologyOfKernels[i].GRID_SIZE = fixedTopology[i] / topologyOfKernels[i].BLOCK_SIZE;
    else
        topologyOfKernels[i].GRID_SIZE = (fixedTopology[i] / topologyOfKernels[i].BLOCK_SIZE) + 1;
}
I can't see any mistakes in this code... and the normal debugger has no problem with it.
I have reinstalled the graphics drivers, the CUDA toolkit, the CUDA SDK and Parallel Nsight, but it still does the same creepy things. By the way, I use Windows 7 64-bit and VS2010.
Does anyone have any ideas what I should do about this?
Please let me know if you have any idea :)

The error
Debug Assertion Failed! Program: File:f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c Line: 1322
is from the Microsoft C runtime function _CrtIsValidHeapPointer. The default debug build adds additional heap and stack checks into the code. This function is used to verify that a specified pointer is in the local heap. The path f:\... is the location of the C runtime source file at the time Microsoft built the library; it does not need to exist on your machine.
The assertion indicates an out-of-bounds memory access that corrupts the heap. The cause of the error appears to be the incorrect allocation of topologyOfKernels:
topologyOfKernels = new kernelTopology_ [numberOfLayersInput - 1];
which should be allocating numberOfLayersInput elements:
topologyOfKernels = new kernelTopology_ [numberOfLayersInput];
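For illustration, here is a minimal, self-contained version of the corrected allocation and loop; THREADS_PER_BLOCK and the kernelTopology_ struct definition are assumptions added so the snippet compiles on its own, not details taken from the question:

#include <iostream>

const unsigned THREADS_PER_BLOCK = 256;   // assumption, not from the question

struct kernelTopology_ { unsigned size, BLOCK_SIZE, GRID_SIZE; };

int main()
{
    const unsigned numberOfLayersInput = 5;
    unsigned topology[] = { 784, 1000, 800, 300, 10 };

    // One entry per layer: the loop below writes indices 0 .. numberOfLayersInput-1,
    // so the array must hold numberOfLayersInput elements, not numberOfLayersInput - 1.
    kernelTopology_ *topologyOfKernels = new kernelTopology_[numberOfLayersInput];

    for (unsigned i = 0; i < numberOfLayersInput; i++)
    {
        unsigned numberOfThreads = topology[i];
        topologyOfKernels[i].size = numberOfThreads;
        topologyOfKernels[i].BLOCK_SIZE =
            (numberOfThreads > THREADS_PER_BLOCK) ? THREADS_PER_BLOCK : numberOfThreads;
        // Round up so GRID_SIZE * BLOCK_SIZE covers every thread.
        topologyOfKernels[i].GRID_SIZE =
            (numberOfThreads + topologyOfKernels[i].BLOCK_SIZE - 1) / topologyOfKernels[i].BLOCK_SIZE;

        std::cout << i << ": grid " << topologyOfKernels[i].GRID_SIZE
                  << " x block " << topologyOfKernels[i].BLOCK_SIZE << std::endl;
    }

    delete[] topologyOfKernels;
    return 0;
}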

Related

Halide with CUDA targets not working

I am new to Halide and have written a simple program to compute max(127, pix(x,y)) for every pixel in an image.
Though the code runs fine on the CPU, it gives me wrong outputs when I set Target::CUDA, and I'm not able to find the issue.
The following is part of my code. Let me know if there is a mistake in it, or whether I have to rebuild Halide with support enabled for CUDA.
Halide::Var x, y;
Halide::Buffer<uint8_t> inputImageBuf(inpImg, imgSizes);
Halide::Func reluOp("ReLU Operation");
reluOp(x,y) = Halide::max(127, inputImageBuf(x, y));
int numTiles = 4;
Halide::Var threads_x, threads_y, blocks_x, blocks_y;
Halide::Target targetCUDA = Halide::get_host_target();
targetCUDA.set_feature(Halide::Target::CUDA);
targetCUDA.set_feature(Halide::Target::Debug);
reluOp.gpu_tile(x, y, blocks_x, blocks_y, threads_x, threads_y, numTiles, numTiles, Halide::TailStrategy::Auto, Halide::DeviceAPI::CUDA);
// reluOp.compile_jit(targetCUDA);
reluOp.print_loop_nest();
Halide::Buffer<uint8_t> result = reluOp.realize(cols, rows, targetCUDA);
result.copy_to_host();
One thing to try is adding an inputImageBuf.set_host_dirty() call. If that helps, I would consider it a bug in Halide.
You can also scroll through the debug output and see if the expected number of copies to and from the host are happening.
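For reference, here is a rough self-contained sketch of where the set_host_dirty() call would sit relative to the question's code; the placeholder input data, image size and tile size are assumptions for illustration only:

#include "Halide.h"
#include <cstdint>

int main()
{
    const int cols = 64, rows = 64;

    // Placeholder input so the sketch stands alone.
    Halide::Buffer<uint8_t> inputImageBuf(cols, rows);
    for (int j = 0; j < rows; j++)
        for (int i = 0; i < cols; i++)
            inputImageBuf(i, j) = (uint8_t)((i + j) % 256);

    Halide::Var x, y, bx, by, tx, ty;
    Halide::Func reluOp("reluOp");
    reluOp(x, y) = Halide::max(Halide::cast<uint8_t>(127), inputImageBuf(x, y));

    Halide::Target target = Halide::get_host_target();
    target.set_feature(Halide::Target::CUDA);
    target.set_feature(Halide::Target::Debug);

    reluOp.gpu_tile(x, y, bx, by, tx, ty, 4, 4);

    // The suggestion from above: mark the host copy as dirty so Halide uploads
    // it to the device before the kernel runs.
    inputImageBuf.set_host_dirty();

    Halide::Buffer<uint8_t> result = reluOp.realize(cols, rows, target);
    result.copy_to_host();
    return 0;
}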

HAL_GetTick() crash mcu

I created a simple project using STCubeMX for my nucleo-f446ZE(STM32F446ZET6).
The project should be a USB HID device, but it fails to start. After messing around with the debugger, I discovered that the MCU's PC register goes to 0x00000000 or 0xFFFFFFFF, or sometimes to a random invalid value.
I didn't modify any code. I compiled the code with MDK-ARM (Keil µVision IDE) and with GCC (OpenSTM32), and the same thing happens.
Call stack:
Main
SystemClock_Config
HAL_RCC_ClockConfig (632)
HAL_GetTick
PS:
RAM got corrupted after 0x080149A, and that's why the program does weird stuff.
Solution
CubeMX didn't set up the clocks correctly. Here is the setup I used to get the USB working.
//RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI;
RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSE;
//RCC_OscInitStruct.HSIState = RCC_HSI_ON;
//RCC_OscInitStruct.HSICalibrationValue = 16;
RCC_OscInitStruct.HSEState = RCC_HSE_ON;
RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
RCC_OscInitStruct.PLL.PLLM = 8;
RCC_OscInitStruct.PLL.PLLN = 192;
RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV4;
RCC_OscInitStruct.PLL.PLLQ = 4;
RCC_OscInitStruct.PLL.PLLR = 2;
The RCC_ClkInitStruct is probably not initialized properly (or at all).
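For completeness, a sketch of the matching RCC_ClkInitStruct setup with the HAL; the divider and flash-latency values below are assumptions for an 8 MHz HSE and the PLL values above (about 48 MHz SYSCLK), not copied from the original project:

RCC_ClkInitTypeDef RCC_ClkInitStruct = {0};

RCC_ClkInitStruct.ClockType      = RCC_CLOCKTYPE_SYSCLK | RCC_CLOCKTYPE_HCLK |
                                   RCC_CLOCKTYPE_PCLK1  | RCC_CLOCKTYPE_PCLK2;
RCC_ClkInitStruct.SYSCLKSource   = RCC_SYSCLKSOURCE_PLLCLK; // run from the PLL configured above
RCC_ClkInitStruct.AHBCLKDivider  = RCC_SYSCLK_DIV1;
RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2;           // keep APB1 within its limit
RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1;

// HAL_GetTick() depends on the SysTick setup that HAL_RCC_ClockConfig() refreshes;
// skipping or failing this call is a common way for the tick to misbehave.
if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_1) != HAL_OK)
{
    Error_Handler(); // your project's error handler
}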

FMOD Ex dropping sounds, eventually going silent

I'm attempting to port an old open-source FMOD 3 game (Candy Crisis) to the latest version of FMOD Ex 4 on OS X. Its sound needs are very simple: it plays WAVs, sometimes changing their frequency or speaker mix, and also plays MOD tracker music, sometimes changing the speed. I'm finding that the game works fine at first, but over the course of a few minutes, it starts truncating sounds early, then the music loses channels and eventually stops, then over time all sound ceases. I can cause the problem to reproduce more quickly if I lower the number of channels available to FMOD.
I can get the truncated/missing sounds issue to occur even if I never play a music file, but music definitely seems to make things worse. I have also tried commenting out the code which adjusts the sound frequency and speaker mix, and that was not the issue.
I am calling update() every frame.
Here's the entirety of my interactions with FMOD to play WAVs:
void InitSound( void )
{
    FMOD_RESULT result = FMOD::System_Create(&g_fmod);
    FMOD_ERRCHECK(result);

    unsigned int version;
    result = g_fmod->getVersion(&version);
    FMOD_ERRCHECK(result);

    if (version < FMOD_VERSION)
    {
        printf("Error! You are using an old version of FMOD %08x. This program requires %08x\n", version, FMOD_VERSION);
        abort();
    }

    result = g_fmod->init(8 /* was originally 64, but 8 repros the issue faster */, FMOD_INIT_NORMAL, 0);
    FMOD_ERRCHECK(result);

    for (int index=0; index<kNumSounds; index++)
    {
        result = g_fmod->createSound(QuickResourceName("snd", index+128, ".wav"), FMOD_DEFAULT, 0, &s_sound[index]);
        FMOD_ERRCHECK(result);
    }
}

void PlayMono( short which )
{
    if (soundOn)
    {
        FMOD_RESULT result = g_fmod->playSound(FMOD_CHANNEL_FREE, s_sound[which], false, NULL);
        FMOD_ERRCHECK(result);
    }
}

void PlayStereoFrequency( short player, short which, short freq )
{
    if (soundOn)
    {
        FMOD::Channel* channel = NULL;
        FMOD_RESULT result = g_fmod->playSound(FMOD_CHANNEL_FREE, s_sound[which], true, &channel);
        FMOD_ERRCHECK(result);

        result = channel->setSpeakerMix(player, 1.0f - player, 0, 0, 0, 0, 0, 0);
        FMOD_ERRCHECK(result);

        float channelFrequency;
        result = s_sound[which]->getDefaults(&channelFrequency, NULL, NULL, NULL);
        FMOD_ERRCHECK(result);

        result = channel->setFrequency((channelFrequency * (16 + freq)) / 16);
        FMOD_ERRCHECK(result);

        result = channel->setPaused(false);
        FMOD_ERRCHECK(result);
    }
}

void UpdateSound()
{
    g_fmod->update();
}
And here's how I play MODs.
void ChooseMusic( short which )
{
    if( musicSelection >= 0 && musicSelection <= k_songs )
    {
        s_musicChannel->stop();
        s_musicChannel = NULL;

        s_musicModule->release();
        s_musicModule = NULL;

        musicSelection = -1;
    }

    if (which >= 0 && which <= k_songs)
    {
        FMOD_RESULT result = g_fmod->createSound(QuickResourceName("mod", which+128, ""), FMOD_DEFAULT, 0, &s_musicModule);
        FMOD_ERRCHECK(result);

        result = g_fmod->playSound(FMOD_CHANNEL_FREE, s_musicModule, true, &s_musicChannel);
        FMOD_ERRCHECK(result);

        EnableMusic(musicOn);
        s_musicModule->setLoopCount(-1);
        s_musicChannel->setPaused(false);

        musicSelection = which;
        s_musicPaused = 0;
    }
}
If someone wants to experiment with this, let me know and I'll upload the project somewhere. My gut feeling is that FMOD is busted but I'd love to be proven wrong.
Sounds like your music needs to be set as higher priority than your other sounds. Remember, lower numbers are more important. I think you can just set the priority on the channel.
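For example, something along these lines right after the music channel is obtained in ChooseMusic; Channel::setPriority is a standard FMOD Ex call, and the value 0 here is just an illustrative "most important" choice:

result = g_fmod->playSound(FMOD_CHANNEL_FREE, s_musicModule, true, &s_musicChannel);
FMOD_ERRCHECK(result);

// Lower numbers mean higher priority; 0 makes the music channel the last
// one FMOD will steal when it runs out of channels.
result = s_musicChannel->setPriority(0);
FMOD_ERRCHECK(result);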
Every time I play the following WAV, FMOD loses one channel permanently. I am able to reproduce this channel-losing behavior in the "playsound" example if I replace the existing jaguar.wav with my file.
https://drive.google.com/file/d/0B1eDRY8sV_a9SXMyNktXbWZOYWs/view?usp=sharing
I contacted Firelight and got this response. Apparently WAVs can include a looping command! I had no idea.
Hello John,
I've taken a look at the two files you have provided. Both files end
with a 2 sample infinite loop region.
FMOD 4 (and FMOD 5 for that matter) will see the loop region in the
file and automatically enable FMOD_LOOP_NORMAL if you haven't
specified any loop mode. Assuming you want one-shot behavior just pass
in FMOD_LOOP_OFF when you create the sound.
Kind regards, Mathew Block | Senior Platform Engineer
Technically this behavior contradicts the documented behavior of FMOD_DEFAULT (which is specified to imply FMOD_LOOP_OFF) so they are planning to improve the documentation here.
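In the question's InitSound loop, that fix amounts to passing FMOD_LOOP_OFF (instead of relying on FMOD_DEFAULT) when the one-shot WAVs are created, roughly:

for (int index = 0; index < kNumSounds; index++)
{
    // Explicitly request one-shot playback so FMOD ignores any loop region
    // embedded in the WAV file itself.
    result = g_fmod->createSound(QuickResourceName("snd", index+128, ".wav"),
                                 FMOD_LOOP_OFF, 0, &s_sound[index]);
    FMOD_ERRCHECK(result);
}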
Based on the wave sample you supplied, FMOD is behaving correctly as it appears you've figured out. The sample has a loop that is honored by FMOD and the last samples are simply repeated forever. While useless, this is correct and the variance in the samples is so slight as to not be audible. While not part of the original spec for wave format, extended information was added later to support meta data such as author, title, comments and multiple loop points.
Your best bet is to examine all your source assets for those that contain loop information. Simply playing all sounds with looping turned off is probably not the best workaround. Some loops may be intentional. Those that are will have code that stops them. Typically, in a game, the entire waveform is looped when looping is desired. You can then write or use a tool that will strip the loop information. If you do write your own tool, I'd recommend resampling the audio to the native output sampling rate of the hardware. You'd need to ensure your resampler was sample accurate (no time shift) and did not introduce noise.
Historically, some game systems had a section at the end of the sound with silence and a loop point set on this region. The short reason for this was to reduce popping that might occur at the end of a sound in a hardware audio channel.
Curiously, the last 16 samples of your .wav look like garbage, and I'm wondering if the .wav assets you're using were converted from a source meant for a game console and that's where the bogus loop information came from as well.
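If you do go the tooling route mentioned above, here is a rough sketch (not from the original answer) of how one might flag WAV files that carry loop metadata; loop points live in the RIFF 'smpl' chunk, and the code assumes a little-endian host and a well-formed file:

#include <cstdio>
#include <cstring>
#include <stdint.h>

// Returns true if a RIFF/WAVE file contains an 'smpl' chunk, which is where
// WAV loop points are stored.
bool WavHasLoopInfo(const char* path)
{
    FILE* f = fopen(path, "rb");
    if (!f)
        return false;

    char riff[4], wave[4];
    uint32_t riffSize;
    bool found = false;

    if (fread(riff, 1, 4, f) == 4 && fread(&riffSize, 4, 1, f) == 1 &&
        fread(wave, 1, 4, f) == 4 &&
        memcmp(riff, "RIFF", 4) == 0 && memcmp(wave, "WAVE", 4) == 0)
    {
        char id[4];
        uint32_t size;
        while (fread(id, 1, 4, f) == 4 && fread(&size, 4, 1, f) == 1)
        {
            if (memcmp(id, "smpl", 4) == 0) { found = true; break; }
            fseek(f, size + (size & 1), SEEK_CUR); // chunks are padded to even sizes
        }
    }

    fclose(f);
    return found;
}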
This would have been a comment but my lowly rep does not allow it.

OpenCV - Earth Mover's Distance issue, icvInitEMD()

I am having trouble calling EMD() in OpenCV 2.4.2 under Mac OS ML.
I have a class with an attribute Mat _signature defined like this:
Mat _signature(size, dim+1, CV_32F);
for (int i = 0; i < size; ++i) {
    _signature.at<float>(i,0) = weight;
    for (int j = 1; j < dim+1; ++j) {
        _signature.at<float>(i,j) = vec[i].at<float>(0,j-1); // vec[i] is a line vector containing the position in R^dim
    }
}
I then have u and v, two instances of that class, and I call EMD(u._signature, v._signature, CV_DIST_L2);
It fails with OpenCV Error: One of arguments' values is out of range () in icvInitEMD, file /*SOME PATH*/OpenCV-2.4.2/modules/imgproc/src/emd.cpp, line 408
I looked at the source code but could not figure out why this fails. My arguments appear to match what the documentation expects. Any help will be appreciated.
OK, it took me quite some time to figure out, but one of the components of one of my vectors was miscalculated and ended up being NaN.
Of course, this was buried deep enough in my data that it would be completely lost in any amount of data reasonably observable via a debugger (or even cout).
The cryptic error from OpenCV did the rest in confusing me.
For people stumbling upon the same issue as me:
Make sure your weight vectors are not zero
Make sure none of your data is NaN
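As an illustrative check (not part of the original answer), both conditions can be verified on a signature before calling EMD; cv::checkRange rejects NaN/Inf, and the first column of the signature holds the weights:

#include <opencv2/core/core.hpp>

// Returns true if a signature looks usable for EMD(): no NaN/Inf anywhere
// and at least one strictly positive weight in column 0.
bool signatureLooksValid(const cv::Mat& signature)
{
    if (!cv::checkRange(signature))            // fails on NaN or Inf
        return false;

    double minW, maxW;
    cv::minMaxLoc(signature.col(0), &minW, &maxW);
    return minW >= 0.0 && maxW > 0.0;          // weights non-negative and not all zero
}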

Rcpp: Mac shows loading wheel and almost freeze

I created an R package which depends on Rcpp.
A function in this package is supposed to print a statement every n iterations.
So I expect to see a new line on the R console every few seconds.
The odd thing is that when I run my function in the R GUI, the cursor becomes a loading wheel and R "almost" freezes. The loading wheel disappears once the computation is done.
A minimal example of this situation is as follows:
library(inline)
library(Rcpp)

test <- cxxfunction(
    signature(),
    body = '
    RNGScope scope;
    for (int i = 0; i < 100; i++)
    {
        sleep(1); // sleep one second at each iteration. this sleep is
                  // replaced by something in my code
        if (i % 20 == 0) Rprintf("\v%d iterations are done...", i);
    }
    return wrap(1);
    ',
    plugin = "Rcpp"
)
test() // freezes for 100 seconds!
I also found that if the code is run in a terminal, the new lines appear every 20 seconds as I expected.
But I prefer to run it in the R GUI.
I would appreciate it if someone could tell me why this is happening.
I am using a Mac.
Rgui buffers the output. I don't use Rgui, but try to find the setting which controls whether or not the output is buffered. For R code you could use flush.console to force the output to be shown, but I'm not entirely sure how this would work with C++ code.
The question is about R.app on the Mac, not Rgui on Windows. The solution below works for me: follow Rprintf with R_FlushConsole and R_ProcessEvents, like this:
RNGScope scope;
for (int i = 0; i < 100; i++) {
    sleep(1); // sleep one second at each iteration. this sleep is
              // replaced by something in my code
    if (i % 20 == 0) {
        Rprintf("\v%d iterations are done...\n", i);
        R_FlushConsole();
        R_ProcessEvents();
    }
}
return wrap(1);

Resources