OSX Snow Leopard: Static and dynamic link behaving differently - macos

I have some code structured like this:
Fixed mainline -> user code -> dependent library
These three parts can be statically linked and all is well.
Alternatively, the mainline can be turned into an executable
and the user code into a dylib, and the mainline loads the
user code with dlopen() and enters it with dlsym().
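Roughly speaking, the loading side looks like this (a simplified sketch: the dylib path and the symbol name flx_start are placeholders, not the real Felix entry points):
#include <dlfcn.h>
#include <stdio.h>

int main() {
    // load the user-code dylib, then look up and call its entry point
    void *lib = dlopen("usercode.dylib", RTLD_NOW);
    if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    typedef int (*entry_t)(void);
    entry_t entry = (entry_t)dlsym(lib, "flx_start");
    if (!entry) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    int rc = entry();
    dlclose(lib);
    return rc;
}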
I have hundreds of test programs where this works fine, but two cases
where it fails:
Case 1: the dependent library is gmpxx (which depends on gmp).
Case 2: the dependent library is SDL
In the gmp case the dynamic program prints the correct answers but then terminates with:
flx_run(29601) malloc: *** error for object 0x7fff712ac500: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap
(gdb) bt
#0 0x00007fff8438e0b6 in __kill ()
#1 0x00007fff8442e9f6 in abort ()
#2 0x00007fff84346195 in free ()
#3 0x00000001000d3a35 in flxusr::gmp__hyphen_0::_init_ ()
Previous frame inner to this frame (gdb could not unwind past this frame)
I would guess there is some memory corruption here.
In the SDL case:
~/felix>LD_LIBRARY_PATH=build/release/lib/rtl build/release/bin/flx_arun demos/sdl/sdl-1.01.03-0.dylib
frames in seconds = FPS
frames in seconds = FPS
frames in seconds = FPS
whereas
~/felix>demos/sdl/sdl-1.01.03-0
98 frames in 5.016 seconds = 19.5375 FPS
100 frames in 5.043 seconds = 19.8295 FPS
82 frames in 5.043 seconds = 16.2602 FPS
The frame rate data is formatted by
_urv20946 = ::flx::rtl::strutil::str(((double)PTF Frames / seconds ));
where
template<class T>
string str(T const &t) {
    std::ostringstream x;
    x << t;
    return x.str();
}
Note that this code is nominally "in" a dependent library (well, of course not: it is in a header file!).
Any ideas?

Related

ESP32 Deep Sleep Wakeup Through Touchpad

I'm trying to write a program that puts my ESP32 into a deep sleep state, and uses touch input to wake it up. I'm able to put it into the deep sleep state, but as soon as it enters, it wakes up and never calls into the callback function.
Reading the raw data from the touch pad, it idles around 25k, and touch inputs from my hand give it a value of around 180k. The 100k value in the code snippet below is the threshold at which I'm comfortable concluding that a touch has been detected.
I'd like to point out that this is different from ext0 and ext1 wake ups.
static void touchsensor_interrupt_cb(void *arg)
{
... // code here turns on an LED and prints to serial
}
void setup(){
...
touch_pad_init();
touch_pad_config(TOUCH_PAD_NUM2);
touch_pad_sleep_set_threshold(TOUCH_PAD_NUM2, 100000);
touch_pad_isr_register(touchsensor_interrupt_cb, NULL, TOUCH_PAD_INTR_MASK_ACTIVE);
touch_pad_intr_enable(TOUCH_PAD_INTR_MASK_ACTIVE);
touch_pad_sleep_channel_enable(TOUCH_PAD_NUM2, true);
touch_pad_set_fsm_mode(TOUCH_FSM_MODE_TIMER);
touch_pad_fsm_start();
esp_sleep_enable_touchpad_wakeup();
Serial.println("entering deep sleep");
esp_deep_sleep_start();
}
I've triple-checked that my circuit is correct. Running on an ESP32S3 Dev Kit v1.0. If there's a better place to post this please let me know.
The issue ended up being this line:
touch_pad_sleep_set_threshold(TOUCH_PAD_NUM2, 100000);
I had set the threshold as if it were compared against the raw sensor data. Instead, it is compared against raw_data - benchmark, where (I'm assuming) the benchmark is computed from what the driver considers the sensor's baseline value. You can read your benchmark using touch_pad_sleep_channel_read_benchmark.
Printing out the expression above on raw, unfiltered data gives values like:
01:25:10.619 -> 3
01:25:10.696 -> -4
01:25:10.774 -> 3
01:25:10.851 -> -10
01:25:10.930 -> -4
01:25:11.006 -> 20
01:25:11.100 -> -2
01:25:11.177 -> 2
01:25:11.255 -> 1
01:25:11.333 -> 0
Those values oscillate around 0 (the noise in my touch sensor). I changed the threshold line to:
touch_pad_sleep_set_threshold(TOUCH_PAD_NUM2, benchmark * threshold);
Here threshold is 0.2, meaning a "touch" is registered when the ESP32 reads sensor values above 120% of the benchmark value. I hope this helps someone.
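For reference, the complete fix can be sketched as below; this assumes the touch FSM has already been started (so the benchmark is valid) and uses my 0.2 factor, which may need tuning for other boards:
uint32_t benchmark = 0;
// read the channel's idle baseline computed by the touch driver
touch_pad_sleep_channel_read_benchmark(TOUCH_PAD_NUM2, &benchmark);
// wake when (raw - benchmark) exceeds 20% of the baseline,
// i.e. when the raw reading rises above ~120% of the benchmark
touch_pad_sleep_set_threshold(TOUCH_PAD_NUM2, (uint32_t)(benchmark * 0.2f));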

CoreAudio: AudioUnit can neither be stopped nor uninitialized

I wrote a command line C tool generating a sine wave and playing it using CoreAudio on the default audio output. I am initializing an
AURenderCallbackStruct and an AudioUnit using AudioUnitInitialize (as already discussed in this forum). All this is working as intended, but when it comes to closing the program I am not able to stop the AudioUnit, neither with AudioOutputUnitStop(player.outputUnit); nor AudioUnitUninitialize(player.outputUnit); nor
AudioComponentInstanceDispose(player.outputUnit);
The order of appearance of these calls in the code does not change the behavior.
The program compiles without error messages, but the sine wave is still audible as long as the rest of the program is running.
Here is the code I'm using for initializing the AudioUnit:
void CreateAndConnectOutputUnit (ToneGenerator *player) {
    AudioComponentDescription outputcd = {0};
    outputcd.componentType = kAudioUnitType_Output;
    outputcd.componentSubType = kAudioUnitSubType_DefaultOutput;
    outputcd.componentManufacturer = kAudioUnitManufacturer_Apple;

    AudioComponent comp = AudioComponentFindNext (NULL, &outputcd);
    if (comp == NULL) {
        printf ("can't get output unit");
        exit (-1);
    }
    AudioComponentInstanceNew(comp, &player->outputUnit);

    // register render callback
    AURenderCallbackStruct input;
    input.inputProc = SineWaveRenderCallback;
    input.inputProcRefCon = player;
    AudioUnitSetProperty(player->outputUnit,
                         kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Output,
                         0,
                         &input,
                         sizeof(input));

    // initialize unit
    AudioUnitInitialize(player->outputUnit);
}
In my main program I'm starting the AudioUnit and the sine wave.
int main() {
    // code for doing various things

    ToneGenerator player = {0}; // create a sound object
    CreateAndConnectOutputUnit (&player);
    AudioOutputUnitStart(player.outputUnit);

    // waiting to listen to the sine wave
    sleep(3);

    // attempt to stop the sound output
    AudioComponentInstanceDispose(player.outputUnit);
    AudioUnitUninitialize(player.outputUnit);
    AudioOutputUnitStop(player.outputUnit);

    // additional code that should be executed without sine wave being audible
}
As I'm new to both this forum and programming in Xcode, I hope I have explained the issue in a way that lets you help me out, and that I didn't miss an existing answer while searching for a solution.
Thank you in advance for your time and input,
Stefan
You should set up and tear down your audio unit in a logical order. It doesn't make sense to stop playback on an already uninitialized audio unit which had, in fact, previously been disposed of in the middle of playback. Instead, try the following order:
AudioOutputUnitStop(player.outputUnit); //first stops playback
AudioUnitUninitialize(player.outputUnit); //then deallocates unit's resources
AudioComponentInstanceDispose(player.outputUnit); //finally disposes of the AU itself
The sine wave command line app you're after is a well-elaborated lesson in this textbook. Please work through it step by step.
Last but not least, your question has nothing to do with C++: CoreAudio is a plain-C API, so the C++ in both your title and tag is wrong and misleading.
An Audio Unit runs in an asynchronous thread that may not actually stop immediately when you call AudioOutputUnitStop. Thus, it may work better to wait a fraction of a second (at least a couple audio callback buffer durations in time) before calling AudioUnitUninitialize and AudioComponentInstanceDispose on a potentially still running audio unit.
Also, check to make sure your player.outputUnit value is a valid unit (and not an uninitialized or trashed variable) at the time you stop the unit.
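Putting both answers together, a teardown sketch could look like this (the 100 ms pause is an arbitrary illustrative value, not a CoreAudio requirement):
// stop playback first, give the render thread a moment to drain,
// then release the unit's resources, and finally dispose of the instance
AudioOutputUnitStop(player.outputUnit);
usleep(100000); // ~100 ms, comfortably more than a couple of callback buffers
AudioUnitUninitialize(player.outputUnit);
AudioComponentInstanceDispose(player.outputUnit);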

SysTick->LOAD vs SysTick->CALIB

I am currently porting my DCF77 library (you may find the source code at GitHub) from Arduino (AVR based) to Arduino Due (ARM Cortex M3).
The library requires precise 1 ms timing. An obvious candidate is the use of SysTick. Conveniently, the Arduino Due is already set up for SysTick interrupts at 1 kHz.
However, my (AVR) DCF77 library is capable of tuning the timing once it locks to the DCF77 signal. This is done by manipulating the timer reload values like so:
void isr_handler() {
    cumulated_phase_deviation += adjust_pp16m;
    // 1 / 250 / 64000 = 1 / 16 000 000
    if (cumulated_phase_deviation >= 64000) {
        cumulated_phase_deviation -= 64000;
        // cumulated drift exceeds 1 timer step (4 microseconds)
        // drop one timer step to realign
        OCR2A = 248;
    } else if (cumulated_phase_deviation <= -64000) {
        // cumulated drift exceeds 1 timer step (4 microseconds)
        // insert one timer step to realign
        cumulated_phase_deviation += 64000;
        OCR2A = 250;
    } else {
        // 249 + 1 == 250 == 250 000 / 1000 = (16 000 000 / 64) / 1000
        OCR2A = 249;
    }
    DCF77_Clock_Controller::process_1_kHz_tick_data(the_input_provider());
}
I want to port this to the ARM processor. In the ARM information center I found the following documentation.
Configuring SysTick
...
To configure the SysTick you need to load the SysTick Reload Value
register with the interval required between SysTick events. The timer
interrupt or COUNTFLAG bit (in the SysTick Control and Status
register) is activated on the transition from 1 to 0, therefore it
activates every n+1 clock ticks. If a period of 100 is required 99
should be written to the SysTick Reload Value register. The SysTick
Reload Value register supports values between 1 and 0x00FFFFFF.
If you want to use the SysTick to generate an event at a timed
interval, for example 1ms, you can use the SysTick Calibration Value
Register to scale your value for the Reload register. The SysTick
Calibration Value Register is a read-only register that contains the
number of pulses for a period of 10ms, in the TENMS field (bits 0 to
23). This register also has a SKEW bit (30) that is used to indicate
that the calibration for 10ms in the TENMS section is not exactly 10ms
due to small variations in clock frequency. Bit 31 is used to indicate
if the reference clock is provided.
...
Unfortunately I did not find anything on how SysTick->LOAD and SysTick->CALIB are connected. That is: if I want to throttle or accelerate systicks, do I need to manipulate the LOAD or the CALIB value? And which values do I need to put into these registers?
Searching the internet did not bring up any better hints. Maybe I am searching in the wrong places.
Is there anywhere a more detailed reference for these questions? Or maybe even some good examples?
Comparing the AtMega328 datasheet with the Cortex-M3 TRM, the standout point is that the timers work opposite ways round: on the AVR, you're loading a value into OCR2A and waiting for the timer in TCNT2 to count up to it, whereas on the M3 you load the delay value into SYST_RVR, then the system will count down from this value to 0 in SYST_CVR.
The big difference for calibration is that, because the comparison value is fixed at 0 and you can only adjust the reload value, you might have more latency compared to adjusting the comparison value directly (assuming the counter reload happens at the same time the interrupt is generated).
The read-only value in SYST_CALIB (if indeed it even exists, being implementation-defined and optional), is merely for relating SYSTICK ticks to actual wallclock time - when first initialising the timer, you need to know the tick frequency in order to pick an appropriate reload value for your desired period, so having a register field that says "this many reference clock ticks happen in 10ms (possibly)" offers some possibility of calculating that at runtime in a portable fashion, rather than having to hard-code a value that might need changing for different devices.
In this case, however, not only does having an even-more-accurate external clock to synchronise against make this less important, but crucially, the firmware has already configured the timer for you. Thus you can assume that whatever value is in SYST_RVR represents close-enough-to-1 kHz and work from there - in fact, to simply fine-tune the 1 kHz period you don't even need to know what the actual value is; just do SysTick->LOAD++ or SysTick->LOAD-- if the error gets too big in either direction.
Delving a bit deeper, the SAM3X datasheet shows that for the particular M3 implementation in that SoC, SYSTICK has a 10.5 MHz reference clock, therefore the SYST_CALIB register should give a value of 105000 ticks for 10ms. Except it doesn't, because apparently Atmel thought it would be really clever to make the unambiguously-named TENMS field give the tick count for 1ms, 10500, instead. Wonderful.
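To make the fine-tuning idea concrete, here is a rough sketch of what the per-millisecond adjustment could look like on the Due; tune_systick, measured_error and ERROR_LIMIT are illustrative placeholders, and 84000 comes from SystemCoreClock / 1000 as shown further below:
static const int32_t ERROR_LIMIT = 64;               // placeholder tolerance in clock cycles

// call this once per 1 kHz tick; it nudges the next SysTick period by one
// 1/84 µs clock cycle whenever the accumulated phase error gets too big
void tune_systick(int32_t measured_error) {
    const uint32_t nominal = SystemCoreClock / 1000; // 84000 on the Due
    if (measured_error > ERROR_LIMIT) {
        SysTick->LOAD = nominal - 1 - 1;             // shorten the next period by one cycle
    } else if (measured_error < -ERROR_LIMIT) {
        SysTick->LOAD = nominal - 1 + 1;             // lengthen the next period by one cycle
    } else {
        SysTick->LOAD = nominal - 1;                 // back to the nominal 1 ms period
    }
}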
Just so that others do not have to dig around like I had to, here is what I found out in addition.
In arduino-1.5.8/hardware/arduino/sam/system/CMSIS/CMSIS/Include/core_cm*.h there is code to manipulate SysTick. In particular in core_cm3.h there is a function
static __INLINE uint32_t SysTick_Config(uint32_t ticks)
{
if (ticks > SysTick_LOAD_RELOAD_Msk) return (1); /* Reload value impossible */
SysTick->LOAD = (ticks & SysTick_LOAD_RELOAD_Msk) - 1; /* set reload register */
NVIC_SetPriority (SysTick_IRQn, (1<<__NVIC_PRIO_BITS) - 1); /* set Priority for Cortex-M0 System Interrupts */
SysTick->VAL = 0; /* Load the SysTick Counter Value */
SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk |
SysTick_CTRL_TICKINT_Msk |
SysTick_CTRL_ENABLE_Msk; /* Enable SysTick IRQ and SysTick Timer */
return (0); /* Function successful */
}
Then in arduino-1.5.8/hardware/arduino/sam/variants/arduino_due_x/variant.cpp in function init there is
// Set Systick to 1ms interval, common to all SAM3 variants
if (SysTick_Config(SystemCoreClock / 1000))
{
// Capture error
while (true);
}
Since SystemCoreClock evaluates to 84000000, this amounts to calling SysTick_Config(84000). I verified against a DCF77 module that SysTick_Config(84001) will slow SysTick down, while SysTick_Config(83999) will speed it up.

Nsight 2.2 sometimes works sometimes doesn't

I have a problem with the Parallel Nsight 2.2 debugger. It is very strange and I don't know how to describe it; anyway, it works sometimes and sometimes it doesn't.
What I observed is that it works with a dynamic array of 3 elements (this array has no effect on the CUDA kernels or any other function like cudaMemcpy etc.). And this is important... if I set the size to 4 or more, it just crashes: no errors, nothing, it just goes down.
The interesting fact is that if I run it normally via the normal debugger, the whole program works correctly with the right results. Another interesting fact is that when I make this array static,
unsigned topology[4];
and set the same values, the Nsight debugger works, but very slowly.
So first of all I commented out all the CUDA source code (the kernels and all CUDA functions), but it was still the same - it crashes. So I started commenting out more host code, and I found the loop (in host code) which does this creepy thing. When the program in the Nsight debugger reaches the loop (shown below), it crashes, BUT when I add a command inside this loop to print the loop counter to the screen, it runs, the loop finishes, the whole program finishes, and then the debugger tells me:
Debug Assertion Failed!
Program:
File:f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c
Line: 1322
Expression: _CrtIsValidHeapPointer(pUserData)
.... I don't even have disk f ... so wtf???
Anyway, in the normal debugger it runs fine and with the right results.
This is the mentioned loop and the dynamic array *topology:
unsigned *topology;
unsigned numberOfLayersInput = 5;
topology = new unsigned[numberOfLayersInput];
topology[0] = 784;
topology[1] = 1000;
topology[2] = 800;
topology[3] = 300;
topology[4] = 10;

kernelTopology_ *topologyOfKernels;
topologyOfKernels = new kernelTopology_[numberOfLayersInput - 1];

for (int i = 0, numberOfThreads; i < numberOfLayersInput; i++)
{
    cout << i << endl; // this is the added line!
    numberOfThreads = fixedTopology[i];
    topologyOfKernels[i].size = numberOfThreads;
    if (numberOfThreads > THREADS_PER_BLOCK)
        topologyOfKernels[i].BLOCK_SIZE = THREADS_PER_BLOCK;
    else
        topologyOfKernels[i].BLOCK_SIZE = numberOfThreads;
    if (numberOfThreads <= THREADS_PER_BLOCK)
        topologyOfKernels[i].GRID_SIZE = 1;
    else if (fixedTopology[i] % topologyOfKernels[i].BLOCK_SIZE == 0)
        topologyOfKernels[i].GRID_SIZE = fixedTopology[i] / topologyOfKernels[i].BLOCK_SIZE;
    else
        topologyOfKernels[i].GRID_SIZE = (fixedTopology[i] / topologyOfKernels[i].BLOCK_SIZE) + 1;
}
I can't see any mistakes in this code... also, the normal debugger has no problem with it.
I have reinstalled the graphics drivers, CUDA toolkit, CUDA SDK and Parallel Nsight, but it does the same creepy things. By the way, I use Win 7 64-bit and VS2010.
Does anyone have any idea what I should do about this?
Please let me know if someone has any idea :)
The error
Debug Assertion Failed! Program: File:f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c Line: 1322
is from the Microsoft C runtime function _CrtIsValidHeapPointer. The default debug build adds additional heap and stack checks into the code. This function is used to verify that a specified pointer is in the local heap. The path f:\... is the location of the C runtime source file at the time Microsoft built the library, which is why that drive does not exist on your machine.
The assertion indicates an out of bounds memory access. The cause of the error appears to be the incorrect allocation of topologyOfKernels, resulting in heap corruption:
topologyOfKernels = new kernelTopology_ [numberOfLayersInput - 1];
This should be allocating numberOfLayersInput elements:
topologyOfKernels = new kernelTopology_ [numberOfLayersInput];
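In other words, the allocation has to match the loop bound, since the loop writes elements 0 through numberOfLayersInput - 1 (a minimal restatement of the fix using the names from the question):
// the loop below writes topologyOfKernels[0] .. topologyOfKernels[numberOfLayersInput - 1],
// so the array must hold numberOfLayersInput elements, not numberOfLayersInput - 1
kernelTopology_ *topologyOfKernels = new kernelTopology_[numberOfLayersInput];
for (int i = 0; i < numberOfLayersInput; i++) {
    // ... fill topologyOfKernels[i] as in the question ...
}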

Where is the memory leak in this C++ OpenCV code?

This is the code:
CvMemStorage *mem123 = cvCreateMemStorage(0);
CvSeq* ptr123;
CvRect face_rect123;
CvHaarClassifierCascade* cascade123 = (CvHaarClassifierCascade*)cvLoad("haarcascade_frontalface_alt2.xml"); // detects the face if it's frontal

void HeadDetection(IplImage* frame, CvRect* face) {
    ptr123 = cvHaarDetectObjects(frame, cascade123, mem123, 1.2, 2, CV_HAAR_DO_CANNY_PRUNING);
    if (!ptr123) { return; }
    if (!(ptr123->total)) { return; }
    face_rect123 = *(CvRect*)cvGetSeqElem(ptr123, 0); // CvRect face_rect holds the position of the rectangle
    face->height = face_rect123.height;
    face->width = face_rect123.width;
    face->x = face_rect123.x;
    face->y = face_rect123.y;
    return;
} // detects the position of the head and feeds it into CvRect* face as a rectangle

int main() {
    IplImage* oldframe = cvCreateImage(cvSize(640, 480), 8, 3);
    IplImage* frame = cvCreateImage(cvSize(160, 120), 8, 3); // assumed: this declaration is missing from the post; the size is inferred from the "scaled down 4 times" comment
    CvCapture* capture = cvCaptureFromCAM(CV_CAP_ANY);
    CvRect a; a.height = 0; a.width = 0; a.x = 0; a.y = 0;
    while (1) {
        oldframe = cvQueryFrame(capture); // real frame captured of size 640x480
        cvFlip(oldframe, oldframe, 1);
        cvResize(oldframe, frame); // frame scaled down 4 times
        HeadDetection(frame, &a);
        cvShowImage("frame", frame);
        cvWaitKey(1);
    }
}
Here if "HeadDetection(frame,&a);" is commented, then using task manager i see that angledetection.exe (name of my project) consumes 20188 Kb memory (No memory leak happening then).
However if I don't comment that the taskmanager shows that some memory leak is happening (around 300Kb/s )
I'm using VS 2010 on 64 bit windows 7 bit OS (core 2 duo).
This code is trying to detect a face and get the four corners of the square via Haar detection in OpenCV 2.1.
In case anything is unclear please ask. :-)
Thanks in advance.
You are getting a pointer to an object when you call cvHaarDetectObjects.
But you never free it (the object that ptr123 points to).
Also, face_rect123 isn't freed.
By the way, you should consider refactoring the code and giving better names to the variables.
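One common way to plug that leak in the OpenCV 1.x C API is to clear the memory storage before each detection, so the sequences from the previous frame get released (a sketch reusing the names from the question; whether this is the only leak depends on the rest of the program):
void HeadDetection(IplImage* frame, CvRect* face) {
    // release the detection sequences allocated on the previous frame,
    // otherwise mem123 keeps growing on every call
    cvClearMemStorage(mem123);
    ptr123 = cvHaarDetectObjects(frame, cascade123, mem123, 1.2, 2, CV_HAAR_DO_CANNY_PRUNING);
    if (!ptr123 || !ptr123->total) { return; }
    face_rect123 = *(CvRect*)cvGetSeqElem(ptr123, 0);
    face->height = face_rect123.height;
    face->width = face_rect123.width;
    face->x = face_rect123.x;
    face->y = face_rect123.y;
}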
