I'm currently working on a game that creates a window using the Windows API. However, at the moment the process is taking up 50% of my CPU. All I am doing is creating the window and looping, using the code below:
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
    MSG message = {0};
    WNDCLASSEX wcl = {0};

    wcl.cbSize = sizeof(wcl);
    wcl.style = CS_OWNDC | CS_HREDRAW | CS_VREDRAW;
    wcl.lpfnWndProc = WindowProc;
    wcl.cbClsExtra = 0;
    wcl.cbWndExtra = 0;
    wcl.hInstance = hInstance;
    wcl.hIcon = LoadIcon(0, IDI_APPLICATION);
    wcl.hCursor = LoadCursor(0, IDC_ARROW);
    wcl.hbrBackground = 0;
    wcl.lpszMenuName = 0;
    wcl.lpszClassName = "GL2WindowClass";
    wcl.hIconSm = 0;

    if (!RegisterClassEx(&wcl))
        return 0;

    hWnd = CreateAppWindow(wcl, "Application");

    if (hWnd)
    {
        if (Init())
        {
            ShowWindow(hWnd, nShowCmd);
            UpdateWindow(hWnd);

            while (true)
            {
                while (PeekMessage(&message, 0, 0, 0, PM_REMOVE))
                {
                    if (message.message == WM_QUIT)
                        break;

                    TranslateMessage(&message);
                    DispatchMessage(&message);
                }

                if (message.message == WM_QUIT)
                    break;

                if (hasFocus)
                {
                    elapsedTime = GetElapsedTimeInSeconds();
                    lastEarth += elapsedTime;
                    lastUpdate += elapsedTime;
                    lastFrame += elapsedTime;
                    lastParticle += elapsedTime;

                    if (lastUpdate >= (1.0f / 100.0f))
                    {
                        Update(lastUpdate);
                        lastUpdate = 0;
                    }

                    if (lastFrame >= (1.0f / 60.0f))
                    {
                        UpdateFrameRate(lastFrame);
                        lastFrame = 0;
                        Render();
                        SwapBuffers(hDC);
                    }

                    if (lastEarth >= (1.0f / 10.0f))
                    {
                        UpdateEarthAnimation();
                        lastEarth = 0;
                    }

                    if (lastParticle >= (1.0f / 30.0f))
                    {
                        particleManager->rightBooster->Update();
                        particleManager->rightBoosterSmoke->Update();
                        particleManager->leftBooster->Update();
                        particleManager->leftBoosterSmoke->Update();
                        particleManager->breakUp->Update();
                        lastParticle = 0;
                    }
                }
                else
                {
                    WaitMessage();
                }
            }
        }

        Cleanup();
        UnregisterClass(wcl.lpszClassName, hInstance);
    }

    return static_cast<int>(message.wParam);
}
where GetElapsedTimeInSeconds is:
float GetElapsedTimeInSeconds()
{
    static const int MAX_SAMPLE_COUNT = 50;

    static float frameTimes[MAX_SAMPLE_COUNT];
    static float timeScale = 0.0f;
    static float actualElapsedTimeSec = 0.0f;
    static INT64 freq = 0;
    static INT64 lastTime = 0;
    static int sampleCount = 0;
    static bool initialized = false;

    INT64 time = 0;
    float elapsedTimeSec = 0.0f;

    if (!initialized)
    {
        initialized = true;
        QueryPerformanceFrequency(reinterpret_cast<LARGE_INTEGER*>(&freq));
        QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&lastTime));
        timeScale = 1.0f / freq;
    }

    QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&time));
    elapsedTimeSec = (time - lastTime) * timeScale;
    lastTime = time;

    // Reject outliers, then return a moving average of the recent samples.
    if (fabsf(elapsedTimeSec - actualElapsedTimeSec) < 1.0f)
    {
        memmove(&frameTimes[1], frameTimes, sizeof(frameTimes) - sizeof(frameTimes[0]));
        frameTimes[0] = elapsedTimeSec;

        if (sampleCount < MAX_SAMPLE_COUNT)
            ++sampleCount;
    }

    actualElapsedTimeSec = 0.0f;

    for (int i = 0; i < sampleCount; ++i)
        actualElapsedTimeSec += frameTimes[i];

    if (sampleCount > 0)
        actualElapsedTimeSec /= sampleCount;

    return actualElapsedTimeSec;
}
So even when I am not drawing anything, the process still takes up 50% while the window has focus. I don't understand how this is consuming so many system resources.
Am I doing something wrong?
Any help would be much appreciated, thank you!
hasFocus falls out of the sky. You'll burn 100% of a core when it is true: there's nothing inside the if() statement that makes your program wait for anything. Sleep(1) would fix that, but it isn't otherwise obvious what you intend to do.
To add to the other answers...
You need to throttle your game loop in some way. PeekMessage returns immediately if there are no messages, so you are just looping as fast as possible, consuming 100% of one of your CPU cores. Presumably you see 50% because you have a dual-core PC.
Instead of doing a Sleep() to avoid consuming 100% of the CPU cycles, call MsgWaitForMultipleObjects at the start of each loop, passing in zero handles and a small timeout that is your minimal interval between frames. Each time it returns, it's either because the timeout elapsed or because there are messages to process. If there are messages, process them all; either way, then run a game OnNextFrame / repaint cycle. Use UpdateWindow instead of waiting for a WM_PAINT to be posted. A sketch of such a loop follows.
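A minimal sketch of that loop, assuming a fixed frame interval of roughly 16 ms; OnNextFrame() is a hypothetical stand-in for your per-frame game work:

const DWORD frameIntervalMs = 16; // ~60 FPS

while (true)
{
    // Sleeps in the kernel until either the timeout elapses or a message arrives.
    DWORD waitResult = MsgWaitForMultipleObjects(0, NULL, FALSE,
                                                 frameIntervalMs, QS_ALLINPUT);

    if (waitResult == WAIT_OBJECT_0) // input/messages are available
    {
        MSG msg;
        while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
                return static_cast<int>(msg.wParam);
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }

    // Timeout elapsed or messages were processed: advance the game one frame.
    OnNextFrame();
}

This way the loop idles in the kernel between frames, but still wakes immediately for input.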
Your application is running in a tight loop and burning CPU without doing anything useful. You have to add something like Sleep(1) to the loop.
The example you provided is a degenerate case and doesn't really have to be fixed. If you are going to make a game out of this, then you will put the frame-update and render functionality into the loop and it will be doing some useful "stuff". In that case you wouldn't want to "sleep away" any cycles.
If I were you I wouldn't start making a game from scratch and would look for a game engine instead. There are a lot of high-quality open and closed source free game engines available. Or at least look for some simple game skeletons that provide basic things like a message loop and a double-buffered window drawing setup.
The true time from when I send the first bit to a serial port to when I receive the last bit it pings back is 6 ms (measured), but ReadFile takes around 70-80 ms. I'm wondering if this is expected: is this just Windows, or is my code at fault? Here's the function that sends to and reads from the serial port; in my main I have declared and initialized the HANDLE and called this function.
int sendBytes(char* command, char* COM, HANDLE hSerial, int read) {
    BOOL Write_Status;
    DCB dcbSerialParams = { 0 };                            // Initializing DCB structure
    dcbSerialParams.DCBlength = sizeof(dcbSerialParams);

    Write_Status = GetCommState(hSerial, &dcbSerialParams); // retrieves the current settings
    if (Write_Status == FALSE) {
        printf("\n Error! in GetCommState()");
        CloseHandle(hSerial);
        return 1;
    }

    dcbSerialParams.BaudRate = CBR_57600;
    dcbSerialParams.ByteSize = 8;
    dcbSerialParams.StopBits = ONESTOPBIT;
    dcbSerialParams.Parity = NOPARITY;

    Write_Status = SetCommState(hSerial, &dcbSerialParams); // Configuring the port according to settings in DCB
    if (Write_Status == FALSE)
    {
        CloseHandle(hSerial);
        return 1;
    }

    /*----------------------------- Writing a Character to Serial Port ----------------------------------------*/
    int length = strlen(command);
    char send[20];
    strcpy(send, command);
    send[length] = 13;      // CR (was off by one: send[length + 1])
    send[length + 1] = 10;  // LF (was off by one: send[length + 2])

    DWORD dNoOFBytestoWrite;       // No of bytes to write into the port
    DWORD dNoOfBytesWritten = 0;   // No of bytes written to the port
    dNoOFBytestoWrite = length + 2;

    if (!WriteFile(hSerial, send, dNoOFBytestoWrite, &dNoOfBytesWritten, NULL))
        printf("Error writing text to %s\n", COM);

    if (read) {
        int maxChars = 100;
        BOOL Read_Status;          // Status of the various operations
        DWORD dwEventMask;         // Event mask to trigger
        char SerialBuffer[100];    // Buffer containing received data
        DWORD NoBytesRead;         // Bytes read by ReadFile()

        /*------------------------------------ Setting Receive Mask ----------------------------------------------*/
        Read_Status = SetCommMask(hSerial, EV_RXCHAR); // Configure Windows to monitor the serial device for character reception
        if (Read_Status == FALSE)
            printf("\n\n Error! in Setting CommMask");

        /*------------------------------------ Setting WaitComm() Event ----------------------------------------*/
        Read_Status = WaitCommEvent(hSerial, &dwEventMask, NULL); // Wait for a character to be received
        /*-------------------------- Program will wait here till a character is received ------------------------*/
        if (Read_Status == FALSE)
        {
            printf("\n Error! in Setting WaitCommEvent()");
        }
        else // If WaitCommEvent() == TRUE, read the received data using ReadFile()
        {
            clock_t begin = clock();
            if (!ReadFile(hSerial, SerialBuffer, 24, &NoBytesRead, NULL))
            {
                printf("wrong character");
                return 1;
            }
            clock_t end = clock();
            double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
            printf("time : %f\n", time_spent);
        }
    }
    return 0;
}
This is not how you measure timing with sub-second precision:
clock_t begin = clock();
// stuff
clock_t end = clock();
double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
This is how you measure timing:
LARGE_INTEGER before, after, frequency;
QueryPerformanceCounter(&before);
// stuff
QueryPerformanceCounter(&after);
QueryPerformanceFrequency(&frequency);
double time_spent = (after.QuadPart - before.QuadPart) / (double)frequency.QuadPart;
CLOCKS_PER_SEC is coarse, and clock() itself can be even worse: its resolution is often as bad as the scheduler quantum, which is typically 10 ms or 15 ms.
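For what it's worth, here is a minimal helper wrapping those calls (the performance counter frequency is fixed at system boot, so it is safe to query it lazily or cache it):

// Minimal sketch of a QPC stopwatch; returns seconds elapsed since `start`.
static double ElapsedSeconds(LARGE_INTEGER start)
{
    LARGE_INTEGER now, frequency;
    QueryPerformanceCounter(&now);
    QueryPerformanceFrequency(&frequency); // constant after boot, could be cached
    return (now.QuadPart - start.QuadPart) / (double)frequency.QuadPart;
}

// Usage around the ReadFile call in the question:
//   LARGE_INTEGER before;
//   QueryPerformanceCounter(&before);
//   ReadFile(hSerial, SerialBuffer, 24, &NoBytesRead, NULL);
//   printf("time : %f\n", ElapsedSeconds(before));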
I am trying to use ZeroMQ for fast message passing. Messages need to be delivered in less than 1 [ms]. I did some testing (inproc, single process on Linux, no TCP) and saw that usually there is no problem with that. The latency is about 10 - 100 [us], depending on how often the messages are sent (why?). Sometimes, however, messages are received after 6 [ms], which is unacceptable.
What can be the cause of some messages being delayed?
Maybe the process is preempted?
Or is it because of the polling used (zmq_poll())?
Example results from my test :
avg lag = 28 [us]
max lag = 5221 [us]
std dev = 25.85 [us]
big lag = 180 x above 200 [us]
"big lag" means number of cases where latency was over 200 [us]. In my tests there are 500 000 messages sent so the value 180 means that latency over 200 [us] was recorded in 180 / 500000 = 0,036%. It's a quite low number but I'd like it to be zero. Even on the expense of average latency.
The test source code is below :
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <zmq.h>
#include <pthread.h>

#define SOCKETS_NUM 5
#define RUNS 100000

void *context;
int numbers[SOCKETS_NUM];

struct {
    struct timespec send_time;
    struct timespec receive_time;
} times[SOCKETS_NUM * RUNS], *ptimes;

static void * worker_thread(void * dummy) {
    int * number = dummy;
    char endpoint[] = "inproc://endpointX";
    endpoint[17] = (char)('0' + *number);

    void * socket = zmq_socket(context, ZMQ_PUSH);
    zmq_connect(socket, endpoint);

    struct timespec sleeptime, remtime;
    int rnd = rand() / 3000;
    sleeptime.tv_sec = 0;
    sleeptime.tv_nsec = rnd;
    nanosleep(&sleeptime, &remtime);

    clock_gettime(CLOCK_REALTIME, &(ptimes[*number].send_time));
    zmq_send(socket, "Hello", 5, 0);
    zmq_close(socket);
    return NULL;
}

static void run_test(zmq_pollitem_t items[]) {
    pthread_t threads[SOCKETS_NUM];
    for (int i = 0; i < SOCKETS_NUM; i++) {
        pthread_create(&threads[i], NULL, worker_thread, &numbers[i]);
    }

    char buffer[10];
    int to_receive = SOCKETS_NUM;
    for (int i = 0; i < SOCKETS_NUM; i++) {
        int rc = zmq_poll(items, SOCKETS_NUM, -1);
        for (int j = 0; j < SOCKETS_NUM; j++) {
            if (items[j].revents & ZMQ_POLLIN) {
                clock_gettime(CLOCK_REALTIME, &(ptimes[j].receive_time));
                zmq_recv(items[j].socket, buffer, 10, 0);
            }
        }
        to_receive -= rc;
        if (to_receive == 0) break;
    }

    for (int i = 0; i < SOCKETS_NUM; i++) {
        pthread_join(threads[i], NULL);
    }
}

int main(void)
{
    context = zmq_ctx_new();
    zmq_ctx_set(context, ZMQ_THREAD_SCHED_POLICY, SCHED_FIFO);
    zmq_ctx_set(context, ZMQ_THREAD_PRIORITY, 99);

    void * responders[SOCKETS_NUM];
    char endpoint[] = "inproc://endpointX";
    for (int i = 0; i < SOCKETS_NUM; i++) {
        responders[i] = zmq_socket(context, ZMQ_PULL);
        endpoint[17] = (char)('0' + i);
        zmq_bind(responders[i], endpoint);
        numbers[i] = i;
    }

    time_t tt;
    time_t t = time(&tt);
    srand((unsigned int)t);

    zmq_pollitem_t poll_items[SOCKETS_NUM];
    for (int i = 0; i < SOCKETS_NUM; i++) {
        poll_items[i].socket = responders[i];
        poll_items[i].events = ZMQ_POLLIN;
    }

    ptimes = times;
    for (int i = 0; i < RUNS; i++) {
        run_test(poll_items);
        ptimes += SOCKETS_NUM;
    }

    long int lags[SOCKETS_NUM * RUNS];
    long int total_lag = 0;
    long int max_lag = 0;
    long int big_lag = 0;
    for (int i = 0; i < SOCKETS_NUM * RUNS; i++) {
        lags[i] = (times[i].receive_time.tv_nsec - times[i].send_time.tv_nsec
                   + (times[i].receive_time.tv_sec - times[i].send_time.tv_sec) * 1000000000) / 1000;
        if (lags[i] > max_lag) max_lag = lags[i];
        total_lag += lags[i];
        if (lags[i] > 200) big_lag++;
    }

    long int avg_lag = total_lag / SOCKETS_NUM / RUNS;
    double SD = 0.0;
    for (int i = 0; i < SOCKETS_NUM * RUNS; ++i) {
        SD += pow((double)(lags[i] - avg_lag), 2);
    }
    double std_lag = sqrt(SD / SOCKETS_NUM / RUNS);

    printf("avg lag = %5ld [us]\n", avg_lag);
    printf("max lag = %5ld [us]\n", max_lag);
    printf("std dev = %8.2f [us]\n", std_lag);
    printf("big lag = %5ld x above 200 [us]\n", big_lag);

    for (int i = 0; i < SOCKETS_NUM; i++) {
        zmq_close(responders[i]);
    }
    zmq_ctx_destroy(context);
    return 0;
}
Q : "...I'd like it to be zero."
Cool to say, yet hard to make.
As you run an ultra-fast, memory-mapped inproc:// Transport Class, the main focus will be performance tweaking of the Context()-processing. Here you spend awfully much setup-overhead & straight termination-overhead to send just a 5 [B] payload 1E5 times, so I guess there will never be a queue-management related issue, as there will never be any "stack-growing" at all.
1 ) (suppose we leave the code as-is) it would be a natural step for the performance tuning to at least set the ZeroMQ socket-to-CPU_core mapping ZMQ_AFFINITY (not jumping or wandering from core to core). It may be interesting to see whether that many ~5E5 socket setups/terminations on the PUSH-er side, each without ever sending more than a single shot of 5 [B] over the memory-mapped line, could get some help (for those large overheads & maintenance) from configuring the context-instance with SOCKETS_NUM I/O-threads, using the ZMQ_IO_THREADS setting (fighting for "RealTime"-ness using SCHED_FIFO, having only one I/O-thread does not help much, does it?). A configuration sketch appears at the end of this answer.
2 ) the next level of experimentation is to re-balance the ZMQ_THREAD_AFFINITY_CPU_ADD maps (the global context's I/O-threads onto CPU-cores) and the per-socket setup of the ZMQ_AFFINITY maps onto the context's I/O-thread(s). Having a sufficient amount of CPU-cores, there may be some performance / ultra-low-latency benefits from making several gangs-of-I/O-threads serving one socket-instance stay "together" on the same CPU-core, yet here we get into territory where the actual hardware, the real system's background workloads, and the still-"spare" resources for this "RealTime"-ambition-motivated experimenting start to become hard to predict without in-vivo testing & validation.
3 ) tweaking per-socket zmq_setsockopt() parameters may help, yet unless this is a nano-scaled socket-lifetime (rather an expensive one-time-use "consumable-disposable"), do not expect any breakthrough from here.
4 ) trying to measure with nanosecond resolution, the more if used for "durations" of something, ought to use CLOCK_MONOTONIC_RAW, which avoids ntp-injected adjustments, astronomy-correcting leap-second injections et al (see the timing sketch at the end of this answer).
5 ) the zmq_poll()-strategy: I would not go this way. Using timeout == -1 blocks the whole circus. A thing I strongly discourage in any distributed-computing system, the more in one that has a "RealTime" ambition. Spinning the PULL-side to max performance may go via having 1:1 PUSH/PULL threads on either side, or, if trying to challenge the grooming, keep the 5 PUSH-er threads as you have them and collect all ingress messages on just a single, Zero-Copy, well-oiled PULL-er (easier polling; a payload-based index-helper could decide next to which send-side timestamp to put the receive-side timestamp). Anyway, the blocking poller is almost an anti-pattern for challenging any low-latency soft-realtime toys.
Anyway, do not hesitate to refactor the code and to use profiling tools to better see where you "acquire" the big_lag-s (my guesses are above):
#include <stdlib.h>
#include <math.h>
#include <zmq.h>
#include <pthread.h>

#define SOCKETS_NUM 5
#define RUNS 100000

void *context;
int numbers[SOCKETS_NUM];

struct {
    struct timespec send_time;
    struct timespec recv_time;
} times[SOCKETS_NUM * RUNS],
  *ptimes;

static void *worker_thread( void *dummy ) { //--------------------------- an overhead-expensive one-shot PUSH-based "Hello"-sender & .close()
    int *number = dummy;
    char endpoint[] = "inproc://endpointX";
    endpoint[17] = (char)( '0' + *number );

    int rnd = rand() / 3000;
    void *socket = zmq_socket( context, ZMQ_PUSH );
    struct timespec remtime,
                    sleeptime;
    sleeptime.tv_sec = 0;
    sleeptime.tv_nsec = rnd;

    zmq_connect( socket, endpoint );
    nanosleep( &sleeptime, &remtime ); //-------------------------------- anything between < 0 : RAND_MAX/3000 > [ns] ... easily >> 32, as #define RAND_MAX 2147483647 ~ 715 827 [ns]

    clock_gettime( CLOCK_REALTIME, &( ptimes[*number].send_time ) ); //-- CLK_set_NEAR_SEND
    //                                  any CLOCK re-adjustments may and will skew any non-MONOTONIC_CLOCK
    zmq_send( socket, "Hello", 5, 0 );
    zmq_close( socket );
    return NULL;
}

static void run_test( zmq_pollitem_t items[] ) { //---------------------- zmq_poll()-blocked zmq_recv()-orchestrator ( called ~ 1E5 x !!! resources' nano-use & setup + termination overheads matter )
    char buffer[10];
    int to_receive = SOCKETS_NUM;
    pthread_t threads[SOCKETS_NUM];

    for ( int i = 0; i < SOCKETS_NUM; i++ ) { //------------------------- thread-maker ( a per-socket PUSH-er[]-s )
        pthread_create( &threads[i], NULL, worker_thread, &numbers[i] );
    }
    for ( int i = 0; i < SOCKETS_NUM; i++ ) { //------------------------- [SERIAL]-------- [i]-stepping
        int rc = zmq_poll( items, SOCKETS_NUM, -1 ); //------------------ INFINITE ??? --- blocks --- several may flag ZMQ_POLLIN
        for ( int j = 0; j < SOCKETS_NUM; j++ ) { //--------------------- ALL-CHECKED in a loop for an items[j].revents
            if ( items[j].revents & ZMQ_POLLIN ) { //-------------------- FIND IF IT WAS THIS ONE
                clock_gettime( CLOCK_REALTIME, &( ptimes[j].recv_time ) ); //- CLK_set_NEAR_poll()_POSACK'd R2recv
                zmq_recv( items[j].socket, buffer, 10, 0 ); //----------- READ-IN from any POSACK'd by zmq_poll()-er flag(s)
            }
        }
        to_receive -= rc; //--------------------------------------------- SUB rc
        if ( to_receive == 0 ) break;
    }
    for ( int i = 0; i < SOCKETS_NUM; i++ ) { //------------------------- thread-killer
        pthread_join( threads[i], NULL );
    }
}

int main( void ) {
    context = zmq_ctx_new();
    zmq_ctx_set( context, ZMQ_THREAD_SCHED_POLICY, SCHED_FIFO );
    zmq_ctx_set( context, ZMQ_THREAD_PRIORITY, 99 );

    void *responders[SOCKETS_NUM];
    char endpoint[] = "inproc://endpointX";
    for ( int i = 0; i < SOCKETS_NUM; i++ ) {
        responders[i] = zmq_socket( context, ZMQ_PULL ); //-------------- PULL instances into []
        endpoint[17] = (char)( '0' + i );
        zmq_bind( responders[i], endpoint ); //-------------------------- .bind()
        numbers[i] = i;
    }

    time_t tt;
    time_t t = time(&tt);
    srand( (unsigned int)t );

    zmq_pollitem_t poll_items[SOCKETS_NUM];
    for ( int i = 0; i < SOCKETS_NUM; i++ ) { //------------------------- zmq_pollitem_t array[] ---pre-fill---
        poll_items[i].socket = responders[i];
        poll_items[i].events = ZMQ_POLLIN;
    }

    ptimes = times;
    for ( int i = 0; i < RUNS; i++ ) { //-------------------------------- 1E5 RUNs
        run_test( poll_items ); //--------------------------------------- RUN TEST
        ptimes += SOCKETS_NUM;
    }

    long int lags[SOCKETS_NUM * RUNS];
    long int total_lag = 0;
    long int max_lag = 0;
    long int big_lag = 0;
    for ( int i = 0; i < SOCKETS_NUM * RUNS; i++ ) {
        lags[i] = (   times[i].recv_time.tv_nsec
                    - times[i].send_time.tv_nsec
                    + (   times[i].recv_time.tv_sec
                        - times[i].send_time.tv_sec
                        ) * 1000000000
                    ) / 1000; //----------------------------------------- [us]
        if ( lags[i] > max_lag ) max_lag = lags[i];
        total_lag += lags[i];
        if ( lags[i] > 200 ) big_lag++;
    }

    long int avg_lag = total_lag / SOCKETS_NUM / RUNS;
    double SD = 0.0;
    for ( int i = 0; i < SOCKETS_NUM * RUNS; ++i ) {
        SD += pow( (double)( lags[i] - avg_lag ), 2 );
    }
    double std_lag = sqrt( SD / SOCKETS_NUM / RUNS );

    printf( "avg lag = %5ld [us]\n", avg_lag );
    printf( "max lag = %5ld [us]\n", max_lag );
    printf( "std dev = %8.2f [us]\n", std_lag );
    printf( "big lag = %5ld x above 200 [us]\n", big_lag );

    for ( int i = 0; i < SOCKETS_NUM; i++ ) {
        zmq_close( responders[i] );
    }
    zmq_ctx_destroy( context );
    return 0;
}
Using nanosleep for a random (not cardinal, safely outside of any control-loop(s) activity) sleep is rather a risky luxury, as it caused problems in earlier kernels:
In order to support applications requiring much more precise pauses (e.g., in order to control some time-critical hardware), nanosleep() would handle pauses of up to 2 ms by busy waiting with microsecond precision when called from a thread scheduled under a real-time policy like SCHED_FIFO or SCHED_RR. This special extension was removed in kernel 2.5.39, hence is still present in current 2.4 kernels, but not in 2.6 kernels.
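Minimal sketches for points 1 ) and 4 ) above; ZMQ_IO_THREADS, ZMQ_AFFINITY and CLOCK_MONOTONIC_RAW are real libzmq / glibc facilities, while the concrete values here are illustrative only:

// Sketch for point 1: more I/O-threads for the context, and a per-socket
// affinity bitmask onto those threads (values are illustrative).
void *ctx = zmq_ctx_new();
zmq_ctx_set( ctx, ZMQ_IO_THREADS, SOCKETS_NUM );   // the default is 1
void *pull = zmq_socket( ctx, ZMQ_PULL );
uint64_t io_thread_mask = 1;                       // bitmask: use I/O-thread #0
zmq_setsockopt( pull, ZMQ_AFFINITY, &io_thread_mask, sizeof io_thread_mask );

// Sketch for point 4: duration measurement immune to NTP slewing and other
// wall-clock re-adjustments.
struct timespec t0, t1;
clock_gettime( CLOCK_MONOTONIC_RAW, &t0 );
/* ... measured section ... */
clock_gettime( CLOCK_MONOTONIC_RAW, &t1 );
long lag_us = (   t1.tv_nsec - t0.tv_nsec
              + ( t1.tv_sec  - t0.tv_sec ) * 1000000000L ) / 1000;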
I would like a background subtraction method for outdoor conditions that gradually adjusts itself to environmental light variations, but can still reveal a presence even when it is not in motion.
The problem with adaptive OpenCV background subtraction methods is that they are only capable of detecting a presence while it is moving. On the other hand, old background subtraction methods do not work when the lighting conditions change over time.
In order to get this I've modified Golan Levin's method in the video library of Processing (current frames are compared with an initial first frame), setting a certain low difference threshold.
I therefore assume that all changes over that threshold are due to a presence (persons, animals, etc.), that changes below it are due to progressive light changes, and I put those changed pixels into the background's pixel array.
/* auto-updating background part */
diferencia = diffR + diffG + diffB;
if (diferencia < mindDif) backgroundPixels[i] = video.pixels[i];
That's not working satisfactorily: the image gets dirty, far from homogeneous. Any idea of how to achieve this would be extremely welcome.
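(For reference: the replacement above is the alpha = 1 special case of a per-pixel running average. A smaller alpha, sketched below in C-style code operating on the same packed ARGB ints, is a common way to keep an adaptive background smooth instead of patchy; 0.05 is an arbitrary illustration value.)

// Sketch only: blend a below-threshold background pixel toward the current
// camera pixel instead of overwriting it outright.
inline int blendChannel(int bg, int cur, float alpha)
{
    return (int)(bg + alpha * (cur - bg)); // per-channel running average
}

inline int updateBackgroundPixel(int bkgdColor, int currColor, float alpha = 0.05f)
{
    int r = blendChannel((bkgdColor >> 16) & 0xFF, (currColor >> 16) & 0xFF, alpha);
    int g = blendChannel((bkgdColor >>  8) & 0xFF, (currColor >>  8) & 0xFF, alpha);
    int b = blendChannel( bkgdColor        & 0xFF,  currColor        & 0xFF, alpha);
    return (int)(0xFF000000u | (r << 16) | (g << 8) | b);
}

// In place of: backgroundPixels[i] = video.pixels[i];
// use:         backgroundPixels[i] = updateBackgroundPixel(backgroundPixels[i], video.pixels[i]);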
I'm posting the whole code in case it is of some help. Thanks a lot for your time.
import processing.video.*;

int numPixels;
int[] backgroundPixels;
Capture video;
int camSel = 0;
int topDiff = 763;
int unbralDif = 120;
int mindDif = 20;
boolean subtraction, lowSubtr;
PGraphics _tempPG;

void setup() {
  size(640, 480);
  _tempPG = createGraphics(width, height);

  if (camSel == 0) video = new Capture(this, width, height);
  else video = new Capture(this, width, height, Capture.list()[1]);
  video.start();

  numPixels = video.width * video.height;
  backgroundPixels = new int[numPixels];
  loadPixels();
}

void draw() {
  if (video.available()) {
    video.read();
    video.loadPixels();

    int presenceSum = 0;
    for (int i = 0; i < numPixels; i++) {
      color currColor = video.pixels[i];
      color bkgdColor = backgroundPixels[i];

      int currR = (currColor >> 16) & 0xFF;
      int currG = (currColor >> 8) & 0xFF;
      int currB = currColor & 0xFF;
      int bkgdR = (bkgdColor >> 16) & 0xFF;
      int bkgdG = (bkgdColor >> 8) & 0xFF;
      int bkgdB = bkgdColor & 0xFF;

      int diffR = abs(currR - bkgdR);
      int diffG = abs(currG - bkgdG);
      int diffB = abs(currB - bkgdB);

      presenceSum += diffR + diffG + diffB;

      pixels[i] = 0xFF000000 | (diffR << 16) | (diffG << 8) | diffB;

      /* auto-updating background part */
      int diferencia = diffR + diffG + diffB;
      // detect pixels that have changed below a threshold
      if (lowSubtr && diferencia < mindDif) {
        /* substitute them into the background img array */
        backgroundPixels[i] = video.pixels[i];
      }
      /* end auto-updating background part */
    }
    updatePixels();
  }
  subtraction = false;
}

void keyPressed() {
  if (keyPressed) startSubtr();
}

void startSubtr() {
  arraycopy(video.pixels, backgroundPixels);
  lowSubtr = true;
}

void actualizacion(int[] _srcArr, int[] _inputArr, int _ind) {
  for (int i = 0; i < _srcArr.length; i++) {
    _srcArr[_ind] = _inputArr[i];
  }
}
I have an application that renders a cube and does some other stuff. Unfortunately, when I want to move some vertices with the mouse, I get an error: when I map the vertex buffer, the vertices recorded in another struct array are empty. To test my application, I put a Map right after creating the vertex buffer to see whether the debugger would show me the real numbers. That fails too: everything is populated with a 0.00 value (positions, normals, tangents...). What is the problem?
Here you can find the code.
D3D11_BUFFER_DESC bd;
ZeroMemory( &bd, sizeof(bd) );
bd.Usage = D3D11_USAGE_DYNAMIC;              // also tried D3D11_USAGE_DEFAULT
bd.ByteWidth = CBSize( sizeof(VERTEX_PNTTB) * NumVertices );
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;  // also tried 0 and D3D11_CPU_ACCESS_READ
bd.StructureByteStride = 0;
bd.MiscFlags = 0;

D3D11_SUBRESOURCE_DATA InitData;
ZeroMemory( &InitData, sizeof(InitData) );
InitData.pSysMem = (void*)vertices;          // also tried (void*)(mesh->GetVertexes().data())
InitData.SysMemPitch = 0;

ID3D11Buffer* vbuff = NULL;
hr = device->CreateBuffer( &bd, &InitData, &vbuff );
//if( FAILED( hr ) )
//    return hr;
m->vertexBuffer = vbuff;

D3D11_MAPPED_SUBRESOURCE mappedResource;
ID3D11Buffer* &buff = vbuff;
//g_pImmediateContext->CopyResource(buff, mElements[ 0 ]->getElement().model ->Meshes()[selectedMeshIndex]->VertexBuffer());
hr = g_pImmediateContext->Map( buff, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource );
if (SUCCEEDED(hr))
{
    VERTEX_PNTTB *vertices = (VERTEX_PNTTB *)mappedResource.pData;
    // Fill vertex buffer
    /*for(UINT i=0; i<faces_indices.size(); i++)
    {
        vertices[ faces_indices[i] ].position.x = vertices[ faces_indices[i] ].position.x + DirectX::XMFLOAT3(20*dx,20*dy,0).x;
        vertices[ faces_indices[i] ].position.y = vertices[ faces_indices[i] ].position.y + DirectX::XMFLOAT3(20*dx,20*dy,0).y;
        vertices[ faces_indices[i] ].position.z = vertices[ faces_indices[i] ].position.z + DirectX::XMFLOAT3(20*dx,20*dy,0).z;
    }*/
    g_pImmediateContext->Unmap( buff, 0 );
}
Generally you don't want to create vertex or index buffers in CPU-readable memory, as that has a major negative performance impact. You'll find it's much better to have a static VB/IB plus another copy of the data in regular RAM for the CPU to modify.
That said, if you really need to read a buffer back on the CPU, a D3D11_USAGE_DYNAMIC buffer won't let you: dynamic resources only permit D3D11_CPU_ACCESS_WRITE, and mapping them for reading fails. Reading back requires copying the buffer (CopyResource) into a second D3D11_USAGE_STAGING buffer created with D3D11_CPU_ACCESS_READ, which you can then Map with D3D11_MAP_READ.
And D3D11_MAP_WRITE_DISCARD, which is what you are using, is exactly why you see zeros: it hands you a fresh piece of memory with no defined data in it, and whatever you leave there replaces the old contents when you call Unmap.
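A minimal sketch of the recommended pattern, with a hypothetical cpuVertices array standing in for the CPU-side copy: edit the data in system RAM, then rewrite the whole dynamic buffer through one Map(WRITE_DISCARD):

// Sketch: keep the authoritative vertex data in CPU memory, edit it there,
// then upload the entire array each time. `cpuVertices` and `NumVertices`
// are assumed to exist alongside the mesh.
std::vector<VERTEX_PNTTB> cpuVertices(NumVertices);   // CPU-side master copy

// ... mouse editing happens on cpuVertices[i].position ...

D3D11_MAPPED_SUBRESOURCE mapped;
HRESULT hr = g_pImmediateContext->Map(vbuff, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
if (SUCCEEDED(hr))
{
    // The mapped memory is undefined, so write every vertex, not just the
    // edited ones.
    memcpy(mapped.pData, cpuVertices.data(), sizeof(VERTEX_PNTTB) * NumVertices);
    g_pImmediateContext->Unmap(vbuff, 0);
}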
I'm using the CoreAudio low-level API for audio capture. The app targets Mac OS X, not iOS.
During testing, from time to time we get a very annoying noise modulated with the real audio. The phenomenon develops over time, starting barely noticeable and becoming more and more dominant.
Analyzing the captured audio in Audacity indicates that the end part of each audio packet is wrong.
Here is a sample picture:
The intrusion repeats every 40 ms, which is the configured packetization time (in terms of buffer samples).
Update:
Over time the gap becomes larger. Here is another snapshot from the same captured file, 10 minutes later: the gap now contains 1460 samples, which is 33 ms out of the total 40 ms of the packet!
CODE SNIPPETS:
Capture callback:
OSStatus MacOS_AudioDevice::captureCallback(void *inRefCon,
                                            AudioUnitRenderActionFlags *ioActionFlags,
                                            const AudioTimeStamp *inTimeStamp,
                                            UInt32 inBusNumber,
                                            UInt32 inNumberFrames,
                                            AudioBufferList *ioData)
{
    MacOS_AudioDevice* _this = static_cast<MacOS_AudioDevice*>(inRefCon);

    // Get the new audio data
    OSStatus err = AudioUnitRender(_this->m_AUHAL, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames, _this->m_InputBuffer);
    if (err != noErr)
    {
        ...
        return err;
    }

    // ignore callback on unexpected buffer size
    if (_this->m_params.bufferSizeSamples != inNumberFrames)
    {
        ...
        return noErr;
    }

    // Deliver audio data
    DeviceIOMessage message;
    message.bufferSizeBytes = _this->m_deviceBufferSizeBytes;
    message.buffer = _this->m_InputBuffer->mBuffers[0].mData;
    if (_this->m_callbackFunc)
    {
        _this->m_callbackFunc(_this, message);
    }

    return noErr;
}
Open and start capture device:
void MacOS_AudioDevice::openAUHALCapture()
{
    UInt32 enableIO;
    AudioStreamBasicDescription streamFormat;
    UInt32 size;
    SInt32 *channelArr;
    std::stringstream ss;

    AudioObjectPropertyAddress deviceBufSizeProperty =
    {
        kAudioDevicePropertyBufferFrameSize,
        kAudioDevicePropertyScopeInput,
        kAudioObjectPropertyElementMaster
    };

    // AUHAL
    AudioComponentDescription cd = {kAudioUnitType_Output, kAudioUnitSubType_HALOutput, kAudioUnitManufacturer_Apple, 0, 0};
    AudioComponent HALOutput = AudioComponentFindNext(NULL, &cd);
    verify_macosapi(AudioComponentInstanceNew(HALOutput, &m_AUHAL));
    verify_macosapi(AudioUnitInitialize(m_AUHAL));

    // enable input IO
    enableIO = 1;
    verify_macosapi(AudioUnitSetProperty(m_AUHAL, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Input, 1, &enableIO, sizeof(enableIO)));

    // disable output IO
    enableIO = 0;
    verify_macosapi(AudioUnitSetProperty(m_AUHAL, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Output, 0, &enableIO, sizeof(enableIO)));

    // Setup current device
    size = sizeof(AudioDeviceID);
    verify_macosapi(AudioUnitSetProperty(m_AUHAL, kAudioOutputUnitProperty_CurrentDevice, kAudioUnitScope_Global, 0, &m_MacDeviceID, sizeof(AudioDeviceID)));

    // Set device native buffer length before setting AUHAL stream
    size = sizeof(m_originalDeviceBufferTimeFrames);
    verify_macosapi(AudioObjectSetPropertyData(m_MacDeviceID, &deviceBufSizeProperty, 0, NULL, size, &m_originalDeviceBufferTimeFrames));

    // Get device format
    size = sizeof(AudioStreamBasicDescription);
    verify_macosapi(AudioUnitGetProperty(m_AUHAL, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 1, &streamFormat, &size));

    // Setup channel map
    assert(m_params.numOfChannels <= streamFormat.mChannelsPerFrame);
    channelArr = new SInt32[streamFormat.mChannelsPerFrame];
    for (int i = 0; i < streamFormat.mChannelsPerFrame; i++)
        channelArr[i] = -1;
    for (int i = 0; i < m_params.numOfChannels; i++)
        channelArr[i] = i;

    verify_macosapi(AudioUnitSetProperty(m_AUHAL, kAudioOutputUnitProperty_ChannelMap, kAudioUnitScope_Input, 1, channelArr, sizeof(SInt32) * streamFormat.mChannelsPerFrame));
    delete [] channelArr;

    // Setup stream converters
    streamFormat.mFormatID = kAudioFormatLinearPCM;
    streamFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger;
    streamFormat.mFramesPerPacket = m_SamplesPerPacket;
    streamFormat.mBitsPerChannel = m_params.sampleDepthBits;
    streamFormat.mSampleRate = m_deviceSampleRate;
    streamFormat.mChannelsPerFrame = 1;
    streamFormat.mBytesPerFrame = 2;
    streamFormat.mBytesPerPacket = streamFormat.mFramesPerPacket * streamFormat.mBytesPerFrame;
    verify_macosapi(AudioUnitSetProperty(m_AUHAL, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &streamFormat, size));

    // Setup callbacks
    AURenderCallbackStruct input;
    input.inputProc = captureCallback;
    input.inputProcRefCon = this;
    verify_macosapi(AudioUnitSetProperty(m_AUHAL, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 0, &input, sizeof(input)));

    // Calculate the size of the IO buffer (in samples)
    if (m_params.bufferSizeMS != -1)
    {
        unsigned int desiredSignalsInBuffer = (m_params.bufferSizeMS / (double)1000) * m_deviceSampleRate;

        // making sure the value stays in the device's supported range
        desiredSignalsInBuffer = std::min<unsigned int>(desiredSignalsInBuffer, m_deviceBufferFramesRange.mMaximum);
        desiredSignalsInBuffer = std::max<unsigned int>(m_deviceBufferFramesRange.mMinimum, desiredSignalsInBuffer);

        m_deviceBufferFrames = desiredSignalsInBuffer;
    }

    // Set device buffer length
    size = sizeof(m_deviceBufferFrames);
    verify_macosapi(AudioObjectSetPropertyData(m_MacDeviceID, &deviceBufSizeProperty, 0, NULL, size, &m_deviceBufferFrames));

    m_deviceBufferSizeBytes = m_deviceBufferFrames * streamFormat.mBytesPerFrame;
    m_deviceBufferTimeMS = 1000 * m_deviceBufferFrames / m_deviceSampleRate;

    // Calculate number of buffers from channels
    size = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * m_params.numOfChannels);

    // Allocate input buffer
    m_InputBuffer = (AudioBufferList *)malloc(size);
    m_InputBuffer->mNumberBuffers = m_params.numOfChannels;

    // Pre-malloc buffers for AudioBufferLists
    for (UInt32 i = 0; i < m_InputBuffer->mNumberBuffers; i++)
    {
        m_InputBuffer->mBuffers[i].mNumberChannels = 1;
        m_InputBuffer->mBuffers[i].mDataByteSize = m_deviceBufferSizeBytes;
        m_InputBuffer->mBuffers[i].mData = malloc(m_deviceBufferSizeBytes);
    }

    // Update class properties
    m_params.sampleRateHz = streamFormat.mSampleRate;
    m_params.bufferSizeSamples = m_deviceBufferFrames;
    m_params.bufferSizeBytes = m_params.bufferSizeSamples * streamFormat.mBytesPerFrame;
}

eADMReturnCode MacOS_AudioDevice::start()
{
    eADMReturnCode ret = OK;
    LOGAPI(ret);

    if (!m_isStarted && m_isOpen)
    {
        OSStatus err = AudioOutputUnitStart(m_AUHAL);
        if (err == noErr)
            m_isStarted = true;
        else
            ret = ERROR;
    }
    return ret;
}
Any idea what causes this and how to solve it?
Thanks in advance!
Periodic glitches or dropouts can be caused by not paying attention to, or by not fully processing, the number of frames sent to each audio callback. Valid buffers don't always contain the expected or even the same number of samples (inNumberFrames might not equal bufferSizeSamples, or the previous inNumberFrames, in a perfectly valid audio buffer).
It is also possible that these glitches are caused by attempting to record at 44.1 kHz on some models of devices that only support 48 kHz audio in hardware.
Some glitches might also be caused by any non-hard-real-time code inside your m_callbackFunc function (such as synchronous file reads/writes, OS calls, Objective-C message dispatch, GC, or memory allocation/deallocation).
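As a sketch of the first point, reusing the member names from the question's code: size the destination buffers from inNumberFrames on every callback and pass the actual byte count downstream, instead of rejecting mismatched callbacks:

// Sketch only: handle whatever frame count Core Audio delivers. Assumes
// m_InputBuffer was allocated large enough for the device's maximum buffer.
OSStatus MacOS_AudioDevice::captureCallback(void *inRefCon,
                                            AudioUnitRenderActionFlags *ioActionFlags,
                                            const AudioTimeStamp *inTimeStamp,
                                            UInt32 inBusNumber,
                                            UInt32 inNumberFrames,
                                            AudioBufferList * /*ioData*/)
{
    MacOS_AudioDevice *_this = static_cast<MacOS_AudioDevice*>(inRefCon);

    // Tell AudioUnitRender how much room each buffer really has this time.
    const UInt32 bytesThisCall = inNumberFrames * 2; // 2 = mBytesPerFrame above
    for (UInt32 i = 0; i < _this->m_InputBuffer->mNumberBuffers; i++)
        _this->m_InputBuffer->mBuffers[i].mDataByteSize = bytesThisCall;

    OSStatus err = AudioUnitRender(_this->m_AUHAL, ioActionFlags, inTimeStamp,
                                   inBusNumber, inNumberFrames, _this->m_InputBuffer);
    if (err != noErr)
        return err;

    // Deliver exactly the number of bytes rendered, not a fixed packet size.
    DeviceIOMessage message;
    message.bufferSizeBytes = bytesThisCall;
    message.buffer = _this->m_InputBuffer->mBuffers[0].mData;
    if (_this->m_callbackFunc)
        _this->m_callbackFunc(_this, message);

    return noErr;
}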