Performance problem with backgroundworkers - visual-studio

I have 15 BackgroundWorers that are running all the time, each one of them works for about half a second (making web request) and none of them is ever stopped.
I've noticed that my program takes about 80% of my computer's processing resources and about 15mb of memory (core 2 duo, 4gb ddr2 memory).
It it normal? web requests are not heavy duty, it just sends and awaits server response, and yes, running 15 of them is really not a pro-performance act (speed was needed) but i didn't think that it would be so intense.
I am new to programming, and i hardly ever (just as any new programmer, I assume) care about performance, but this time it is ridiculous, 80% of processing resources usage for a windows forms application with two listboxes and backgroundworkers making web requests isn't relly what expected.
info:
I use exception handling as part of my routine, which i've once read that isn't really good for performance
I have 15 background workers
My code assures none of them is ever idle
List item
windows forms, visual studio, c#.
------[edit - questions in answers]------
What exactly do you mean by "My code assures none of them is ever idle"?
The program remains waiting
while (bgw1.IsBusy || gbw2.IsBusy ... ... ...) { Application.DoWork();}
then when any of them is free, gets put back to work.
Could you give more details about the workload you're putting this under?
I make an HTTP web request object, open it and wait for the server request. It really has only a couple of lines and does no heavy processing, the half second is due to server awaiting.
In what way, and how many exceptions are being thrown?
When the page doesn't exist, there is a system.WebException, when it works it returns "OK", and about 99% of the pages i check don't exist, so i'd say about 300 exceptions per minute (putting it like this makes it sound creepy, i know, but it works)
If you're running in the debugger, then exceptions are much more expensive than they would be when not debugging
I'm not talking about running it in the debugger, I run the executable, the resulting EXE.

while (bgw1.IsBusy || gbw2.IsBusy ... ... ...) { Application.DoWork();}
What's Application.DoWork(); doing? If it's doing something quickly and returning, this loop alone will consume 100% CPU since it never stops doing something. You can put a sleep(.1) or so inside the loop, to only check the worker threads every so often instead of continuously.

This bit concerns me:
My code assures none of them is ever idle
What exactly do you mean by that?
If you're making thousands and thousands of web requests, and if those requests are returning very quickly, then that could eat some CPU.
Taking 15MB of memory isn't unexpected, but the CPU is the more worrying bit. Could you give more details about the workload you're putting this under? What do you mean by "each one of them workds for about half a second"?
What do you mean by "I use exception handling as part of my routine"? In what way, and how many exceptions are being thrown? If you're running in the debugger, then exceptions are much more expensive than they would be when not debugging - if you're throwing and catching a lot of exceptions, that could be responsible for it...

Run the program in the debugger, pause it ten times, and have a look at the stacktraces. Then you will know what is actually doing when it's busy.

From your text I read that you have a Core 2 Duo. Is that a 2 Threads or a 4 Threads?
If you have a 2 Threads you only should use 2 BackGroundworkers simultaneously.
If you have a 4 Threads then use 4 BGW's simultaneously. If you have more BGW's then use frequently the following statement:
System.Threading.Thread.Sleep(1)
Also use Applications.DOevents.
My general advice is: start simple and slowly make your application more complex.
Have a look at: Visual Basic 2010 Parallel Programming techniques.

Related

Using dot-trace to analyze high CPU usage application

I have a .NET CORE 3.1 Console application running on a Ubuntu20 x64 server, and randomly experiencing High Cpu(100% for 4 cores) cases.
I'm following diagnostics to start a diag during the peak time for my app like:
dotnet-trace collect -p 1039 --providers Microsoft-DotNETCore-SampleProfiler
from the resulted .nettrace file opened in Visual Studio, I can see the Funcions list with CPU Time for each single function.
But I understand the CPU time here is actually the wall time that just means the time of a function call stayed in a thread stack, and no matter it consumes real CPU caculation resource or not.
The hotest spot my this .nettrace now is pointing to these lines of code(pseudo code):
while(true)
{
Thread.Sleep(1000);//<---------hottest spot
socket.Send(bytes);
}
and
while(true)
{
ManualResetEvent.WaitOne();//<---------hottest spot
httpClient.Post(data);
}
Obviously above 2 hottest spot will not consume real CPU resource but just idle waitting, so any way to trace the functions that used the real cpu usage, just like the JetBrains dotTrace provided:
You might want to use external tools like top. This could help to identify the process consuming the CPU percentage.
If your profiler identifies Thread.Sleep() as hottest spot, chances are that your application is waiting for some external process outside the scope of the profiler.
I would suggest refactoring this code to use async and use 'await Task.Wait(xxx)' instead of doing this on the Thread level.
I'm having this suggestion based on partially similar problem which has been described here
Why Thread.Sleep() is so CPU intensive?

High CPU Usage with no load

We are running a windows service which every 5 seconds checks a folder for files and if found logs some info about it using NLog.
I already tried the suggestions from ASP.NET: High CPU usage under no load without succes.
When the service was just started there is hardly any CPU usage. After a few hours we see CPU peaks to 100% and after some more waiting the cpu graph looks like:
I tried the steps described in http://blogs.technet.com/b/sooraj-sec/archive/2011/09/14/collecting-data-using-xperf-for-high-cpu-utilization-of-a-process.aspx to produce information on what is going on:
I don't know where to continue. Any help appreciated
Who wrote this windows service? Was it you or 3rd party?
To me, checking folder for changes every 5 seconds sounds really suspicious, and maybe the primary reason why you are getting this massive slowdown.
If you do it right, you can get directory changes immediately as they happen, and yet spend almost no CPU time while doing that.
This Microsoft article explains how exactly to do that: Obtaining Directory Change Notifications by using functions FindFirstChangeNotification, FindNextChangeNotification, ReadDirectoryChangesW and WaitForMultipleObjects.
After a lot of digging it had to do with this:
The service had a private object X with property Y
Every time the service was fired X was passed to the Business Logic. There Y was used and in the end disposed. The garbage collector will then wait until X is disposed, which will never happen until the service is restarted. This caused an extra GC waiting thread every time the service was fired.

Handling windows events in a tight loop?

I have written a compiler and interpreter for a scripting language. The interpreter is a DLL ('The Engine') which runs in a single thread and can load many 100s or 1000s of compiled byte-code applications and excecute them as a set of internal processes. There is a main loop that excecutes a few instructions from each of the loaded app processes before moving one to the next process.
The byte code instruction in the compiled apps can either be a low level instructions (pop, push, add, sub etc) or a call to an external function library (which is where most of the work is done). These external libararies can call back to the engine to put the internal processes into a sleep state waiting for a particular event upon which the external function (probably after receiving an event) will wake up the internal process again. If all internal processes are in a sleep state (which the are most of the time) then I can put the Engine to sleep as well thus handing off the CPU to other threads.
However there is nothing to prevent someone writing a script which just does a tight loop like this:
while(1)
x=1;
endwhile
Which means my main loop will never enter a sleep state and so the CPU goes up to 100% and locks up the system. I want my engine to run as fast as possibly, whilst still handling windows events so that other applications are still responsive when a tight loop similar to the above is encountered.
So my first question is how to add code to my main loop to ensure windows events are handled without slowing down the main engine which should run at the fastest speed possible..
Also it would be nice to be able to set the maximum CPU usage my engine can use and throttle down the CPU usage by calling the occasional Sleep(1)..
So my second question is how can I throttle down then CPU usage to the required level?
The engine is written in Borland C++ and makes calls to the win32 API.
Thanks in advance
1. Running a message loop at the same time as running your script
I want my engine to run as fast as
possibly, whilst still handling
windows events so that other
applications are still responsive when
a tight loop similar to the above is
encountered.
The best way to continue running a message loop while performing another operation is to move that other operation to another thread. In other words, move your script interpreter to a second thread and communicate with it from your main UI thread, which runs the message loop.
When you say Borland C++, I assume you're using C++ Builder? In this situation, the main thread is the only one that interacts with the UI, and its message loop is run via Application->Run. If you're periodically calling Application->ProcessMessages in your library callbacks, that's reentrant and can cause problems. Don't do it.
One comment to your question suggested moving each script instance to a separate thread. This would be ideal. However, beware of issues with the DLLs the scripts call if they keep state - DLLs are loaded per-process, not per-thread, so if they keep state you may encounter threading issues. For the moment purely to address your current question, I'd suggest moving all your script execution to a single other thread.
You can communicate between threads many ways, such as by posting messages between them using PostMessage or PostThreadMessage. Since you're using Borland C++, you should have access to the VCL. It has a good thread wrapper class called TThread. Derive from this and put your script loop in Execute. You can use Synchronize (blocks waiting) or Queue (doesn't block; method may be run at any time, when the target thread processes its message loop) to run methods in the context of another thread.
As a side note:
so that other
applications are still responsive when
a tight loop similar to the above is
encountered.
This is odd. In a modern, preemptively multitasked version of Windows other applications should still be responsive even when your program is very busy. Are you doing anything odd with your thread priorities, or are you using a lot of memory so that other applications are paged out?
2. Handling an infinite loop in a script
You write:
there is nothing to prevent someone
writing a script which just does a
tight loop like this:
while(1) x=1; endwhile
Which means my main loop will never
enter a sleep state and so the CPU
goes up to 100% and locks up the
system.
but phrase how to handle this as:
Also it would be nice to be able to
set the maximum CPU usage my engine
can use and throttle down the CPU
usage by calling the occasional
Sleep(1)..
So my second question is how can I
throttle down then CPU usage to the
required level?
I think you're taking the wrong approach. An infinite loop like while(1) x=1; endwhile is a bug in the script, but it should not take down your host application. Just throttling the CPU won't make your application able to handle the situation. (And using lots of CPU isn't necessarily a problem: if it the work is available for the CPU to run, do it! There's nothing holy about using only a bit of your computer's CPU. It's there to use after all.) What (I think) you really want is to be able to continue to have your application able to respond when running this script (solved by a second thread) and then:
Detect when a script is 'not responding', or not calling into your callbacks
Be able to take action, such as asking the user if they want to terminate the script
An example of another program that does this is Firefox. If you go to a page with a misbehaving script, eventually you'll get a dialog asking if you want to stop the script running.
Without knowing more about how your script is actually interpreted or run, I can't give a detailed answer to these two. But I can suggest an approach, which is:
Your interpreter probably runs a loop, getting the next instruction and executing it. Your interactivity is currently provided by a callback running from one of those instructions being executed. I'd suggest making use of that by having your callback simply log the time it was last called. Then in your processing thread, every instruction (or every ten or a hundred) check the current time against the last callback time. If a long time has passed, say fifteen or thirty seconds, it may be an indication that the script is stuck. Notify the main thread but keep processing.
For "time", something like GetTickCount is probably sufficient.
Next step: Your main UI thread can react to this by asking the user what to do. If they want to terminate the script, communicate with the script thread to set a flag. In your script processing loop, again every instruction (or hundred) check for this flag, and if it's set, stop.
When you move to having one thread per script interpreter, you TThread's Terminated flag for this. Idiomatically for something that runs infinitely in a thread, you run in a while (!Terminated && [any other conditions]) loop in your Execute function.
To actually answer your question about using less CPU, the best approach is probably to change your thread's priority using SetThreadPriority to a lower priority, such as THREAD_PRIORITY_BELOW_NORMAL. It will still run if nothing else needs to run. This will affect your script's performance. Another approach is to use Sleep as you suggest, but this really is artificial. Perhaps SwitchToThread is slightly better - it yields to another thread the OS chooses. Personally, I think the CPU is there to use, and if you solve the problem of an interactive UI and handling out-of-control scripts then there should be no problem with using all CPU if your script needs it. If you're using "too much" CPU, perhaps the interpreter itself could be optimised. You'll need to run a profiler and find out where the CPU time is being spent.
Although a badly designed script might put you in a do-nothing loop, don't worry about it. Windows is designed to handle this kind of thing, and won't let your program take more than its fair share of the CPU. If it does manage to get 100%, it's only because nothing else wants to run.

Total system freezing when using timers in graphical application

I’m really stuck with this issue and will greatly appreciate any advice.
The problem:
Some of our users complain about total system “freezing” when using our product. No matter how we tried, we couldn’t reproduce it in any of systems available for troubleshooting.
The product:
Physically, it’s a 32bit/64bit DLL. The product has a self-refreshing GUI, which draws a realtime spectrogram of an audio signal
Problem details:
What I managed to collect from a number of fragmentary reports makes the following picture:
When GIU is opened, sometimes immediately, sometimes after a few minutes of GIU being visible, the system completely stalls, without possibility to operate with windows, start Task Manager etc. No reactions on keyboard, no mouse cursor seen (or it’s seen but is not responsibe to mouse movements – this I do not know). The user has to hard-reset the system in order to reboot. What is important, I think, is that (in some cases) for some time the GIU is responsive and shows some adequate pictures. Then this freezing happens. One of the reports tells that once the system was frozen, the audio continued to be rendered – i.e. heard by the reporter (but the whole graphic shell of Windows was already frozen). Note: in this sort of apps it’s usually a specialized thread which is responsible for sound processing.
The freezing is more or less confirmed to happen for 2 users on Windows7 x64 using both 32 and 64 bit versions of the DLL, never heard of any other OSs mentioned with connection to this freezing (though there was 1 report without any OS specified).
That’s all that I managed to collect.
The architecture / suspicions:
I strongly suspect that it’s the GUI refreshing cycle that is a culprit.
Basically, it works like this:
There is a timer that triggers callbacks at a frame rate of approx 25 fps.
In this callback audio analysis is performed and GUI updated
Some details about the timer:
It’s based on this call:
CreateTimerQueueTimer(&m_timerHandle, NULL, xPlatformTimerCallbackWrapper,
this, m_firstExpInterval, m_period, WT_EXECUTEINTIMERTHREAD);
We create a timer and m_timerHandle is called periodically.
Some details about the GUI refreshing:
It works like this:
HDC hdc = GetDC (hwnd);
// Some drawing
ReleaseDC(hwnd,hdc);
My intuition tells me that this CreateTimeQueueTimer might be not the right decision. The reference page tells that in case of using WT_EXECUTEINTIMERTHREAD:
The callback function is invoked by the timer thread itself. This flag
should be used only for short tasks or
it could affect other timer
operations. The callback function is
queued as an APC. It should not
perform alertable wait operations.
I don’t remember why this WT_EXECUTEINTIMERTHREAD option was chosen actually, now WT_EXECUTEDEFAULT seems equally suitable for me.
In fact, I don’t see any major difference in using any of the options mentioned in the reference page.
Questions:
Is anything of what was told give anyone any clue on what might be wrong?
Have you faced similar problems, what was the reason?
Thanks for any info!
==========================================
Update: 2010-02-20
Unfortunatelly, the advise given here (which I could check so far) didn't help, namelly:
changing to WT_EXECUTEDEFAULT in CreateTimerQueueTimer(&m_timerHandle,NULL,xPlatformTimerCallbackWrapper,this,m_firstExpInterval,m_period, WT_EXECUTEDEFAULT);
the reenterability guard was already there
I havent' yet checked if updateding the GUI in WM_PAINT hander helps or not
Thanks for the hints anyway.
Now, I've been playing with this for a while, also got a real W7 intallation (I used to use the virtual one) and it seems that the problem can be narrowed down.
On my installation, using of the app really get the GUI far less responsive, although I couldn't manage to reproduce a total system freezing as someone reported.
My assumption now is this responsiveness degradation and reported total freezing have a common origin.
Then I did some primitive profiling and found that at least one of the culprits is BitBlt function that is called approx 50 times a second
BitBlt ((HDC)pContext->getSystemContext (), // hdcDest
destRect.left + pContext->offset.h,
destRect.top + pContext->offset.v,
destRect.right - destRect.left,
destRect.bottom - destRect.top,
(HDC)pSystemContext,
srcOffset.h,
srcOffset.v,
SRCCOPY);
The regions being copied are not really large (approx. 400x200 pixels). It is used for displaying the backbuffer and is executed in the timer callback.
If I comment out this BitBlt call, the problem seems to disappear (at least partly).
On the same machine running WinXP everything works just fine.
Any ideas on this?
Most likely what's happening is that your timer callback is taking more than 25 ms to execute. Then another timer tick comes along and it starts processing, too. And so on, and pretty soon you have a whole bunch of threads sucking down CPU cycles, all trying to do your audio analysis and in short order the system is so busy doing thread context switches that no real work gets done. And all the while, more and more timer ticks are getting placed into the queue.
I would strongly suggest that you use WT_EXECUTEDEFAULT here, rather than WT_EXECUTEINTIMERTHREAD. Also, you need to prevent overlapping timer callbacks. There are several ways to do that.
You can use a critical section in your timer callback. When the callback is triggered it calls TryEnterEnterCriticalSection and if not successful, just returns without doing anything.
You can do something similar using a volatile variable and InterlockedCompareExchange.
Or, you can change your timer to be a one-shot (WT_EXECUTEONLYONCE), and then re-set the timer at the end of every callback. That would make the thing execute 25 ms after the last one completed.
Which you choose is up to you. If your analysis often takes longer than 25 ms but not more than 35 ms, then you'll probably get a smoother update rate using WT_EXECUTEONLYONCE. If it's rare that analysis takes more than 25 ms, or if it often takes more than about 35 ms (but less than 50 ms), then you're probably better off using one of the other techniques.
Of course, if it often takes longer than 25 ms, then you probably want to increase the time (reduce the update rate).
Also, as one of the commenters pointed out, it's possible that the problem also involves accessing the GUI from the timer thread. You should do all of your analysis in the timer thread, store the results somewhere that the main thread can access it, and then send a message to the window proc, telling it to update the display.
Have you asked the users to disable Aero/WDMDWM? With Aero enabled, rendering is implemented quite different. Without Aero, the behaviour will be similar to XP. Not that it solves anything, but it will give you a clue as to what the problem is.

Problems with running an application under controlled environment (Win32)

I'm not exactly sure how to tag this question or how to write the title, so if anyone has a better idea, please edit it
Here's the deal:
Some time ago I had written a little but cruicial part of a computing olympiad management system. The system's job is to get submissions from participants (code files), compile them, run them against predefined test cases, and return results. Plus all the rest of the stuff you can imagine it should do.
The part I had written was called Limiter. It was a little program whose job was to take another program and run it in a controlled environment. Controlled in this case means limitations on available memory, computing time and access to system resources. Plus if the program crashes I should be able to determine the type of the exception and report that to the user. Also, when the process terminated, it should be noted how long it executed (with a resolution of at least 0.01 seconds, better more).
Of course, the ideal solution to this would be virtualization, but I'm not that experienced to write that.
My solution to this was split into three parts.
The simplest part was the access to system resources. The program would simply be executed with limited access tokens. I combined some of the basic (Everyone, Anonymous, etc.) access tokens that are available to all processes in order to provide practically a read-only access to the system, with the exception of the folder it was executing in.
The limitation of memory was done through job objects - they allow to specify maximum memory limit.
And lastly, to limit execution time and catch all the exceptions, my Limiter attaches to the process as a debugger. Thus I can monitor the time it has spent and terminate it if it takes too long. Note, that I cannot use Job objects for this, because they only report Kernel Time and User Time for the job. A process might do something like Sleep(99999999) which would count in none of them, but still would disable the testing machine. Thus, although I don't count a processes idle time in its final execution time, it still has to have a limit.
Now, I'm no expert in low-level stuff like this. I spent a few days reading MSDN and playing around, and came up with a solution as best I could. Unfortunately it seems it's not running as well as it could be expected. For most part it seems to work fine, but weird cases keep creeping up. Just now I have a little C++ program which runs in a split second on its own, but my Limiter reports 8 seconds of User mode time (taken from job counters). Here's the code. It prints the output in about half a second and then spends more than 7 seconds just waiting:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector< vector<int> > dp(50000, vector<int>(4, -1));
cout << dp.size();
}
The code of the limiter is pretty lengthy, so I'm not including it here. I also feel that there might be something wrong with my approach - perhaps I shouldn't do the debugger stuff. Perhaps there are some common pitfalls that I don't know of.
I would like some advice on how other people would tackle this problem. Perhaps there is already something that does this, and my Limiter is obsolete?
Added: The problem seems to be in the little program that I posted above. I've opened a new question for it, since it is somewhat unrelated. I'd still like comments on this approach for limiting a program.
Running with a debugger attached can change the characteristics of the application. Performance can be impacted, and code paths can even change (if the target process does things based on the presence of a debugger, i.e. IsDebuggerPresent).
A different approach that we've used is to configure our own application to run as the JIT debugger. By setting the AeDebug registry key, you can control what debugger is invoked when an application crashes. This way you only jump in when the target process crashes, and it doesn't impact the process during normal run-time.
This site has some details about setting the postmortem debugger: Configuring Automatic Debugging.
Your approaches for limiting the memory, getting timing etc. all sound perfectly fine.

Resources