I am helping to develop a fairly complex web application with Django. However some pages are taking 10+ seconds to render. I can't seem to get to the bottom of why this is so slow. The Django Debug Toolbar shows that the bottleneck is clearly CPU, rather than the database or anything else, but it doesn't go into any further detail (as far as I can see). I have tried running a profiler on the system, but can't make head or tail of what is going on from this either. It only shows deep internals as taking up the vast majority of the time, especially <method 'poll' of 'select.poll' objects>, and other such functions as builtins.hasattr, getmodule, ismodule, etc.
I've also tried stepping through the code manually, or pausing execution at random points to try to catch what's going on. I can see that it takes quite a long time to get through a ton of import statements, but then it spends a huge amount of time inside render() functions, taking especially long loading a large number of model fields--something in the order of 100 fields.
None of what is happening looks wrong to me, except for the insane amount of time everything takes to run. It's as though Python is just taking forever to load and parse each .py file before doing any processing. However, this is all running on a brand new Digital Ocean server with an SSD, with Gunicorn, PostgreSQL and Nginx. Does anyone have any hints how I can get to the bottom of this?
We need to seed an application with 3 million entities before running performance tests.
The 3 million entities should be loaded through the application to simulate 3 years of real data.
We are inserting 1-5000 entities at a time. In the beginning response times are very good. But after a while they decay exponentially.
We use at groovy script to hit a URL to start each round of insertions.
Restarting the application resets the response time - i.e. fixes the problem temporally.
Reruns of the script, without restarting the app, have no effect.
We use the following to enhance performance
1) Cleanup GORM after each 100 insertions:
def session = sessionFactory.currentSession
session.flush()
session.clear()
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
(old Ted Naleid trick: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql)
2) We use GPars for parallel insertions:
GParsPool.withPool {
(0..<1000).eachParallel {
def entity = new Entity(...)
insertionService.insert(entity)
}
}
Notes
When looking at the log output, I've noticed that the processing time for each entity are the same, but the system seems to pause longer and longer between each iteration.
The exact number of entities inserted are not important, just around 3 mill, so if some fail we can ignore it.
Tuning the number of entities at a time have little or no effect.
Help
I'm really hoping somebody have a good idea on how to fix the problem.
Environment
Grails: 2.4.2 (GRAILS_OPTS=-Xmx2G -Xms512m -XX:MaxPermSize=512m)
Java: 1.7.0_55
MBP: OS X 10.9.5 (2,6 GHz Intel Core i7, 16 GB 1600 MHz DDR3)
The pausing would make me think it's the JVM doing garbage collection. Have you used a profiler such as VisualVM to see what time is being spent doing garbage collection? Typically this will be the best approach to understanding what is happening with your application within the JVM.
Also, it's far better to load the data directly into the database rather than using your application if you are trying to "seed" the application. Performance wise of course.
(Added as answer per comment)
So I'm currently learning openmp4.
What I experienced is that if I call a function a 2nd time it will get significantly faster.
The omp block is inside of this function.
In my example the 1st call takes 5 seconds and the 2nd only 0,3s.
I am using the intel-icc with an Intel Xeon Phi(60cores 240Threads).
Could someone please explain why this is happening?
I think that is due to the OpenMP threading initialization. The creation of the threading team, and initializing each thread, etc.. This in fact is an expensive procedure.
I don't know what your code looks like, but this can also be the effect of caching. During first time, the mic will start caching the needed data. The second time: the data will already be cached and ready. But I can't confirm this until I see the code.
I'm using EF Code first, with one model that has over than 200 Entities(winforms), when i ran my program for first time, it took long time to run first query,then I used pre-generated views for improving performance, startup time reduced to about 12-13 seconds(before pregenerated views, startup time was about 30 seconds), which options i have, to reduce the time of my first query?
You don't have many options. First of all try to use the latest EF version - that means EF6 alpha 2 because there were some improvements but it may not be enough. IMHO add splash screen to your app and make the "first query" during application startup. WinForms application simply can have longer startup time if they perform some complex logic. Commonly whole application is initialized during startup so that it run smoothly once it is started.
I’m really stuck with this issue and will greatly appreciate any advice.
The problem:
Some of our users complain about total system “freezing” when using our product. No matter how we tried, we couldn’t reproduce it in any of systems available for troubleshooting.
The product:
Physically, it’s a 32bit/64bit DLL. The product has a self-refreshing GUI, which draws a realtime spectrogram of an audio signal
Problem details:
What I managed to collect from a number of fragmentary reports makes the following picture:
When GIU is opened, sometimes immediately, sometimes after a few minutes of GIU being visible, the system completely stalls, without possibility to operate with windows, start Task Manager etc. No reactions on keyboard, no mouse cursor seen (or it’s seen but is not responsibe to mouse movements – this I do not know). The user has to hard-reset the system in order to reboot. What is important, I think, is that (in some cases) for some time the GIU is responsive and shows some adequate pictures. Then this freezing happens. One of the reports tells that once the system was frozen, the audio continued to be rendered – i.e. heard by the reporter (but the whole graphic shell of Windows was already frozen). Note: in this sort of apps it’s usually a specialized thread which is responsible for sound processing.
The freezing is more or less confirmed to happen for 2 users on Windows7 x64 using both 32 and 64 bit versions of the DLL, never heard of any other OSs mentioned with connection to this freezing (though there was 1 report without any OS specified).
That’s all that I managed to collect.
The architecture / suspicions:
I strongly suspect that it’s the GUI refreshing cycle that is a culprit.
Basically, it works like this:
There is a timer that triggers callbacks at a frame rate of approx 25 fps.
In this callback audio analysis is performed and GUI updated
Some details about the timer:
It’s based on this call:
CreateTimerQueueTimer(&m_timerHandle, NULL, xPlatformTimerCallbackWrapper,
this, m_firstExpInterval, m_period, WT_EXECUTEINTIMERTHREAD);
We create a timer and m_timerHandle is called periodically.
Some details about the GUI refreshing:
It works like this:
HDC hdc = GetDC (hwnd);
// Some drawing
ReleaseDC(hwnd,hdc);
My intuition tells me that this CreateTimeQueueTimer might be not the right decision. The reference page tells that in case of using WT_EXECUTEINTIMERTHREAD:
The callback function is invoked by the timer thread itself. This flag
should be used only for short tasks or
it could affect other timer
operations. The callback function is
queued as an APC. It should not
perform alertable wait operations.
I don’t remember why this WT_EXECUTEINTIMERTHREAD option was chosen actually, now WT_EXECUTEDEFAULT seems equally suitable for me.
In fact, I don’t see any major difference in using any of the options mentioned in the reference page.
Questions:
Is anything of what was told give anyone any clue on what might be wrong?
Have you faced similar problems, what was the reason?
Thanks for any info!
==========================================
Update: 2010-02-20
Unfortunatelly, the advise given here (which I could check so far) didn't help, namelly:
changing to WT_EXECUTEDEFAULT in CreateTimerQueueTimer(&m_timerHandle,NULL,xPlatformTimerCallbackWrapper,this,m_firstExpInterval,m_period, WT_EXECUTEDEFAULT);
the reenterability guard was already there
I havent' yet checked if updateding the GUI in WM_PAINT hander helps or not
Thanks for the hints anyway.
Now, I've been playing with this for a while, also got a real W7 intallation (I used to use the virtual one) and it seems that the problem can be narrowed down.
On my installation, using of the app really get the GUI far less responsive, although I couldn't manage to reproduce a total system freezing as someone reported.
My assumption now is this responsiveness degradation and reported total freezing have a common origin.
Then I did some primitive profiling and found that at least one of the culprits is BitBlt function that is called approx 50 times a second
BitBlt ((HDC)pContext->getSystemContext (), // hdcDest
destRect.left + pContext->offset.h,
destRect.top + pContext->offset.v,
destRect.right - destRect.left,
destRect.bottom - destRect.top,
(HDC)pSystemContext,
srcOffset.h,
srcOffset.v,
SRCCOPY);
The regions being copied are not really large (approx. 400x200 pixels). It is used for displaying the backbuffer and is executed in the timer callback.
If I comment out this BitBlt call, the problem seems to disappear (at least partly).
On the same machine running WinXP everything works just fine.
Any ideas on this?
Most likely what's happening is that your timer callback is taking more than 25 ms to execute. Then another timer tick comes along and it starts processing, too. And so on, and pretty soon you have a whole bunch of threads sucking down CPU cycles, all trying to do your audio analysis and in short order the system is so busy doing thread context switches that no real work gets done. And all the while, more and more timer ticks are getting placed into the queue.
I would strongly suggest that you use WT_EXECUTEDEFAULT here, rather than WT_EXECUTEINTIMERTHREAD. Also, you need to prevent overlapping timer callbacks. There are several ways to do that.
You can use a critical section in your timer callback. When the callback is triggered it calls TryEnterEnterCriticalSection and if not successful, just returns without doing anything.
You can do something similar using a volatile variable and InterlockedCompareExchange.
Or, you can change your timer to be a one-shot (WT_EXECUTEONLYONCE), and then re-set the timer at the end of every callback. That would make the thing execute 25 ms after the last one completed.
Which you choose is up to you. If your analysis often takes longer than 25 ms but not more than 35 ms, then you'll probably get a smoother update rate using WT_EXECUTEONLYONCE. If it's rare that analysis takes more than 25 ms, or if it often takes more than about 35 ms (but less than 50 ms), then you're probably better off using one of the other techniques.
Of course, if it often takes longer than 25 ms, then you probably want to increase the time (reduce the update rate).
Also, as one of the commenters pointed out, it's possible that the problem also involves accessing the GUI from the timer thread. You should do all of your analysis in the timer thread, store the results somewhere that the main thread can access it, and then send a message to the window proc, telling it to update the display.
Have you asked the users to disable Aero/WDMDWM? With Aero enabled, rendering is implemented quite different. Without Aero, the behaviour will be similar to XP. Not that it solves anything, but it will give you a clue as to what the problem is.