Drastic performance inprovement in .NET CF after app gets moved out of the foreground. Why? - performance

I have a large (500K lines) .NET CF (C#) program, running on CE6/.NET CF 3.5 (v.3.5.10181.0). This is running on a FreeScale i.Mx31 (ARM) # 400MHz. It has 128MB RAM, with ~80MB available to applications. My app is the only significant one running (this is a dedicated, embedded system). Managed memory in use (as reported by GC.Collect) is about 18MB.
To give a better idea of the app size, here's some stats culled from .NET CF Remote Performance Monitor after staring up the application:
GC:
Garbage Collections 131
Bytes Collected by GC 97,919,260
Managed Bytes in use after GC 17,774,992
Total Bytes in use after GC 24,117,424
GC Compactions 41
JIT:
Native Bytes Jitted: 10,274,820
Loader:
Classes Loaded 7,393
Methods Loaded 27,691
Recently, I have been trying to track down a performance problem. I found that my benchmark after running the app in two different startup configurations would run in approximately 2 seconds (slow case) vs. 1 second (fast case). In the slow case, the time for the benchmark could change randomly from EXE run to EXE run from 1.1 to 2 seconds, but for any given EXE run, would not change for the life of the application. In other words, you could re-run the benchmark and the time for the test stays the same until you restart the EXE, at which point a new time is established and consistent.
I could not explain the 1.1 to 2x slowdown via any conventional mechanism, or by narrowing the slowdown to any particular part of the benchmark code. It appeared that the overall process was just running slower, almost like a thread was spinning and taking away some of "my" CPU.
Then, I randomly discovered that just by switching away from my app (the GUI loses the foreground) to another app, my performance issue disappears. It stays gone even after returning my app to the foreground. I now have a tentative workaround where my app after startup launches an auxiliary app with a 1x1 size window that kills itself after 5ms. Thus the aux app takes the foreground, then relinquishes it.
The question is, why does this speed up my application?
I know that code gets pitched when a .NET CF app loses the foreground. I also notice that when performing a "GC Heap" capture with .NET CF Remote Performance Monitor, a Code Pitch is logged -- and this also triggers the performance improvement in my app. So I suspect somehow that code pitching is related or even responsible for fixing performance. But I'm at a loss as to figure out how to determine if that is really the case, or even to explain why pitching code could help in this way. Does pitching out lots of code somehow significantly help locality of reference of code pages (that are re-JITted, presumably near each other in memory) enough to help this much? (My benchmark spans probably 3 dozen routines and hundreds of lines of code.)
Most importantly, what can I do in my app to reliably avoid this slower condition. Any pointers to relevant .NET CF / JIT / Code pitching information would be greatly appreciated.

Your app going to the background auto-triggers a GC.Collect, which collects, may compact the GC Heap and may pitch code. Have you checked to see if a manual GC.Collect without going to the background gives the same behavior? It might not be pitching that's giving the perf gain, it might be collection or compaction. If a significant number of dead roots are swept up, walking the root tree may be getting faster. Can't say I've specifically seen this issue, so this is all conjecture.

Send your app a wm_hibernate before your performance loop. Will clean up things

We have a similar issue with our .NET CF application.
Over time, our application progressively slows down, eventually to a halt with what I anticipate is due to high CPU load, or as #wil-s says, as if thread is spinning consuming CPU. The only assumption / conclusion I've made to so far is either we have a rogue thread in our code, or there's an under the cover issue in .NET CF, maybe with the JITter.
Closing the application and re-launching returns our application to normal expected performance.
I am yet to implement code change to test issuing WM_HIBERNATE or launch a dummy app which quits itself (as above) to force a code pitch, but am fairly sure this will resolve our issue based on the above comments. (so many thanks for that)
However, I'm subsequently interested to know whether a root cause was ever found to this specific issue?
Incidentally and seemingly off topic (but bear with me), we're using a Freescale i.MX28 processor and are experiencing unpredictable FlashDisk corruption. Seeing 2K blocks of 0xFFs (erased blocks) in random files located on NAND Flash.
I'm mentioning this as I now believe the CPU and FlashDisk corruption issues are linked, due to this article as well as this one:
https://electronics.stackexchange.com/questions/26720/flash-memory-corruption-due-to-electricals
In the article, #jwygralak67 comments:
I recently worked through a flash corruption issue, on a WinCE system,
as part of a development team. We would sporadically find 2K blocks of
flash that were erased. (All bytes 0xFF) For about 6 months we tested
everything from ESD, to dirty power to EMI and RFI interference, we
bought brand new devices and tracked usage to make sure we weren't
exceeding the erase cycle limit and buring out blocks, we went through
our (application level) software with a fine toothed comb.
In the end it turned out to be an obscure bug in the very low level
flash driver code, which only occurred under periods of heavy CPU
load. The driver came from a 3rd party. We informed them of the issue
we found, but I don't know if they ever released a patch.
Unfortunately, we're yet to make contact with him.
With all of this in mind, potentially if we work around the high CPU load, maybe the corruption will no longer be present. Another conjecture case!
On that assumption however, this doesn't give a firm root cause for either situation, which I'm desperately seeking!
Any knowledge or insight, however small, would be very gratefully received.
#ctacke - we've spoken before regarding OpenNETCF via email, so I'm pleased to see your name!

Related

windowed OpenGL first frame delay after idle

I have windowed WinApi/OpenGL app. Scene is drawn rarely (compared to games) in WM_PAINT, mostly triggered by user input - MW_MOUSEMOVE/clicks etc.
I noticed, that when there is no scene moving by user mouse (application "idle") and then some mouse action by user starts, the first frame is drawn with unpleasant delay - like 300 ms. Following frames are fast again.
I implemented 100 ms timer, which only does InvalidateRect, which is later followed by WM_PAINT/draw scene. This "fixed" the problem. But I don't like this solution.
I'd like know why is this happening and also some tips how to tackle it.
Does OpenGL render context save resources, when not used? Or could this be caused by some system behaviour, like processor underclocking/energy saving etc? (Although I noticed that processor runs underclocked even when app under "load")
This sounds like Windows virtual memory system at work. The sum of all the memory use of all active programs is usually greater than the amount of physical memory installed on your system. So windows swaps out idle processes to disc, according to whatever rules it follows, such as the relative priority of each process and the amount of time it is idle.
You are preventing the swap out (and delay) by artificially making the program active every 100ms.
If a swapped out process is reactivated, it takes a little time to retrieve the memory content from disc and restart the process.
Its unlikely that OpenGL is responsible for this delay.
You can improve the situation by starting your program with a higher priority.
https://superuser.com/questions/699651/start-process-in-high-priority
You can also use the virtuallock function to prevent Windows from swapping out part of the memory, but not advisable unless you REALLY know what you are doing!
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366895(v=vs.85).aspx
EDIT: You can improve things for sure by adding more memory and for sure 4GB sounds low for a modern PC, especially if you Chrome with multiple tabs open.
If you want to be scientific before spending any hard earned cash :-), then open Performance Manager and look at Cache Faults/Sec. This will show the swap activity on your machine. (I have 16GB on my PC so this number is very low mostly). To make sure you learn, I would check Cache Faults/Sec before and after the memory upgrade - so you can quantify the difference!
Finally, there is nothing wrong with the solution you found already - to kick start the graphic app every 100ms or so.....
Problem was in NVidia driver global 3d setting -"Power management mode".
Options "Optimal Power" and "Adaptive" save power and cause the problem.
Only "Prefer Maximum Performance" does the right thing.

My OS X app slowly consumes more and more ram over time, how can I debug this and find the cause?

I am new to swift and coding in general. I have made my first OS X app over the last few days. It is a simple ticker app that lives in the menu bar.
My issue is that over the space of 3 hours, my app goes from 10mb or ram being used to over 1gb. It slowly and slowly uses more and more. I noticed after about 6 hours the app stops working, I can only assume that OS X has stopped the process because it's hogging too much memory?
Anyway, I have looked online and I have used Xcode instruments to try and find a memory leak, but I don't know exactly how to pin point it. Can anyone give me some general good ways to find memory leaks and sources of bugs when using Xcode? Any general practices are good too.
If the memory loss is not due to a leak (Run Leaks and Analyzer) the lost is to inadvertently retained and unused memory.
Use instruments to check for leaks and memory loss due to retained but not leaked memory. The latter is unused memory that is still pointed to. Use Mark Generation (Heapshot) in the Allocations instrument on Instruments.
For HowTo use Heapshot to find memory creap, see: bbum blog
Basically the method is to run Instruments allocate tool, take a heapshot, run an iteration of your code and take another heapshot repeating 3 or 4 times. This will indicate memory that is allocated and not released during the iterations.
To figure out the results disclose to see the individual allocations.
If you need to see where retains, releases and autoreleases occur for an object use instruments:
Run in instruments, in Allocations set "Record reference counts" on (For Xcode 5 and lower you have to stop recording to set the option). Cause the app to run, stop recording, drill down and you will be able to see where all retains, releases and autoreleases occurred.
When confronted with a memory leak, the first thing I do is look at where variables are created and destroyed; especially if they are defined in looping logic (although generally not a good idea).
Generally most memory leaks come from there. I would venture a guess that the leak occurs somewhere in the logic that tracks your timed iterations.
Good luck!

Amount of acceptable Memory Leak

How much memory leak is negligible?
In my program, I am using Unity and when I go through Profile > Leaks and work with the project it shows about total 16KB memory Leak caused by Unity and I cannot help that.
EDIT: After a long play with the program it amounts to a 400KB leak.
What should I do? Is this amount of memory leak acceptable for an iPad project?
It's not great, but it won't get your app rejected unless it causes a crash in front of a reviewer. The size is less important than how often it occurs. If it only occurs once every time the app is run, that's not a big deal. If it happens every time the user does something, then that's more of a problem.
It's probably a good idea for you to track down these bugs and fix them, because Objective C memory management is quite different compared to Java, and it's good to get some practice in with smaller stuff before you're stuck trying to debug a huge problem with a deadline looming.
First, look if you can use Unity in another way to circumvent the leak (if you have enough insight into the workings of this framework).
Then, report the leakage to the Unity developers if not already done (by you or someone else).
Third, if you absolutely rely on this framework, hope it get fixed ASAP, unless switching to another framework is an option for you.
A 400K leak is not a very big deal unless it amounts to that size within few minutes. Though, no matter how small the leak, it is always necessary to keep an eye on any leak caused by your or third party code and try to get rid of them in the next minor or major iteration of your app.

Compact Framework and JIT. How long could it take

We have/had a phantom delay in our app. This was traced to the initialisation of a singleton when the object was touched for the first time and was blamed on JIT. I'm not utterly convinced by this as there is no mechanism for measuring JIT (or is there?) and the entire delay was seven seconds. Seven seconds of JIT?!? Could that be forreal?
Either way I have difficulty in blaming things that one cannot easily measure. When I had a glance at the issue a while back I commented out a bunch of code and watched the seven second delay "jump" elsewhere in the app. Suggesting it is somehow happening on a background process somewhere (and I guess this would count JIT in as a potential cause).
Just for fun if there was a static object that happened to reference a lot of other objects does anyone have a rule of thumb for how long the JIT might take? Does anyone have further references so I can understand more about the JIT so I stand a chance of learning whether or not JIT is/was to blame for this slow down?
I've only seen JIT take a really long time (greater than 1 second) in a weird bug that had to do with templated items inside a templated collection (see edit below).
At any rate, the fact you see it "move" definitely indicates to me that it probably isn't the issue. To try to determine this definitively I'd look at using RPM to see what's happening right before and after the delay.
Expected JIT time is a really nebulous thing, since there are so many factors that can affect it. Processor speed is an obvious one, but less obvious might be things like app storage media and device memory pressure.
Storage media can affect JIT speed because the JITter has to pull the IL from the media when it needs to JIT it, and if pulling it is slow, then JITting it will be slow.
Memory pressure is a tough one, and can have serious repercussions on a CE device. The issue here is that when you start running out of memory, the EE will start pitching JITted code during collection - everything but the call stack. Now if you're in a routine that, for example, calls out to some worker or helper stuff, or has a thread running, then that helper method could be getting pitched, JITted, pitched JITted, etc. This is referred to as "thrash."
Identifying the latter is fairly easy with RPM (fixing it may not be so easy). Look at the amount of code pitched to raise frequently and look for a strong correlation between a rise in the number of pitches and your perceived lock ups.
Edit: I finally found the bug description here.
JIT (and GC) timers etc. can be found here:
Performance Counters in the .NET Compact Framework
(http://msdn.microsoft.com/en-us/library/ms172525.aspx)
Monitoring Application Performance on the .NET Compact Framework Part I - Enabling performance counters (http://blogs.msdn.com/davidklinems/archive/2005/10/04/476988.aspx)
Analyzing Device Application Performance with the .Net Compact Framework Remote Performance Monitor (http://blogs.msdn.com/stevenpr/archive/2006/04/17/577636.aspx)
Performance Counters in the .NET Framework
(http://msdn.microsoft.com/en-us/library/w8f5kw2e(VS.80).aspx)
Regards,
tamberg

Windows Stalls When My Program Uses Swapfile

I am running a user mode program on normal priority. My program is searching an NP problem, and as a result, uses up a lot of memory which eventually ends up in the swap file.
Then my mouse freezes up, and it takes forever for task manager to open up and let me end the process.
What I want to know is how I can stop my Windows operating system from completely locking up from this even though only 1 out of my 2 cores are being used.
Edit:
Thanks for the replies.
I know that making it use less memory will help, but it just doesn't make sense to me that the whole OS should lock up.
The obvious answer is "use less memory". When your app uses up all the
available memory, the OS has to page the task manager (etc.) out to make room for your app. When you switch programs, the OS has to page the other programs back in (as they are needed).
Disk reads are slower than memory reads, so everything appears to be
going slower.
If you want to avoid this, have your app manage its own memory, or
use a better algorithm than brute force. (There are genetic
algorithms, simulated annealing, etc.)
The problem is that when another program (e.g. explorer.exe) is going to execute, all of its code and memory has been swapped out. To make room for the other program Windows has to first write data that your program is using to disk, then load up the other program's memory. Every new page of code that is executed in the other program requires disk access, causing it to run slowly.
I don't know the access pattern of your program, but I'm guessing it touches all of its memory pages a lot in a random fashion, which makes the problem worse because as soon as Windows evicts a memory page from your program, suddenly you need it again and Windows has to find some other page to give the same treatment.
To give other processes more RAM to live in, you can use SetProcessWorkingSetSize to reduce the maximum amount of RAM that your program may use. Of course this will make your program run more slowly because it has to do more swapping.
Another alternative you could try is to add more drives to the system, and distribute the swap files over those. You may have a dual-core CPU, but you have only a single drive. Distributing the swap file over multiple drives allows Windows to balance work across them (although I don't have first-hand experience of how well it does this).
I don't think there's a programming answer to this question, aside from "restructure your app to use less memory." The swapfile problem is most likely due to the bottleneck in accessing the disk, especially if you're using an IDE HDD or a highly fragmented swapfile.
It's a bit extreme, but you could always minimise your swap file so you don't have all the disk thashing, and your program isn't allowed to allocate much virtual memory. Under Control panel / Advanced / Advanced tab / Perfromance / Virtual memory, set the page file to custom size and enter a value of 2mb (smallest allowed on XP). When an allocation fails, you should get an exception and be able exit gracefully. It doesn't quite fix your problem, just speeds it up ;)
Another thing worth considering would be if you are ona 32bit platform, port to a 64bit system and get a box with much more addressable RAM.

Resources