Lazy evaluation/initialization in GUI applications - non-disruptive ways to do it? - user-interface

Suppose you are making a GUI application, and you need to load/parse/calculate a bunch of things before a user can use a certain tool, and you know what you have to do beforehand.
So it makes sense to start doing these calculations in the background over a period of time (as opposed to "in one go" on start-up, or exactly when each result is needed). However, doing too much in the background will slow down the responsiveness of the application.
Are there any standard practices in this kind of approach? Perhaps ways to detect low load on the CPU or the user being idle and execute code in those times? Arguments against this type of approach?
Thanks!

Without knowing your app or your audience, I can't give you specific advice.
My main argument against the approach is that unless you have a high-profile application which will see a lot of use by non-programmers, I wouldn't bother. This sounds like a lot of busy work that could be spent developing or refining features that actually allow people to do new things with your app.
That being said, if there is a reason to do it, there is nothing wrong with lazy-loading data.
The problem with waiting until idle time is that some people have programs like SETI@home installed on their computer, in which case their computer has little to no idle time. If loading full-throttle kills the responsiveness of your app, you could try injecting sleeps between chunks of work. This is what a lot of video games do when you specify a target frame rate, to avoid pegging the CPU. It also gets the data loaded sooner than waiting for idle time would.
If parts of your app depend on that data to work, and the user invokes such a part, you will have to abandon the lazy-loading approach and resume your full CPU/disk-taxing load. If that takes a long time, or makes the app unresponsive, you could display a loading dialog with a progress bar.
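As a rough sketch of the sleep-injection idea above (the function names and the 10 ms figure are illustrative, not from the answer), a throttled background loader in C++ might look like:
#include <chrono>
#include <thread>
#include <vector>

void loadItem(int item);   // hypothetical: performs one self-contained chunk of the preparation work

void backgroundLoader(const std::vector<int>& items)
{
    for (int item : items) {
        loadItem(item);                                              // do one chunk of work
        std::this_thread::sleep_for(std::chrono::milliseconds(10));  // back off so the UI stays responsive
    }
}
The sleep length is the throttle: longer sleeps keep the app snappier but stretch out the total load time, the same trade-off games make when targeting a frame rate.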

If your target audience will tend to have multicore CPUs, and if your app's startup and background initialization tasks won't contend for the same resources (e.g. disk IO, network, ...) to the point of creating a new bottleneck, you might want to kick off a background thread to perform the initialization tasks (or even a thread per initialization task, if you have several tasks that can run in parallel). That will make more efficient use of multicore hardware.
You didn't specify your target platform, but it's exceedingly easy to achieve this in .NET and so I have begun doing it in many of my desktop apps.
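A minimal sketch of that idea, in C++ for concreteness (the task names are hypothetical; in .NET the equivalent would be thread-pool work items or a BackgroundWorker):
#include <future>

void parseConfig();   // hypothetical initialization task
void buildIndex();    // hypothetical initialization task, independent of the first

int main()
{
    // Kick off independent initialization tasks on worker threads at startup.
    auto config = std::async(std::launch::async, parseConfig);
    auto index  = std::async(std::launch::async, buildIndex);

    // ... show the main window and run the event loop here ...

    config.get();   // block only at the point where each result is actually needed
    index.get();
    return 0;
}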

Related

stress testing concurrency: how to slow down thread execution in my application?

I have hit situations where certain race conditions occur only when I run my application under the valgrind emulator; if I run the executable as a normal program, these issues do not occur.
My theory is that the emulator slows down execution so much that threads have larger common time frames in which they execute and operate on shared data structures, thereby increasing the chance of hitting race conditions on these shared resources.
I would like finer-grained control over, e.g., a virtual clock frequency, by means of a specialized emulator.
Does anybody know of an existing tool for this job? I have tried searching for it online; there should be some academic paper dealing with this, but so far I haven't found one.
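Not an emulator, but one low-tech trick that approximates its effect is to inject randomized delays at synchronization points in debug builds, which widens the time windows in which threads can interleave on shared data. A minimal C++ sketch (the helper name is made up):
#include <chrono>
#include <random>
#include <thread>

// Call just before acquiring a lock or touching shared state; does nothing in release builds.
void raceJitter()
{
#ifndef NDEBUG
    thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> delayMs(0, 5);
    std::this_thread::sleep_for(std::chrono::milliseconds(delayMs(rng)));
#endif
}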

Testing perceived performance

I recently got a shiny new development workstation. The only disadvantage of this is that the desktop apps I'm developing now run very, very fast, and so I fear that parts of the code that would be annoyingly slow on end users' machines will go unnoticed during my testing.
Is there a good way to slow down an application for testing? I've tried searching around, but all of the results I've been able to find seem pretty fiddly to set up (e.g., manually setting up a high-priority CPU-bound task on the same CPU core as the target app, or running a background process that rapidly interrupts and resumes the target app), and I don't know if the end result is actually a good representation of running on a slower computer (with its slower CPU, slower RAM, slower disk I/O...).
I don't think that this is a job for a profiler; I'm interested in the user's perception of end-to-end performance rather than in where the time goes for particular operations.
Set up a virtual machine, give it as little RAM as needed, and you can also have it use 1, 2, or more CPUs. I like VirtualBox myself. Install your app and test with different RAM configs.
Personally, I'd get an old used crappy computer that is typical of what the users have and test on that. It should be cheap and you will see pretty fast how bad things are.
I think the only way to deal with this is through proper end-user testing, i.e. get yourself a "typical" system for testing and use that to identify any perceptible performance bottlenecks.
You can try out either Virtual PC or VMware Player/Workstation, load an OS onto it, and then throttle back the resources. I know that with any of those tools you can reduce the memory to whatever you'd like. You can also specify the number of cores you want to use. You might even be able to adjust the clock speed in VMware Workstation... I'm not sure.
I upvoted SQLMenace's answer, but I also think that profiling needs to be mentioned: no matter how quickly the code is executing, you'll still see what's taking the most time. If you find yourself with some free time, I think profiling and investigating the results is a good way to spend it.

Handling very large SFTP uploads - Cocoa

I'm working on a small free Cocoa app that involves some SFTP functionality, specifically working with uploads. The app is nearing completion however I've run into a pretty bad issue with regards to uploading folders that contain a lot of files.
I'm using ConnectionKit to handle the uploads:
CKTransferRecord *record;
record = [connection recursivelyUpload:@"/Users/me/large-folder"
                                    to:@"/remote/directory"];
This works fine for most files and folders, although in this case @"/Users/me/large-folder" has over 300 files in it. Calling this method spins my CPU up to 100% for about 30 seconds, and my application is unresponsive (Mac spinning ball). After the 30 seconds my upload is queued and works fine, but this is hardly ideal. Obviously whatever is enumerating these files is locking up my app until it's done.
Not really sure what to do about this. I'm open to just about any solution - even using a different framework, although I've done my research and ConnectionKit seems to be the best of what's out there.
Any ideas?
Use Shark. Start sampling, start the upload, and as soon as the hang ends, stop sampling.
If the output confirms that the problem is in ConnectionKit, you have two choices:
1. Switch to something else.
2. Contribute a patch that makes it not hang.
The beauty of open-source is that #2 is possible. It's what I recommend. Then, not only will you have a fast ConnectionKit, but once the maintainers accept your patch, everyone else who uses CK can have one too.
And if Shark reveals that the problem is not in ConnectionKit (rule #2 of profiling: you will be surprised), then you have Shark's guidance on how to fix your app.
Since the problem is almost certainly in the enumeration, you'll likely need to move the enumeration into an asynchronous operation. Most likely they're using NSFileManager's -enumeratorAtPath: for this. If that's the main problem, then the best solution is likely to move that work onto its own thread. Given the very long time involved, I suspect they're actually reading the files during enumeration. The solution to that is to read the files lazily, right before uploading.
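For illustration only, here is the general shape of that fix, sketched in C++ (the real app would use Cocoa facilities such as NSThread or performSelectorInBackground:; uploadFile is a hypothetical stand-in that reads each file itself, right before sending):
#include <filesystem>
#include <string>
#include <thread>

void uploadFile(const std::string& path);   // hypothetical: reads and sends one file

void enumerateAndUploadAsync(const std::string& root)
{
    std::thread worker([root] {
        // Enumeration happens off the main thread, so the UI never blocks on it.
        for (const auto& entry : std::filesystem::recursive_directory_iterator(root))
            if (entry.is_regular_file())
                uploadFile(entry.path().string());   // file contents are read here, lazily
    });
    worker.detach();   // the UI thread returns immediately
}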
Peter is correct that Shark is helpful, but after being a long-time Shark fan, I've found that Instruments tends to give more useable answers more quickly. You can more easily add a disk I/O and memory allocation track to your CPU Sampler using Instruments.
If you're blocking just one core at 100%, I recommend setting Active Thread to "Main Thread" and setting Sample Perspective to "All Sample Counts." If you're blocking all your cores at 100%, I recommend setting Active Thread to "All Threads" and Sample Perspective to "Running Sample Times."

Windows System Idle Processes interfering with performance measurement

I am doing some performance measurement of my code on a Windows box, and I am finding that I get dramatically different results between measurements. A quick bit of ad hoc exploration during a slow run shows, in Task Manager, the System Idle Process taking up almost 100% of the CPU.
Does anyone know what the System Idle Process actually means and what Windows features it may be running?
NB: I am not measuring performance using the task manager, I just used it to take a look at what else was running during a particularly slow measurement.
Please think before saying this is not programming related and closing the question. I would not ask it unless I thought there were grounds to say it is. In this case I believe it clearly is because it is detrimentally affecting my development and test environments and in order to sort it out I need to know a bit more about it. Programming does not start and end with the writing of the code.
The idle process typically doesn't do any useful work except execute the HLT instruction, which puts that CPU core into a lower power state (C1). However, the fact that your benchmark is not consuming 100% CPU time does open the door for speculation about what is going on.
If your application is single-threaded and your test system is multicore/hyperthreaded/multi-CPU, then you should expect to see around 50% idle CPU time for two cores, 75% for four, etc. This is because the CPU time percentages in Task Manager include all cores. (I believe that older versions of Windows had an option to change this, but I don't see it on Vista.)
If the idle process is consuming a lot of CPU, that may indicate that your application is spending a lot of time sleeping. It might be waiting for data from some external source (e.g. a disk or network). It might be spending a lot of time waiting for synchronization objects (e.g. mutexes or events). It also might be spending a lot of time calling the Sleep() function. Profiling your code should identify where it's spending the time.
Getting fully reproducible benchmark results may require you to disable processor/disk/network-intensive background applications and services (e.g. search indexing, SMS software inventory, virus scanning, Windows Update downloads, IncrediBuild/distcc) or to connect the machine to an isolated network (or to no network at all).
I'm assuming that you wrote a benchmark for your application, and that you're just trying to use Task Manager to diagnose why the benchmark results aren't what you expected. Task Manager isn't an accurate way to measure application performance.
The System Idle Process is a sort of default process that Windows runs on the processor when it has nothing else to schedule. This process is like a housekeeper that does things like trying to save system power.
If you're measuring the performance of your program, don't use the Windows Task manager. Use Performance Monitor instead (which you can start by typing 'perfmon' at the commandline). Or better still, use a profiler.
If you "system idle process" is taking up 100% then essentially you machine is bored, nothing is going on. If you add up everything going on in task manager, subtract this number from 100%, then you will have the value of "system idle process." Notice it consumes almost no memory at all and cannot be affecting performance.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) of the difference seems to come from loading DLLs; when the DLLs are in cached memory pages, they load much faster. We tried using ClearMem to simulate rebooting (since it's much less time-consuming than actually rebooting) and got mixed results: on some machines it seemed to simulate a reboot very consistently, and on some it did not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you dealt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time; obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem, which is storing pages of memory for the program; then you've got the OS caching the contents of files in memory; then you've got the on-disk buffer of sectors on the hard drive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
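A minimal sketch of such a timing probe in C++ (the log file name and label scheme are made up):
#include <chrono>
#include <cstdio>

// Appends milliseconds elapsed since the first call, plus a label, to a log file.
// Call it at key points during startup, then diff the cold-start and warm-start logs.
void logTimestamp(const char* label)
{
    static const auto t0 = std::chrono::steady_clock::now();
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0).count();
    if (std::FILE* f = std::fopen("startup.log", "a")) {
        std::fprintf(f, "%lld ms  %s\n", static_cast<long long>(ms), label);
        std::fclose(f);
    }
}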
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then do a slurp of the whole file into memory in one read and parse its contents. Fewer disk seeks and less time spent reading off of disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
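Such a tool can be a few lines of C++ (a sketch; point it at a large directory tree unrelated to your app):
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Read every regular file under 'root' and discard the data, so the OS file
// cache fills with unrelated content before the next "cold" measurement.
void polluteFileCache(const std::string& root)
{
    namespace fs = std::filesystem;
    std::vector<char> buffer(1 << 20);   // 1 MB read buffer
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied))
        if (entry.is_regular_file()) {
            std::ifstream in(entry.path(), std::ios::binary);
            while (in.read(buffer.data(), buffer.size())) {}   // the reads themselves repopulate the cache
        }
}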
@Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used; I really don't like that option (neither does Raymond).
One successful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on those. Remember that you may also delay-load most Windows DLLs you need.
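For reference, with the MSVC toolchain delay-loading is a linker switch plus the delay-load helper library (the DLL name here is hypothetical):
link myapp.obj bigdep.lib delayimp.lib /DELAYLOAD:bigdep.dll
The DLL is then loaded on the first call into it rather than at process start, which is why the depends.exe profiling pass is needed to catch DLLs that get pulled in during startup anyway.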
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file that lists all the functions in the module being linked (or just some of them; it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
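With the Visual Studio linker this is done with an order file and the /ORDER switch. A hypothetical sketch (the functions must be compiled into COMDATs with /Gy, and the file lists decorated names; these two correspond to void InitGraphics(void) and void InitAudio(void)):
; startup_order.txt - startup functions, one decorated name per line, in call order
?InitGraphics@@YAXXZ
?InitAudio@@YAXXZ
Then compile and link with:
cl /Gy /c app.cpp
link app.obj /ORDER:@startup_order.txt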
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the things that PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs a runtime environment and has to deal with cleaning up after bad builds and all that, but the payoff is typically a cold launch that's 10% or more faster, with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
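The build sequence looks roughly like this (VS2005-era switches; later versions may differ):
cl /GL /O2 /c app.cpp
link /LTCG:PGINSTRUMENT app.obj
app.exe
link /LTCG:PGOPTIMIZE app.obj
The first link produces an instrumented binary; running it through a representative cold start records a profile, and the second link re-orders the code based on that profile.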
As an alternative to a function order list, just group the code that will be called at startup into the same sections:
#pragma code_seg(".startUp")
//...
#pragma code_seg
#pragma data_seg(".startUp")
//...
#pragma data_seg
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether a function order list can specify global variables as well, but using #pragma data_seg would simply work.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note: .NET 3.5 SP1 supposedly has much-improved cold-start speed, though by how much, I cannot say.
It could be the NICs (LAN cards): your app may depend on certain other services that require the network to come up. Profiling your application alone may not tell you this, so you should examine the dependencies of your application.
If your application is not very complicated, you can just copy all the executables to another directory; that should be similar to a reboot. (Cut and paste does not seem to work: Windows is smart enough to know that files moved to another folder are already cached in memory.)
