WinForms Performance Question

My application has several large forms with lots of images, which dramatically increases the size of the built executable. Startup performance has become sluggish over time, and it doesn't seem to be getting any better.
If I put all of the forms besides the main form in a separate dll, would it alleviate some of the pressure put on the application during startup?
I'd test it myself, but I have A LOT of forms and I don't want to do it unless someone can confirm that such an action will prove to be useful.

Many factors can affect startup performance. Have you used any tools to prove that it's the images?
For a start, go through these tips:
http://devcomponents.com/blog/?p=361
And consider using multithreading to load bigger objects in the background.
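Something along these lines (a minimal sketch, not necessarily how your app is structured; the form layout and "banner.png" are made up for illustration): show the form immediately, load heavy images on a background thread, and marshal back to the UI thread before touching any control.

using System;
using System.Drawing;
using System.Threading.Tasks;
using System.Windows.Forms;

public class MainForm : Form
{
    private readonly PictureBox _banner = new PictureBox { Dock = DockStyle.Fill };

    public MainForm()
    {
        Controls.Add(_banner);
        // Kick off the expensive work only after the form is visible.
        Shown += (s, e) => Task.Run(() =>
        {
            var image = Image.FromFile("banner.png"); // slow I/O, off the UI thread
            // Controls may only be touched on the UI thread, so marshal back.
            BeginInvoke(new Action(() => _banner.Image = image));
        });
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new MainForm());
    }
}

On older frameworks without Task.Run, a BackgroundWorker achieves the same effect.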

I'm not quite sure about that, but if I were you I would use a profiler when it comes to improving performance.
Before guessing at what's wrong, I consult it and work my way up from there, because it tells me which methods and classes are costing the most in my code.

Another tip that may be useful: use NGEN to generate a precompiled native image of your assemblies. This reduced my application's startup time from 2 minutes to under 10 seconds on a low-end thin client.
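In case it helps, a typical invocation looks like this (run from an elevated prompt; the Framework path varies by version, and MyApp.exe is a placeholder):

"%WINDIR%\Microsoft.NET\Framework\v4.0.30319\ngen.exe" install MyApp.exe

ngen install also compiles the assembly's static dependencies, and ngen uninstall reverses it.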

I'm wondering whether you could use MEF with lazy loading, and then only instantiate the module (form) when you actually need it by calling .Value.
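Roughly like this (a minimal sketch of the Lazy<T> part only; a MEF [Import] of Lazy<T> behaves the same way, and SettingsForm is a made-up stand-in for one of your many forms):

using System;
using System.Windows.Forms;

public class SettingsForm : Form { } // stand-in for one of your real forms

public static class Forms
{
    // Nothing is constructed at startup; the first read of .Value pays the cost.
    public static readonly Lazy<SettingsForm> Settings =
        new Lazy<SettingsForm>(() => new SettingsForm());
}

public static class Demo
{
    [STAThread]
    public static void Main()
    {
        Forms.Settings.Value.ShowDialog(); // first access constructs the form
    }
}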
There are a couple of things I do with applications containing a lot of forms:
Create a UI .exe: basically only my forms
Create a backend .dll: everything that does the work behind the UI.
Are the images actually included in the executable? If so, I would put the images into a .dll separate from the UI.
Given that the images are for toolbars, I wouldn't split them out as resources. I'll still stand by my advice to split into multiple .dlls.

As others said, profile, don't guess.
Not just any profiler will do.
Here's a user (besides me) who discovered random pausing on his own.
You say the "intense" methods are all in dlls you don't have source code for - that's typical and normal.
What you need to know is which statements in your code are responsible for the time being spent, and that can't be restricted to CPU-only time.
Most profilers don't tell you this, but random-pausing does.
If you're interested, here's a recent discussion of the issues.

Related

Why isn't there a widely used tool to "warmup" .NET applications to prevent "cold start"?

I understand why cold starts happen (bytecode needs to be turned into machine code through JIT compilation). However, with all the generated metadata available for binaries these days, I do not understand why there isn't a simple tool that automatically takes the bytecode and turns ALL PATHS THROUGH THE CODE (auto-discovered) into machine code specific to the target platform. That would mean the first request through any path (assume a REST API) would be fast and not require any further just-in-time compilation.
We can create an automation test suite or load test to JIT all the paths before allowing the machine into the load-balancer rotation (a good practice anyway). We can also flip the "always on" setting in cloud hosting providers to keep the warmed application from getting evicted from memory (which would otherwise require the entire process to start over). Still, it seems like an archaic process to be dealing with in 2020.
Why isn't there a tool that does this? What is the limitation that prevents us from using metadata, debug symbols and/or other means to understand how to generate machine code that is already warm and ready for users from the start?
So I have been asking some sharp minds around my professional network, and no one seems to be able to point out exactly what limitation makes this so hard to do. However, I did get a few tools on my radar that do what I'm looking for to some degree.
Crossgen appears to be the most promising, but it's far from widely used among the many peers I've spoken to. I'll have to take a closer look.
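For what it's worth, on newer .NET (Core 3.0 and later) the same idea is packaged as ReadyToRun publishing, which runs crossgen for you. A sketch, with the runtime identifier only as an example:

<!-- in the .csproj -->
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
</PropertyGroup>

dotnet publish -c Release -r win-x64

This pre-compiles the IL ahead of time, so far fewer methods need JIT compilation on first use.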
Also, several teams do some sort of startup task that runs class initialization and registers those classes as singletons. I wouldn't consider that much different than just running integration or load tests against the application.
Most programming languages have some form of native image compiler tool. It's up to you to use them if that is what you are looking to do.
Providers are supposed to give you a platform for your application and there is a certain amount of isolation and privacy you should expect from your provider. They should not go digging into your application to figure out all its "paths". That would be very invasive.
Plus "warming up" all paths would be an overly resource intensive process for a provider to be obligated to perform for every application they host.

Intercept BIG application execution after DLL injection

I need to intercept execution in a very big application, in many places.
What programs can I use to do this? What techniques exist for this problem?
Manually reverse engineering and adding hooks is probably not the optimal solution here, because the application is very big and parts of it get updated from time to time. I think that with the right tools or good practices I could do this faster - does anyone know how?
Can anybody help me?
Seeing as the tools part has been covered, here is something on the techniques.
Depending on what you need to hook and whether or not there is protection involved, there are a few methods:
Relative call/jmp patching in the virtualized binary: this is the simplest, but also a lot of work if you can't automatically find all references to a function; it probably won't work in this case given your criteria.
IAT/EAT hooking: this is used for imports (IAT) and exports (EAT), and is great if you're targeting a known imported/exported set of API functions. A good example of this can be found here or here.
Hot-patching: Windows XP SP2 introduced something called "hot-patching" (used for real-time system function updates), where all the WinAPI functions start with a 'mov edi,edi', allowing a relative jump to be patched into the free space created above every hot-patchable function (you can do this too). This is generally used for programs that checksum their IATs or have other funny forms of protection. More info can be found here and here.
Code-caving: capturing execution flow by placing redirections in arbitrary code space. See here, here or here.
VFT/COM redirection: basically overwriting entries in an object's virtual function table; useful for OOP/COM-based applications. See this.
There are a lot of third-party libraries; the most famous would probably be MS Detours. One can also look at APIHijack or a mini-hook engine.
Of course, nothing can substitute for the initial poking around you'll need to do with a debugger like OllyDbg, but knowing the method you're going to use can drastically shorten the time spent poking around.
Some details on what exactly you need to do (e.g. how do you determine where to break) would be nice. Depending on your situation, something like Pin might work.
I suggest using Deviare API Hook. It's the easiest way you can do what you need. It has some COM objects that you can use to hook an application from a different process. In your process you get full parameter information and you can use it in any programming language (I'm using C# and it works like a charm).
If you need to intercept the registry API, I suggest using Deviare to debug what you need to intercept, but then writing your own hooks; otherwise you'll run into performance issues.
You can do API Hooking if you are interested in intercepting method calls.
Or use a disassembler or debugger like SoftICE, OllyDbg, or W32Dasm.

How to increase the startup speed of a Delphi app?

What do you do to increase startup speed (or to decrease startup time) of your Delphi app?
Other than application-specific changes, is there a standard trick that always works?
Note: I'm not talking about fast algorithms or the likes. Only the performance increase at startup, in terms of speed.
In the project options, don't auto-create all of your forms up front. Create and free them as needed.
Try doing as little as possible in your main form's OnCreate event. Instead, move some initialization to a different method and run it once the form is shown to the user. An indication that the app is busy, such as a busy mouse cursor, goes a long way.
Experiments show that if you take the exact same application and simply add a startup notification to it, users actually perceive that app as starting up faster!
Other than that you can do the usual things like exclude debug information and enable optimization in the compiler.
On top of that, don't auto create all your forms. Create them dynamically as you need them.
Well, as Argalatyr suggested, I changed my comment into a separate answer:
As an extension to the "don't auto create forms" answer (which will be quite effective by itself), I suggest delaying the opening of connections to databases, the internet, COM servers, and any peripheral devices until you first need them.
Three things happen before your form is shown:
All 'initialization' blocks in all units are executed, in "first seen" order.
All auto-created forms are created (loaded from DFM files, and their OnCreate handlers are called).
Your main form is displayed (OnShow and OnActivate are called).
As others have pointed out, you should auto-create only a small number of forms (especially if they are complicated forms with lots of components) and should not put lengthy processing in the OnCreate events of those forms. If, by chance, your main form is very complicated, you should redesign it. One possibility is to split the main form into multiple frames which are loaded on demand.
It's also possible that one of the initialization blocks is taking some time to execute. To verify, put a breakpoint on the first line of your program (the main 'begin..end' block in the .dpr file) and start the program. All initialization blocks will be executed, and then the breakpoint will stop execution.
In a similar way you can step (F8) over the main program - you'll see how long it takes for each auto-created form to be created.
Display a splash screen, so people won't notice the long startup times :).
The fastest code is the code that never runs. Quite obvious, really ;)
Deployment of the application can (and usually does!) happen in ways the developer may not have considered. In my experience this generates more performance issues than anyone would want.
A common bottleneck is file access - a configuration file, ini file that is required to launch the application can perform well on a developer machine, but perform abysmally in different deployment situations. Similarly, application logging can impede performance - whether for file access reasons or log file growth.
What I see so often are rich-client applications deployed in a Citrix environment, or on a shared network drive, where the infrastructure team decides that user temporary files or personal files be stored in a location that the application finds issues with, and this leads to performance or stability issues.
Another issue I often see affecting application performance is the method used to import and export data to files. Commonly in Delphi business applications I see export functions that work off DataSets - iterating and writing to file. Consider the method used to write to file, consider memory available, consider that the 'folder' being written to/read from may be local to the machine, or it may be on a remote server.
A developer may argue that these are installation issues, outside the scope of their concern. I usually see many cycles of developer analysis on this sort of issue before it is identified as an 'infrastructure issue'.
The first thing to do is to clear the auto-created forms list (look in Project Options). Create forms on the fly when needed, especially if the application uses database connections (data modules) or forms that make heavy use of controls.
Also consider using form inheritance to decrease exe size (resource usage is minimized).
Decrease the number of forms and merge similar or related functionality into a single form.
Put long-running tasks (opening database connections, connecting to an app server, etc.) that have to be performed on startup in a thread. Any functionality that depends on these tasks is disabled until the thread is done.
It's a bit of a cheat, though. The main form comes up right away, but you're only giving the appearance of faster startup time.
Compress your executable and any dlls using something like ASPack or UPX. Decompression time is more than made up for by faster load time.
UPX was used as an example of how to make Firefox load faster.
Note that there are downsides to exe compression.
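If you try it, the invocation is simple (MyApp.exe is a placeholder; -k keeps an uncompressed backup so you can compare load times):

upx -9 -k MyApp.exe

Measure cold and warm starts before and after - packed executables can't share code pages between running instances, which is one of the downsides alluded to above.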
This is just for the IDE, but Chris Hesick made a blog posting about increasing startup performance under the debugger.

How to consistently organize code for debugging?

When working on a big project that requires debugging (like every project), you realize how much people love "printf" over the IDE's built-in debugger. By this I mean:
Sometimes you need to render variable values to screen (especially for interactive debugging).
Sometimes to log them in a file
Sometimes you have to change visibility (make members public) just so another class can access them (a logger or a renderer, for example).
Other times you need to save the previous value in a member just to compare it with the new one during debugging.
...
When a project gets huge with a lot of people working on it, all this debugging-specific code can get messy and hard to differentiate from normal code. This can be crazy for those who have to update/change someone else's code or to prepare it for a release.
How do you solve this?
It is always good to have naming standards, and I guess that debug-coding standards would be quite useful (like marking every debug variable with a _DBG suffix). But I also guess naming is just not enough. Maybe centralizing it into a friendly tracker class, or creating a robust base of macros in order to erase it all for the release. I don't know.
What design techniques, patterns and standards would you embrace if you are asked to write a debug-coding document for all others in the project to follow?
I am not talking about tools, libraries or IDE-specific commands, but for OO design decisions.
Thanks.
Don't commit debugging code, just debugging tools.
Logging, OTOH, has a natural place in exception handling routines and such. A few well-placed logging statements in commonly used APIs can also be good for debugging.
For example, one log statement that logs all SQL executed by the system.
My vote would be with what you described as a friendly tracker class. This class would keep all of that centralized, and potentially even allow you to change debug/logging strategies dynamically.
I would avoid things like Macros simply because that's a compiler trick, and not true OO. By abstracting the concept of debug/logging, you have the opportunity to do lots of things with it including making it a no-op if needed.
Logging or debugging? I believe that well-designed and properly unit-tested application should not need to be permanently instrumented for debugging. Logging on the other hand can be very useful, both in finding bugs and auditing program actions. Rather than cover a lot of information that you can get elsewhere, I would point you at logging.apache.org for either concrete implementations that you can use or a template for a reasonable design of a logging infrastructure.
I think it's particularly important to avoid using System.out / printf directly and instead use (even a custom) logging class. That at least gives you a centralized kill-switch for all the logging (minus the call costs in Java).
It is also useful to have that log class have info/warn/error/caveat, etc.
I would be careful about error levels, user ids, metadata, etc. since people don't always add them.
Also, one of the most common problems I've seen is that people put temporary printfs in the code as they debug something, and then forget where they put them. I use a tool that tracks everything I do, so I can quickly identify all my recent edits since a given checkpoint and delete them. In most cases, however, you may want to impose special rules on what debug code can be checked into your source control.
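As a sketch of that centralized kill-switch idea (all names here are made up, not from any particular library):

using System;
using System.IO;

public enum Level { Debug, Info, Warn, Error }

public static class Log
{
    // Single point of control: raise the threshold to silence
    // lower-priority output everywhere at once.
    public static Level Threshold = Level.Info;
    public static TextWriter Sink = Console.Out; // swap for a file writer

    public static void Write(Level level, string message)
    {
        if (level < Threshold) return; // the kill-switch
        Sink.WriteLine("{0:O} [{1}] {2}", DateTime.Now, level, message);
    }

    public static void Info(string m) { Write(Level.Info, m); }
    public static void Warn(string m) { Write(Level.Warn, m); }
    public static void Error(string m) { Write(Level.Error, m); }
}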
In VB6 you've got
Debug.Print
which sends output to a window in the IDE. It's bearable for small projects. VB6 also has
#If <some var set in the project properties>
'debugging code
#End If
I have a logging class which I declare at the top with
Dim Trc as Std.Traces
and use in various places (often inside #If/#End If blocks)
Trc.Tracing = True
Trc.Tracefile = "c:\temp\app.log"
Trc.Trace 'with no argument stores date stamp
Trc.Trace "Var=" & var
Yes it does get messy, and yes I wish there was a better way.
We routinely are beginning to use a static class that we write trace messages to. It is very basic and still requires a call from the executing method, but it serves our purpose.
In the .NET world, there is already a fair amount of built in trace information available, so we do not need to worry about which methods are called or what the execution time is. These are more for specific events which occur in the execution of the code.
If your language's tracing constructs do not support categorizing messages, that is something you should add to your tracing code. Something that identifies different levels of importance and/or functional areas is a great start.
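.NET's built-in tracing already covers this; a short sketch (the source name "MyApp" and the event IDs are arbitrary):

using System.Diagnostics;

public static class Tracing
{
    // A named source with a severity switch; listeners and levels can
    // also be configured in App.config without recompiling.
    public static readonly TraceSource App =
        new TraceSource("MyApp", SourceLevels.Information);
}

// Usage:
//   Tracing.App.TraceEvent(TraceEventType.Information, 1001, "Startup complete");
//   Tracing.App.TraceEvent(TraceEventType.Error, 2001, "Settings file missing");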
Just avoid instrumenting your code by modifying it. Learn to use a debugger. Make logging and error handling easy. Have a look at Aspect-Oriented Programming.
Debugging/logging code can indeed be intrusive. In our C++ projects, we wrap common debug/log code in macros - very much like asserts. I often find that logging is most useful in the lower-level components, so it doesn't have to go everywhere.
There is a lot in the other answers to both agree and disagree with :) Having debug/logging code can be a very valuable tool for troubleshooting problems. In Windows, there are many techniques - the two major ones are:
Extensive use of checked (DBG) build asserts and lots of testing on DBG builds.
The use of ETW in what we call 'fre' or 'retail' builds.
Checked builds (what most others call DEBUG builds) are very helpful for us as well. We run all our tests on both 'fre' and 'chk' builds (on x86 and AMD64 as well; all server stuff runs on Itanium too...). Some people even self-host (dogfood) on checked builds. This does two things:
It finds lots of bugs that wouldn't be found otherwise.
It quickly eliminates noisy or unnecessary asserts.
In Windows, we use Event Tracing for Windows (ETW) extensively. ETW is an efficient static logging mechanism. The NT kernel and many components are very well instrumented. ETW has a lot of advantages:
Any ETW event provider can be dynamically enabled/disabled at run time - no rebooting or process restarts required. Most ETW providers provide granular control over individual events, or groups of events.
Events from any provider (most importantly the kernel) can be merged into a single trace so all events can be correlated.
A merged trace can be copied off the box and fully processed - with symbols.
The NT kernel sample profile interrupt can generate an ETW event - this yields a very lightweight sampling profiler that can be used at any time.
On Vista and Windows Server 2008, logging an event is lock-free and fully multi-core aware - a thread on each processor can independently log events with no synchronization needed between them.
This is hugely valuable for us, and can be for your Windows code as well - ETW is usable by any component, including user mode, drivers, and other kernel components.
One thing we often do is write a streaming ETW consumer. Instead of putting printfs in the code, I just put ETW events at interesting places. When my component is running, I can then run my ETW watcher at any time - the watcher receives the events and displays them, counts them, or does other interesting things with them.
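For managed code, the route to ETW on newer .NET (4.5+) is System.Diagnostics.Tracing.EventSource; a sketch, with the provider name and event IDs made up:

using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp")]
public sealed class AppEventSource : EventSource
{
    public static readonly AppEventSource Log = new AppEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void StartupPhase(string phase) { WriteEvent(1, phase); }

    [Event(2, Level = EventLevel.Error)]
    public void Failure(string detail) { WriteEvent(2, detail); }
}

// Usage: AppEventSource.Log.StartupPhase("main window created");
// Controllers such as logman, PerfView, or WPR can enable and disable
// this provider at run time, exactly as described above.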
I very much respectfully disagree with tvanfosson. Even the best code can benefit from well-implemented logging. Well-implemented static run-time logging can make finding many problems straightforward - without it, you have zero visibility into what is happening in your component. You can look at inputs and outputs and guess - that's about it.
The key here is the term 'well implemented'. Instrumentation must be in the right place. Like anything else, this requires some thought and planning. If it is not in helpful/interesting places, then it will not help you find problems in a development, testing, or deployed scenario. You can also have too much instrumentation, causing perf problems when it is on - or even off!
Of course, different software products or components will have different needs. Some things may need very little instrumentation. But a widely deployed or critical component can greatly benefit from well-engineered instrumentation.
Here is a scenario for you (note, this very well may not apply to you... :) ). Let's say you have a line-of-business app deployed on every desktop in your company - hundreds or thousands of users. What do you do when someone has a problem? Do you stop by their office and hook up a debugger? If so, how do you know what version they have? Where do you get the right symbols? How do you get the debugger onto their system? What if it only happens once every few hours or days? Are you going to let the system run with the debugger connected all that time?
As you can imagine, hooking up a debugger in this scenario is disruptive.
If your component is instrumented with ETW, then you could ask your user to simply turn on tracing, continue to do his/her work, then hit the "WTF" button when the problem happens. Even better: your app may be able to self-log - detecting problems at run time and turning on logging automagically. It could even send you ETW files when problems occur.
These are just simple examples - logging can be handled in many different ways. My key recommendation here is to think about how logging might be able to help you find, debug, and fix problems in your components at dev time, at test time, and after they are deployed.
I was burnt by the same issue in about every project I've been involved with, so now I have this habit that involves extensive use of logging libraries (whatever the language/platform provides) from the start. Any Log4X port is fine for me.
Building yourself some proper debug tools can be extremely valuable. For example in a 3D environment, you might have an option to display the octree, or to render planned AI paths, or to draw waypoints that are normally invisible. You'd probably also want some on-screen display to aid with profiling too: the current framerate, count of polygons on screen, texture memory usage, and so on.
Although this takes some time and effort to do, in the long run it can save you a lot of time and frustration.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) of the difference seems to come from loading DLLs; when the DLLs are in cached memory pages, they load much faster. We tried using ClearMem to simulate rebooting (since it's much less time-consuming than actually rebooting) and got mixed results; on some machines it seemed to simulate a reboot very consistently, and on some it didn't.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you dealt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time, obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem, which is storing pages of memory for the program; then you've got the OS caching the contents of files in memory; then you've got the on-disk buffer of sectors on the hard drive itself. You can probably get the OS caches to reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
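For instance, a minimal timing logger (C# for brevity, since part of the app is .NET; the phase names and log path are invented):

using System;
using System.Diagnostics;
using System.IO;

public static class StartupTimer
{
    private static readonly Stopwatch Clock = Stopwatch.StartNew();

    // Call at each interesting point during startup; diff the log
    // between cold and warm runs to see where the extra time goes.
    public static void Mark(string phase)
    {
        File.AppendAllText("startup-times.log",
            string.Format("{0,6} ms  {1}{2}",
                Clock.ElapsedMilliseconds, phase, Environment.NewLine));
    }
}

// Usage: StartupTimer.Mark("config loaded"); StartupTimer.Mark("main window shown");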
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then slurp the whole file into memory in one read and parse the contents. Fewer disk seeks and less time spent reading off disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
@Morten Christiansen said:
One way to make apps cold-start faster (sort of) is used by e.g. Adobe Reader: load some of the files at startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot, even when it isn't used. I really don't like that option (and neither does Raymond).
One successful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on those. Remember that you can also delay-load most Windows DLLs you need.
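For reference, the MSVC incantation looks like this ("heavy.dll" is a placeholder; in the IDE it's Linker > Input > Delay Loaded DLLs):

link /OUT:MyApp.exe main.obj heavy.lib delayimp.lib /DELAYLOAD:heavy.dll

You still link against the import library; delayimp.lib supplies the helper that loads the DLL on first call.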
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file listing all the functions in the module being linked (or just some of them - it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the things that PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs a runtime environment and has to deal with cleaning up after bad builds and so on, but the payoff is typically a 10% or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
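The cycle looks roughly like this with the VS2005-era flags (a sketch only; the file names are placeholders):

cl /O2 /GL main.cpp /link /LTCG:PGINSTRUMENT /OUT:MyApp.exe
rem ...run MyApp.exe through a representative startup scenario...
link main.obj /LTCG:PGOPTIMIZE /OUT:MyApp.exe

The instrumented run writes profile data (.pgc/.pgd files) that the second link uses to reorder and optimize the binary.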
As an alternative to function order list, just group the code that will be called within the same sections:
#pragma code_seg(".startUp")  // functions defined from here on go in the .startUp section
//...
#pragma code_seg              // revert to the default code section
#pragma data_seg(".startUp")  // likewise for global/static data
//...
#pragma data_seg
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether a function order list can specify global variables as well, but using #pragma data_seg simply works.
One way to make apps cold-start faster (sort of) is used by e.g. Adobe Reader: load some of the files at startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
On another note, .NET 3.5 SP1 supposedly has much-improved cold-start speed, though by how much, I cannot say.
It could be the NICs (LAN cards), and that your app depends on certain other services that require the network to come up. Profiling your application alone may not tell you this, so you should also examine your application's dependencies.
If your application is not very complicated, you can just copy all the executables to another directory; that should be similar to a reboot. (Cut and paste doesn't seem to work - Windows is smart enough to know that a file moved to another folder is still cached in memory.)
