Performance tuning VBA code in large procedure

I've been asked to tune the performance of a specific function which loads every time a worksheet is opened (so it's important that it doesn't make things slow). One of the things that seems to make this function slow is that it does a long call to the database (which is remote), but there are a bunch of other possibilities too. So far, I've been stepping through the code, and when something seems to take a long time making a note of it as a candidate for tuning.
I'd like a more objective way to tell which calls are slowing me down. Searching for timing and VBA yields a lot of results which basically amount to "Write a counter, and start and stop it either side of the critical section" (often with the macro explicitly called). I was wondering whether there was a way to (in the debugger) do something like "Step to next line, and tell me the time elapsed".
If not, can someone suggest a reasonable macro that I could use in the Immediate window to get what I'm after? Specifically, I would like to be able to time an arbitrary line of code within a larger procedure (rather than a whole procedure at once, which is what I found through Google).

The keyword for your further search is a "profiler" for VBA. I've heard of VB Watch and VBA Code Profiler System (VBACP), as well as Stephen Bullen's PerfMon; apart from the last one, they're mostly not free.
So much for the official part of my answer. Here are a few extra, possibly useless, suggestions:
Identifying "slow" code by "human measurement" (run a line and say: "Woah, that takes forever") in the debugger is certainly helpful, and you can then start looking into why those lines are slow. Your remote database call may take quite long if it has to transmit a lot of data, in which case it may be a good idea to timestamp the data on both ends and ask the DB whether the data has been modified before you grab it.
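For a more objective number than "that takes forever", here is a minimal sketch built on VBA's built-in Timer function. The names Mark and mLast are made up for illustration and are not part of any library. Timer returns seconds since midnight, so a measurement that spans midnight will be wrong, and its resolution is coarse, so treat very small readings with suspicion.

```vba
Option Explicit

' Hypothetical helper (not part of VBA): time of the previous checkpoint.
Private mLast As Double

' Prints the seconds elapsed since the previous call, then resets the checkpoint.
Public Sub Mark(ByVal label As String)
    Dim t As Double
    t = Timer                      ' built-in: seconds since midnight
    If mLast > 0 Then
        Debug.Print label & ": " & Format$(t - mLast, "0.000") & " s"
    End If
    mLast = Timer                  ' reset after printing, so the print itself is not counted
End Sub
```

Placed in a standard module, this can be called as Mark "before DB call" and Mark "after DB call" either side of a suspect line, or typed into the Immediate window. Stepping by hand adds your own reaction time to the reading, so it works best when the code runs freely between two Mark calls.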
Writing the data into the sheet may be slow depending on the way you write it - which can sometimes be improved by writing arrays to a range instead of some form of iteration.
And I probably don't need to tell you about ScreenUpdating and EnableEvents and so on?
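In case it is useful to other readers, a sketch of the usual pattern for those switches follows. The procedure name is hypothetical, and including the calculation mode is my own reading of "and so on" rather than something stated above.

```vba
Sub RunWithUiSwitchedOff()
    Dim oldCalc As XlCalculation
    oldCalc = Application.Calculation      ' remember the user's setting

    On Error GoTo CleanUp
    Application.ScreenUpdating = False     ' no repainting while we work
    Application.EnableEvents = False       ' no Worksheet_Change etc. firing
    Application.Calculation = xlCalculationManual

    ' ... the slow work goes here ...

CleanUp:
    ' Always restore, even after an error, or Excel stays in this state.
    ' Note: in this sketch the error itself is swallowed, not reported.
    Application.Calculation = oldCalc
    Application.EnableEvents = True
    Application.ScreenUpdating = True
End Sub
```

The point of the CleanUp label is that the settings are restored on every exit path. A procedure that fails halfway with events disabled leaves the workbook behaving strangely for the rest of the session.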


Are Sitecore's sublayout rendering stats incorrect?

The built-in Sitecore rendering stats http://<sitename>/sitecore/admin/stats.aspx is really helpful for identifying inefficient and slow-loading XSLT renders. Recently I've started switching to .ascx sub layouts to take advantage of the Sitecore C# API which can help improve performance when used correctly.
However, I've noticed that sub layouts (as opposed to XSLT renders) are not reported correctly on the stats page. See the screenshot below....
I know for a fact that this sub layout takes about 1.8 seconds to generate (I calculated this in the code-behind). Caching is turned off. I've refreshed the page 20 times to ensure I get an average. You will see that the "Avg. items" is always 0 - I can live with this - but the "Avg. time (ms)" is less than 1ms which is just clearly wrong.
Does anyone have any insights into this? Has anyone found a way to get it to work correctly?
Judging whether a statistic is right/wrong is going to rely on understanding exactly what it is measuring.
Digging around in Sitecore.Diagnostics.Statistics using Reflector I note the following:
Sitecore.Web.UI.WebControl contains a field m_timer
This is 'started' in the BeforeRender() method and 'stopped' in the AfterRender() method
Data from that timer is sent to Statistics.AddRenderingData() and is logged against the control
This means it is measuring the time taken to render the control, which for an XSLT includes the processing time for preparing all the data in it, but as much of the work of a normal ASCX is done prior to the Render-stage the statistic is much less useful. Incorporating the Load stage in the time would inadvertently include the processing time for all child components, since the Load sequence is chained and called recursively, so that probably doesn't help much either.
I suspect there is no good way of measuring the processing time for a specific ASCX control (excluding children) without first acquiring cumulative data then post-processing the call chain and splitting the time apart. This is the sort of thing RedGate ANTS does really well, but might not be so good if it was being executed on a live production system, given the overheads.

How to use DoEvents() without being "evil"?

A simple search for DoEvents brings up lots of results that lead, basically, to:
DoEvents is evil. Don't use it. Use threading instead.
The reasons generally cited are:
Re-entrancy issues
Poor performance
Usability issues (e.g. drag/drop over a disabled window)
But some notable Win32 functions such as TrackPopupMenu and DoDragDrop perform their own message processing to keep the UI responsive, just like DoEvents does.
And yet neither of these seems to run into these issues (performance, re-entrancy, etc.).
How do they do it? How do they avoid the problems cited with DoEvents? (Or do they?)
DoEvents() is dangerous. But I bet you do lots of dangerous things every day. Just yesterday I set off a few explosive devices (future readers: note the original post date relative to a certain American holiday). With care, we can sometimes account for the dangers. Of course, that means knowing and understanding what the dangers are:
Re-entry issues. There are actually two dangers here:
Part of the problem here has to do with the call stack. If you call .DoEvents() in a loop that itself handles messages that use DoEvents(), and so on, you're getting a pretty deep call stack. It's easy to over-use DoEvents() and accidentally fill up your call stack, resulting in a StackOverflow exception. If you're only using .DoEvents() in one or two places, you're probably okay. If it's the first tool you reach for whenever you have a long-running process, you can easily find yourself in trouble here. Even one use in the wrong place can make it possible for a user to force a stackoverflow exception (sometimes just by holding down the enter key), and that can be a security issue.
It is sometimes possible to find your same method on the call stack twice. If you didn't build the method with this in mind (hint: you probably didn't) then bad things can happen. If everything passed in to the method is a value type, and there is no dependence on things outside of the method, you might be fine. But otherwise, you need to think carefully about what happens if your entire method were to run again before control is returned to you at the point where .DoEvents() is called. What parameters or resources outside of your method might be modified that you did not expect? Does your method change any objects, where both instances on the stack might be acting on the same object?
Performance Issues. DoEvents() can give the illusion of multi-threading, but it's not real multithreading. This has at least three real dangers:
When you call DoEvents(), you are giving control on your existing thread back to the message pump. The message pump might in turn give control to something else, and that something else might take a while. The result is that your original operation could take much longer to finish than if it were in a thread by itself that never yields control, definitely longer than it needs.
Duplication of work. Since it's possible to find yourself running the same method twice, and we already know this method is expensive/long-running (or you wouldn't need DoEvents() in the first place), even if you accounted for all the external dependencies mentioned above so there are no adverse side effects, you may still end up duplicating a lot of work.
The other issue is the extreme version of the first: a potential to deadlock. If something else in your program depends on your process finishing, and will block until it does, and that thing is called by the message pump from DoEvents(), your app will get stuck and become unresponsive. This may sound far-fetched, but in practice it's surprisingly easy to do accidentally, and the crashes are very hard to find and debug later. This is at the root of some of the hung app situations you may have experienced on your own computer.
Usability Issues. These are side-effects that result from not properly accounting for the other dangers. There's nothing new here, as long as you looked in other places appropriately.
If you can be sure you accounted for all these things, then go ahead. But really, if DoEvents() is the first place you look to solve UI responsiveness/updating issues, you're probably not accounting for all of those issues correctly. If it's not the first place you look, there are enough other options that I would question how you made it to considering DoEvents() at all. Today, DoEvents() exists mainly for compatibility with older code that came into being before other credible options were available, and as a crutch for newer programmers who haven't yet gained enough experience for exposure to the other options.
The reality is that most of the time, at least in the .Net world, a BackgroundWorker component is nearly as easy, at least once you've done it once or twice, and it will do the job in a safe way. More recently, the async/await pattern or the use of a Task can be much more effective and safe, without needing to delve into full-blown multi-threaded code on your own.
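If DoEvents is used anyway, the "same method on the call stack twice" danger described above can at least be guarded against with a busy flag. This is only a sketch, written in VBA (where DoEvents is a statement) rather than .NET, and the names mBusy and LongRunningJob are hypothetical. It addresses re-entry into this one procedure only, not the deadlock or call-stack-depth dangers.

```vba
Option Explicit

' Hypothetical flag: True while LongRunningJob is somewhere on the call stack.
Private mBusy As Boolean

Public Sub LongRunningJob()
    If mBusy Then Exit Sub         ' re-entered via DoEvents: refuse to run twice
    mBusy = True
    On Error GoTo CleanUp

    Dim i As Long
    For i = 1 To 100000
        ' ... one unit of work goes here ...
        If i Mod 1000 = 0 Then DoEvents   ' yield only occasionally
    Next i

CleanUp:
    mBusy = False                  ' always clear the flag, even after an error
End Sub
```

A second click on the button that starts the job is then ignored instead of starting a nested run. Other handlers can still fire during DoEvents, so anything they share with the job needs the same scrutiny.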
Back in 16-bit Windows days, when every task shared a single thread, the only way to keep a program responsive within a tight loop was DoEvents. It is this non-modal usage that is discouraged in favor of threads. Here's a typical example:
' Process image
For y = 1 To height
    For x = 1 To width
        ProcessPixel x, y
    Next x
    DoEvents ' <-- DON'T DO THIS -- just put the whole loop in another thread
Next y
For modal things (like tracking a popup), it is likely to still be OK.
I may be wrong, but it seems to me that DoDragDrop and TrackPopupMenu are rather special cases, in that they take over the UI, so don't have the reentrancy problem (which I think is the main reason people describe DoEvents as "Evil").
Personally I don't think it's helpful to dismiss a feature as "Evil" - rather explain the pitfalls so that people can decide for themselves. In the case of DoEvents there are rare cases where it's still reasonable to use it, for example while a modal progress dialog is displayed, where the user can't interact with the rest of the UI so there is no re-entrancy issue.
Of course, if by "Evil" you mean "something you shouldn't use without fully understanding the pitfalls", then I agree that DoEvents is evil.

Excel List-Object VBA Performance Bug?

I have a performance issue with an Excel application which uses List Objects (AKA Excel Tables). I suspect it may be a bug, but despite my Googling I could not find any reference to it. I've already developed a workaround for my application, but what I’m interested in is whether anyone can give any insight into why this happens.
Note: I’m using Excel 2007 on Windows Vista. The setup is as follows: I have a spreadsheet which holds data in a List Object, with VBA code which can be kicked off via a command button; this code may make several edits to any number of cells on the worksheet, so Excel’s Calculation mode is set to Manual prior to any edits.
The problem I’ve encountered is that if the currently active cell is within the List Object, then setting the Calculation Mode to manual seems to have no effect whatsoever. So if a user happens to have a calculation-heavy workbook open in the same instance, then the VBA code runs very slowly. I practically had to pull my application apart to discover that this was caused by the active cell, and I created a new workbook with a simple version of this scenario to confirm that there wasn’t some sort of corruption in my application.
I’ve been doing a number of test cases with this, and below are the results from what I’ve found:
Although it seems generally related to the calculation, there is still a time difference when the calculation mode is switched between Manual and Automatic...
Manual = 7.64 secs
Automatic = 9.39 secs
Manual mode is just under 20% faster than Automatic. But my expectation was they’d be more or less the same, considering the issue seems to be the calculation kicking off even when in Manual mode.
Compare that to when the active cell is not on a List Object, and the results are vastly different...
Manual = 0.14 secs
Automatic = 3.23 secs
Now, the Manual run is 50 times faster, and Automatic run shows that the calculation shouldn’t have taken any more than 3.2 secs! So now the first test looks like it might have run the Calculation twice while in Manual mode, and nearly 3 times while in Automatic mode.
Repeating this test, this time in an instance with no formulas in any cells, it suddenly doesn’t seem as bad:
Active cell is List Object & Calc is Manual = 0.17 secs
Active cell is List Object & Calc is Automatic = 0.20 secs
Active cell is Empty & Calc is Manual = 0.14 secs
Active cell is Empty & Calc is Automatic = 0.18 secs
It’s still slower, but now it’s only by 10-20%, making it unnoticeable. But this does show that the issue must be related to the Calculation in some way, as otherwise it should have taken just as long as the first test.
If anyone wants to create these tests to see for themselves, the setup is as follows:
New Workbook with a List Object added (doesn’t have to be linked to any data)
Add some formula that will take Excel a while to calculate (I just did ‘=1*1’ repeated 30,000 times)
Write some quick VBA code which will (i) loop through a simple edit of a cell several hundred times, and (ii) record the time it took
Then just run the code while changing the active cell between the List Object and an empty cell
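For anyone reproducing this, the timing macro from the steps above might look something like the sketch below. It is not the code used for the figures in this post. The procedure name, the cell Z1 and the count of 500 edits are assumptions chosen for illustration.

```vba
Sub TimeCellEdits()
    Const EDITS As Long = 500          ' "several hundred" edits
    Dim target As Range
    Dim i As Long
    Dim t0 As Double

    ' Assumption: Z1 is an empty cell outside the List Object.
    Set target = ActiveSheet.Range("Z1")

    t0 = Timer                         ' built-in: seconds since midnight
    For i = 1 To EDITS
        target.Value = i               ' the "simple edit of a cell"
    Next i

    ' Application.Calculation prints as its numeric constant
    ' (xlCalculationManual or xlCalculationAutomatic).
    Debug.Print "Calculation = " & Application.Calculation & _
                ", active cell " & ActiveCell.Address & _
                ": " & Format$(Timer - t0, "0.00") & " secs"
End Sub
```

Set the calculation mode by hand, select a cell inside or outside the List Object, run the macro, and compare the lines printed in the Immediate window.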
I’d be very interested to hear if anyone can explain why Excel behaves in this way, and whether it is a bug or some feature of List Objects which actually has some genuine use.
This is not related to the "bug" you found, which is quite interesting and intriguing.
I just want to share that there is a great way to avoid calculation delays. I had fantastic results with this and now I use it all the time.
Simply put, Excel takes a long time copying data back and forth between the "VBA world" and the "spreadsheet world".
If you do all the "reads" at once, process, and then do all the "writes" at once, you get amazing performance. This is done using variant arrays as documented here:
in the section labeled: Read and Write Large Blocks of Data in a Single Operation
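As a rough illustration of that read-once, process, write-once pattern, here is a minimal sketch. The range A1:C1000, the procedure name and the "double every number" processing are all assumptions for the example, not taken from the linked article.

```vba
Sub DoubleValuesViaArray()
    Dim rng As Range
    Dim data As Variant
    Dim r As Long, c As Long

    ' Assumption: the data lives in A1:C1000 of the active sheet.
    Set rng = ActiveSheet.Range("A1:C1000")

    data = rng.Value                   ' one read: 2-D, 1-based Variant array

    For r = 1 To UBound(data, 1)
        For c = 1 To UBound(data, 2)
            ' Skip blanks, text and error values; only touch real numbers.
            If Not IsEmpty(data(r, c)) And IsNumeric(data(r, c)) Then
                data(r, c) = data(r, c) * 2
            End If
        Next c
    Next r

    rng.Value = data                   ' one write back to the sheet
End Sub
```

The loop runs entirely in memory, so the sheet is touched only twice. Be aware that writing the array back replaces any formulas in the range with their values.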
I was able to refactor some code I had that took 5 minutes to run and bring it down to 1.5 minutes. The refactoring took me 10 minutes, which is amazing because it was quite complex code.
Regarding Table performance (and performance, in general):
I know this is an old question, but I want to get this documented.
One thing that changed between older versions of Excel and the post-2007 versions is that Excel now activates the target sheet of any PasteSpecial operation. You cannot override it by turning off ScreenUpdating and making calculations manual. Such Activation WILL make the sheet visible, and cause uncontrollable flicker.
My original VBA code ran very fast on an old, single-processor XP box running Excel 2000. The change to Excel 2013 on a modern machine was stunning in the terrible slowness of code execution. The three areas that kill performance are PasteSpecial from one sheet to another, any other code that requires activating sheets (Zoom level, Advanced Filter, Sheet-Level range names, etc), and automating sheet protection/unprotection.
This is too bad, because PasteSpecial helped "cleanse" data you copy (Direct use of .Copy to a target will throw the occasional error).
So you need to review your code and make sure you are using direct assignment to the right property for the data type you need (from among Value, Value2, Text, and Formula, for example), instead of PasteSpecial.
e.g. .Range("MYRANGE").Value = .Cells(5, 7).Value2
You also need to be scrupulous in resisting use of Select and Activate throughout your code.
As referenced above, many comments you'll find in Excel fora about that last point will make statements that you "never" need to use Activation, which is clearly untrue, since several things in Excel only apply to or require active sheets. Understanding the cases where activation is forced automatically by a particular method or use of an object will help in coding as well. Unfortunately, you won't see much in the way of documentation of this.
Regarding Conditional Formatting, you'll find many complaints in various fora about the slowness of Excel when encountering a large number of Conditionally-formatted cells. I suspected this would impact Excel Tables since they have many table format options. To test this, I took a large workbook we use that is currently formatted as several worksheets with the same style of Excel Table on them.
After converting the tables to a conventional range, I noticed no difference in speed of code execution. This would seem to indicate that use of Excel Table formats is far superior to conditionally-formatting your own arrays of cells.

Is midiOutPrepareHeader a quick call?

Do midiOutPrepareHeader and midiInPrepareHeader just set up some data fields, or do they do something more time-intensive?
I am trying to decide whether to build and destroy the MIDIHDRs as needed, or to maintain a pool of them.
You really have only two ways to tell (without the Windows source):
1) Profile it. Depending on your findings for how long it takes, have a debug-only scoped timer that logs when it suddenly takes longer than what you think is acceptable for your application, or do your pool solution. Though the docs say not to modify the buffer once you call the prepare function, and it seems if you wanted to re-use it you may have to modify it. I'm not familiar enough with the docs to say one way or the other if your proposed solution would work.
2) Step through the assembly and see. Don't be afraid. Get the MSFT public symbols and see if it looks like it's just filling out fields or if it's doing something complicated.

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
It's not perfect, but it's free :-). It also uses the Detours technique.
It stores every call (you must manually set which functions you want to profile),
so it can make an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
"Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions."
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, and it was also far easier to get meaningful results from. Might be worth trying to track down - eBay, maybe?
