executePackage seems to take a long time to launch subpackage - performance

I am a relative beginner at SSIS so I may be doing something silly.
I have a process that involves looping over a heterogenous queue and processing the objects 1 at a time. The process is currently being done in 'set logic' and its dropping stuff. I was asked to rework it in a looping manner, so that decision has been made for me.
I have chosen to implement queue logic in 1 package and the actual processing in another package.
This is all going relatively well considering...
I now have the process up and running, but its slow. 9 seconds per item. Clearly I cant present this solution. :-)
One thing i notice, 1.5 - 2 seconds of each loop are on the ExecutePackage Task in the queue loop.
I cant figure out how to get a hard number, I am using the flashing green box method of performance tuning. The other steps seem to be very fast. Adding indexes, changing sql to sps, all the usual tricks have helped.
Is the UI realiable at all with regards to boxes turning white/yellow/green? Some tasks report times in the progress tab, some dont seem to. So I am counting yellow time.
Should calling a subpackage be that expensive? 1 change i made was I change 'RunInASeparateProcess' to FALSE. I did that because the subpackage produces the following message otherwise:
Error: 0xC0012024 at Script Task: The task "Script Task" cannot run on this edition of Integration Services. It requires a higher level edition.
Task failed: Script Task
The reading i have done seems to advocate multiple packages. Anyone have any counter patterns? Should i stay the course? I started changing to 1 package. Copy/paste doesnt seem to work well w/ SequenceContainers. I would also need to recreate all the variables in the parent package. Doable, but im not sure that is the answer.
Does anyone know of any tuning resources/websites/books they would be willing to share.
Update - I have been tearing things down in an effort to figure out what the problem is. I was thinking it was the package configurations passing variable values. I dont think that is it. I can pass variables to another package w/ nothing in it and it is fast.
I can make the trivial subpackage slow by adding the two connection managers to it.
I suddenly realize I may be making and breaking a connection to both an Oracle Server and a SQL server in both the main package and then the sub package.
Am I correct in this observation?
Is there any way I can reuse the connection between the two packages?
When i google it, most of what i see is suggestions for passing the connection string.
UPDATE - I combined the two packages into one. This performance is not about 1.25 seconds per item, down from about 9. the only thing i can point to that changed is i am now reusing a single connection instead of making multiple connections.
Thanks, I appreciate any help you are kind enough to offer.
Greg

Once you enable logging, I'd suggest running the package from a command window using dtexec. While that doesn't perfectly duplicate the server environment, it does have the advantages of (a) eliminating BIDS as a potential performance issue and (b) being something you can do without jumping through change control hoops.

Related

Using tbl and src_monetdblite to access data

Sorry if this question has been asked elsewhere, I can't find it. I'm working through some basic examples in MonetDBLite.
> dbGetQuery(dbcon, "SELECT MAX(mpg) FROM mtcars WHERE cyl = 8")
L3
1 19.2
works, but
> ms <- MonetDBLite::src_monetdblite("./DB")
> t <- tbl(ms, "mtcars")
Error in UseMethod("tbl") :
no applicable method for 'tbl' applied to an object of class
"c('src_monetdb', 'src_sql', 'src')"
It seems that it's trying to assign the db to t not the table.
Any suggestions would be greatly appreciated.
I've been perusing resources and found a useR2016 presentation and noticed a difference here:
> ms
src: MonetDBEmbeddedConnection
tbls: mtcars
Curious...
I'm a huge fan of using MonetDBLite together with dplyr. My addition to Hannes Mühleisen's (thanks for the package!) answer would be that it appears that the order you load the packages can matter. Loading MonetDBLite after dplyr and dbplyr seems to be the key for me. Loading MonetDBLite first causes errors similar to the one nzgwynn noted.
Sometimes I could connect to the database with no problems. Other times I would get error messages like:
Error in UseMethod("db_query_fields") : no applicable method for 'db_query_fields' applied to an object of class "MonetDBEmbeddedConnection"
Like nzgwynn, I was puzzled about why it would work sometimes but not others. Restarting and reinstalling wouldn't necessarily fix it for me.
This clue, from an issue filed about sparklyr, lead me to explore the package loading order:
https://github.com/rstudio/sparklyr/issues/38
Like noted there with sparklyr, and I've noticed with other R database packages, MonetDBLite will load and attach automatically if the Global Environment already contains a connection object. My problem was that I had an src_monetdb object in my workspace, which was causing MonetDBLite to load upon starting RStudio. So I while I thought I was loading it after dplyr and dbplyr, it was really loading first. If I clear the workspace and then restart, I can load the packages in the preferred order. So far, this method has worked.
I've seen starting with a clean workspace advised as good practice generally, e.g.: https://twitter.com/hadleywickham/status/561146907519500288. Starting with a fresh workspace loses you no time either given MonetDBLite's speedy query ability.
Lastly, I would put a enthusiastic pitch in for using MonetDBLite. I saw it mentioned on RStudio's database page and was immediately impressed on how easy it was to setup and how fast it is. It's the best way I've found for working with a ~2 GB dataset in R. When exploring the data interactively, the dplyr queries run so quickly that it feels like I'm working with the data in memory. And if all I want to do is load the whole dataset into memory, MonetDBLite is as fast or faster than other methods I've tried like read.fst() from the fst package.
I closed R and opened it again and the same coding worked fine...
You need to call library("dplyr") before using tbl and friends. Also make sure you have dbplyr installed.
Update: Also, please make sure there is no connection object (src) in a stored workspace loaded at startup. Loading connections from .Rdata files does not work! Instead, create the connection/src from scratch every time you run a script.

How to deactivate safe mode in the mongo shell?

Short question is on the title: I work with my mongo Shell wich is in safe mode by default, and I want to gain better performance by deactivating this behaviour.
Long Question for those willing to know the context:
I am working on a huge set of data like
{
_id:ObjectId("azertyuiopqsdfghjkl"),
stringdate:"2008-03-08 06:36:00"
}
and some other fields and there are about 250M documents like that (whole database with the indexes weights 36Go). I want to convert the date in a real ISODATE field. I searched a bit how I could make an update query like
db.data.update({},{$set:{date:new Date("$stringdate")}},{multi:true})
but did not find how to make this work and resolved myself to make a script that take the documents one after the other and make an update to set a new field which takes the new Date(stringdate) as its value. The query use the _id so the default index is used.
Problem is that it takes a very long time. I already figured out that if only I had inserted empty dates object when I created the database I would now get better performances since there is the problem of data relocation when a new field is added. I also set an index on a relevant field to process the database chunk by chunk. Finally I ran several concurrent mongo clients on both the server and my workstation to ensure that the limitant factor is the database lock availability and not any other factor like cpu or network costs.
I monitored the whole thing with mongotop, mongostats and the web monitoring interfaces which confirmed that write lock is taken 70% of the time. I am a bit disappointed mongodb does not have a more precise granularity on its write lock, why not allowing concurrent write operations on the same collection as long as there is no risk of interference? Now that I think about it I should have sharded the collection on a dozen shards even while staying on the same server, because there would have been individual locks on each shard.
But since I can't do a thing right now to the current database structure, I searched how to improve performance to at least spend 90% of my time writing in mongo (from 70% currently), and I figured out that since I ran my script in the default mongo shell, every time I make an update, there is also a getLastError() which is called afterwards and I don't want it because there is a 99.99% chance of success and even in case of failure I can still make an aggregation request after the end of the big process to retrieve the single exceptions.
I don't think I would gain so much performance by deactivating the getLastError calls, but I think itis worth trying.
I took a look at the documentation and found confirmation of the default behavior, but not the procedure for changing it. Any suggestion?
I work with my mongo Shell wich is in safe mode by default, and I want to gain better performance by deactivating this behaviour.
You can use db.getLastError({w:0}) ( http://docs.mongodb.org/manual/reference/method/db.getLastError/ ) to do what you want but it won't help.
This is because for one:
make a script that take the documents one after the other and make an update to set a new field which takes the new Date(stringdate) as its value.
When using the shell in a non-interactive mode like within a loop it doesn't actually call getLastError(). As such downing your write concern to 0 will do nothing.
I already figured out that if only I had inserted empty dates object when I created the database I would now get better performances since there is the problem of data relocation when a new field is added.
I did tell people when they asked about this stuff to add those fields incase of movement but instead they listened to the guy who said "leave them out! They use space!".
I shouldn't feel smug but I do. That's an unfortunately side effect of being right when you were told you were wrong.
mongostats and the web monitoring interfaces which confirmed that write lock is taken 70% of the time
That's because of all the movement in your documents, kinda hard to fix that.
I am a bit disappointed mongodb does not have a more precise granularity on its write lock
The write lock doesn't actually denote the concurrency of MongoDB, this is another common misconception that stems from the transactional SQL technologies.
Write locks in MongoDB are mutexs for one.
Not only that but there are numerous rules which dictate that operations will subside to queued operations under certain circumstances, one being how many operations waiting, another being whether the data is in RAM or not, and more.
Unfortunately I believe you have got yourself stuck in between a rock and hard place and there is no easy way out. This does happen.

How to localize syntax errors in Mathematica?

Is there a trick to localize syntax errors? I occasionally run into this situation when editing existing code. Here's a latest example, do you see an efficient way to find the error? (I found the error already, but not efficiently)
http://yaroslavvb.com/upload/save/syntax-error.nb
May be, I am missing something, but in your example it was quite easy: once I put your code into a notebook and tried to run, the usual orange bracket appeared, and when you expand the messages, it states very clearly that the problem was in an empty parentheses in your vertexFormula function. In my experience, most of the time, the orange box gives enough hints.
Another great way which I use on a daily basis it through the code highlighting in Workbench. It immediately highlights syntax errors, plus you have a very powerful Eclipse-based navigation both for a single package and multiple packages. It might seem as you lose some flexibility by going to Workbench from an interactive FrontEnd development, but I found the opposite (or may be this is the revenge of my enterprise Java background): you still can have your notebook(s) in a Workbench project, where you do the initial development, but then they get attached to the project and a number of packages that you already developed and use. The transition from notebook to a package could not be easier, since you can continue use that notebook after you transfer the code into a package, and you don't even have to worry about loading the package, as long as you do it just once inside a project. Overall, I find the Workbench - based development much more fun as soon as your project goes over some critical mass (I'd say, perhaps around 1000 loc, but ymmv). But, complete new and independent chunks of functionality I still prefer to prototype entirely in the FrontEnd.
If you adhere to some specific coding standards in your code, or work with code which does, you might be able to develop some simple partial parser which at least breaks the code into complete chunks (function defenitions, Module-s etc, CompoundExpressions). Then, you could use ToExpression (say, Map it on a list of string code chunks returned by your partial parser), to see which piece of code is problematic (it will return $Failed on it). But if you use Workbench, this is unnecessary altogether.

How do you reproduce bugs that occur sporadically?

We have a bug in our application that does not occur every time and therefore we don't know its "logic". I don't even get it reproduced in 100 times today.
Disclaimer: This bug exists and I've seen it. It's not a pebkac or something similar.
What are common hints to reproduce this kind of bug?
Analyze the problem in a pair and pair-read the code. Make notes of the problems you KNOW to be true and try to assert which logical preconditions must hold true for this happen. Follow the evidence like a CSI.
Most people instinctively say "add more logging", and this may be a solution. But for a lot of problems this just makes things worse, since logging can change timing-dependencies sufficiently to make the problem more or less frequent. Changing the frequency from 1 in 1000 to 1 in 1,000,000 will not bring you closer to the true source of the problem.
So if your logical reasoning does not solve the problem, it'll probably give you a few specifics you could investigate with logging or assertions in your code.
There is no general good answer to the question, but here is what I have found:
It takes a talent for this kind of thing. Not all developers are best suited for it, even if they are superstars in other areas. So know your team, who has a talent for it, and hope you can give them enough candy to get them excited about helping you out, even if it isn't their area.
Work backwards, and treat it like a scientific investigation. Start with the bug, what you see is wrong. Develop hypotheses about what could cause it (this is the creative/imaginative part, the art that not everyone has the talent for) - and it helps a lot to know how the code works. For each of those hypotheses (preferably sorted by what you think is most likely - again pure gut feel here), develop a test that tries to eliminate it as the cause, and test the hypothesis. Any given failure to meet a prediction doesn't mean the hypothesis is wrong. Test the hypothesis until it is confirmed to be wrong (although as it gets less likely you may want to move on to another hypothesis first, just don't discount this one until you have a definitive failure).
Gather as much data as you can during this process. Extensive logging and whatever else is applicable. Do not discount a hypothesis because you lack the data, rather remedy the lack of data. Quite often the inspiration for the right hypothesis comes from examining the data. Noticing something off in a stack trace, weird issue in a log, something missing that should be there in a database, etc.
Double check every assumption. So many times I have seen an issue not get fixed quickly because some general method call was not further investigated, so the problem was just assumed to be not applicable. "Oh that, that should be simple." (See point 1).
If you run out of hypotheses, that is generally caused by insufficient knowledge of the system (this is true even if you wrote every line of code yourself), and you need to run through and review code and gain additional insight into the system to come up with a new idea.
Of course, none of the above guarantees anything, but that is the approach that I have found gets results consistently.
Add some sort of logging or tracing. For example log the last X actions the user committed before causing the bug (only if you can set a condition to match bug).
It's quite common for programmers not to be able to reiterate a user-experienced crash simply because you have developed a certain workflow and habits in using the application that obviously goes around the bug.
At this frequency of 1/100, I'd say that the first thing to do is to handle exceptions and log anything anywhere or you could be spending another week hunting this bug.
Also make a priority list of potentially sensitive articulations and features in your project. For example :
1 - Multithreading
2 - Wild pointers/ loose arrays
3 - Reliance on input devices
etc.
This will help you segment areas that you can brute-force-until-break-again as suggested by other posters.
Since this is language-agnostic, I'll mention a few axioms of debugging.
Nothing a computer ever does is random. A 'random occurrence' indicates a as-yet-undiscovered pattern. Debugging begins with isolating the pattern. Vary individual elements and assess what makes a change in the behaviour of the bug.
Different user, same computer?
Same user, different computer?
Is the occurrence strongly periodic? Does rebooting change the periodicity?
FYI- I once saw a bug that was experienced by a single person. I literally mean person, not a user account. User A would never see the problem on their system, User B would sit down at that workstation, signed on as User A and could immediately reproduce the bug. There should be no conceivable way for the app to know the difference between the physical body in the chair. However-
The users used the app in different ways. User A habitually used a hotkey to to invoke a action and User B used an on-screen control. The difference in the user behaviour would cascade into a visible error a few actions later.
ANY difference that effects the behaviour of the bug should be investigated, even if it makes no sense.
There's a good chance your application is MTWIDNTBMT (Multi Threaded When It Doesn't Need To Be Multi Threaded), or maybe just multi-threaded (to be polite). A good way to reproduce sporadic errors in multi-threaded applications is to sprinkle code like this around (C#):
Random rnd = new Random();
System.Threading.Thread.Sleep(rnd.Next(2000));
and/or this:
for (int i = 0; i < 4000000000; i++)
{
// tight loop
}
to simulate threads completing their tasks at different times than usual or tying up the processor for long stretches.
I've inherited many buggy, multi-threaded apps over the years, and code like the above examples usually makes the sporadic errors occur much more frequently.
Add verbose logging. It will take multiple -- sometimes dozen(s) -- iterations to add enough logging to understand the scenario.
Now the problem is that if the problem is a race condition, which is likely if it doesn't reproduce reliably, so logging can change timing and the problem will stop happening. In this case do not log to a file, but keep a rotating buffer of the log in memory and only dump it on disk when you detect that the problem has occurred.
Edit: a little more thoughts: if this is a gui application run tests with a qa automation tool which allows you to replay macros. If this is a service-type app, try to come up with at least a guess as to what is happening and then programmatically create 'freak' usage patterns which would exercise the code that you suspect. Create higher than usual loads etc.
What development environment?
For C++, your best bet may be VMWare Workstation record/replay, see:
http://stackframe.blogspot.com/2007/04/workstation-60-and-death-of.html
Other suggestions include inspecting the stack trace, and careful code overview... there is really no silver bullet :)
Try to add code in your app to trace the bug automatically once it happens (or even alert you via mail / SMS)
log whatever you can so when it happens you can catch the right system state.
Another thing- try applying automated testing that can cover more territory than human based testing in a formed manner.. it's a long shot, but a good practice in general.
all the above, plus throw some brute force soft-robot at it that is semi random, and scater a lot of assert/verify (c/c++, probably similar in other langs) through the code
Tons of logging and careful code review are your only options.
These can be especially painful if the app is deployed and you can't adjust the logging. At that point, your only choice is going through the code with a fine-tooth comb and trying to reason about how the program could enter into the bad state (scientific method to the rescue!)
Often these kind of bugs are related to corrupted memory and for that reason they might not appear very often. You should try to run your software with some kind of memory profiler e.g., valgrind, to see if something goes wrong.
Let’s say I’m starting with a production application.
I typically add debug logging around the areas where I think the bug is occurring. I setup the logging statements to give me insight into the state of the application. Then I have the debug log level turned on and ask the user/operator(s) notify me of the time of the next bug occurrence. I then analyze the log to see what hints it gives about the state of the application and if that leads to a better understanding of what could be going wrong.
I repeat step 1 until I have a good idea of where I can start debugging the code in the debugger
Sometimes the number of iterations of the code running is key but other times it maybe the interaction of a component with an outside system (database, specific user machine, operating system, etc.). Take some time to setup a debug environment that matches the production environment as closely as possible. VM technology is a good tool for solving this problem.
Next I proceed via the debugger. This could include creating a test harness of some sort that puts the code/components in the state I’ve observed from the logs. Knowing how to setup conditional break points can save a lot of time, so get familiar with that and other features within your debugger.
Debug, debug , debug. If you’re going nowhere after a few hours, take a break and work on something unrelated for awhile. Come back with a fresh mind and perspective.
If you have gotten nowhere by now, go back to step 1 and make another iteration.
For really difficult problems you may have to resort to installing a debugger on the system where the bug is occurring. That combined with your test harness from step 4 can usually crack the really baffling issues.
Unit Tests. Testing a bug in the app is often horrendous because there is so much noise, so many variable factors. In general the bigger the (hay)stack, the harder it is to pinpoint the issue. Creatively extending your unit test framework to embrace edge cases can save hours or even days of sifting
Having said that there is no silver bullet. I feel your pain.
Add pre and post condition check in methods related to this bug.
You may have a look at Design by contract
Along with a lot of patience, a quiet prayer & cursing you would need:
a good mechanism for logging the user actions
a good mechanism for gathering the data state when the user performs some actions (state in application, database etc.)
Check the server environment (e.g. an anti-virus software running at a particular time etc.) & record the times of the error & see if you can find any trends
some more prayers & cursing...
HTH.
Assuming you're on Windows, and your "bug" is a crash or some sort of corruption in unmanaged code (C/C++), then take a look at Application Verifier from Microsoft. The tool has a number of stops that can be enabled to verify things during runtime. If you have an idea of the scenario where your bug occurs, then try to run through the scenario (or a stress version of the scenario) with AppVerifer running. Make sure to either turn on pageheap in AppVerifier, or consider compiling your code with the /RTCcsu switch (see http://msdn.microsoft.com/en-us/library/8wtf2dfz.aspx for more information).
"Heisenbugs" require great skills to diagnose, and if you want help from people here you have to describe this in much more detail, and patiently listen to various tests and checks, report result here, and iterate this till you solve it (or decide it is too expensive in terms of resources).
You will probably have to tell us your actual situation, language, DB, operative system, workload estimate, time of the day it happened in the past, and a myriad of other things, list tests you did already, how they went, and be ready to do more and share the results.
And this will not guarantee that we collectively can find it, either...
I'd suggest to write down all things that user has been doing. If you have lets say 10 such bug reports You can try to find something that connects them.
Read the stack trace carefully and try to guess what could be happened;
then try to trace\log every line of code that potentially can cause trouble.
Keep your focus on disposing resources; many sneaky sporadical bugs i found were related to close\dispose things :).
For .NET projects You can use Elmah (Error Logging Modules and Handlers) to monitor you application for un-caught exceptions, it's very simple to install and provides a very nice interface to browse unknown errors
http://code.google.com/p/elmah/
This saved me just today in catching a very random error that was occuring during a registration process
Other than that I can only recommend trying to get as much information from your users as possible and having a thorough understanding of the project workflow
They mostly come out at night....
mostly
The team that I work with has enlisted the users in recording their time they spend in our app with CamStudio when we've got a pesky bug to track down. It's easy to install and for them to use, and makes reproducing those nagging bugs much easier, since you can watch what the users are doing. It also has no relationship to the language you're working in, since it's just recording the windows desktop.
However, this route seems to be viable only if you're developing corporate apps and have good relationships with your users.
This varies (as you say), but some of the things that are handy with this can be
immediately going into the debugger when the problem occurs and dumping all the threads (or the equivalent, such as dumping the core immediately or whatever.)
running with logging turned on but otherwise entirely in release/production mode. (This is possible in some random environments like c and rails but not many others.)
do stuff to make the edge conditions on the machine worse... force low memory / high load / more threads / serving more requests
Making sure that you're actually listening to what the users encountering the problem are actually saying. Making sure that they're actually explaining the relevant details. This seems to be the one that breaks people in the field a lot. Trying to reproduce the wrong problem is boring.
Get used to reading assembly that was produced by optimizing compilers. This seems to stop people sometimes, and it isn't applicable to all languages/platforms, but it can help
Be prepared to accept that it is your (the developer's) fault. Don't get into the trap of insisting the code is perfect.
sometimes you need to actually track the problem down on the machine it is happening on.
#p.marino - not enough rep to comment =/
tl;dr - build failures due to time of day
You mentioned time of day and that caught my eye. Had a bug once were someone stayed later at work on night, tried to build and commit before they left and kept getting a failure. They eventually gave up and went home. When they caught in the next morning it built fine, they committed (probably should have been more suspiscious =] ) and the build worked for everyone. A week or two later someone stayed late and had an unexpected build failure. Turns out there was a bug in the code that made any build after 7PM break >.>
We also found a bug in one seldom used corner of the project this january that caused problems marshalling between different schemas because we were not accounting for the different calendars being 0 AND 1 month based. So if no one had messed with that part of the project we wouldn't have possibly found the bug until jan. 2011
These were easier to fix than threading issues, but still interesting I think.
hire some testers!
This has worked for really weird heisenbugs.
(I'd also recommend getting a copy of "Debugging" by Dave Argans, these ideas are partly derived form using his ideas!)
(0) Check the ram of the system using something like Memtest86!
The whole system exhibits the problem, so make a test jig that exercises the whole thing.
Say it's a server side thing with a GUI, you run the whole thing with a GUI test framework doing the necessary input to provoke the problem.
It doesn't fail 100% of the time, so you have to make it fail more often.
Start by cutting the system in half ( binary chop)
worse case, you have to remove sub-systems one at a time.
stub them out if they can't be commented out.
See if it still fails. Does it fail more often ?
Keep proper test records, and only change one variable at a time!
Worst case you use the jig and you test for weeks to get meaningful statistics. This is HARD; but remember, the jig is doing the work.
I've got No threads and only one process, and I don't talk to hardware
If the system has no threads, no communicating processes and contacts no hardware; it's tricky; heisenbugs are generally synchronization, but in the no-thread no processes case it's more likely to be uninitialized data, or data used after being released, either on the heap or the stack. Try to use a checker like valgrind.
For threaded/multi-process problems:
Try running it on a different number of CPU's. If it's running on 1, try on 4! Try forcing a 4-computer system onto 1.
It'll mostly ensure things happen one at a time.
If there are threads or communicating processes this can shake out bugs.
If this is not helping but you suspect it's synchronization or threading, try changing the OS time-slice size.
Make it as fine as your OS vendor allows!
Sometimes this has made race conditions happen almost every time!
Obversely, try going slower on the timeslices.
Then you set the test jig running with debugger(s) attached all over the place and wait for the test jig to stop on a fault.
If all else fails, put the hardware in the freezer and run it there. The timing of everything will be shifted.
Debugging is hard and time consuming especially if you are unable to deterministically reproduce the problem. My advice to you is to find out the steps to reproduce it deterministically (not just sometimes).
There has been a lot of research in the field of failure reproduction in the past years and is still very active. Record&Replay techniques have been (so far) the research direction of most researchers. This is what you need to do:
1) Analyze the source code and determine what are the sources of non-determinism in the application, that is, what are the aspects that may take your application through different execution paths (e.g. user input, OS signals)
2) Log them in the next time you execute the application
3) When your application fails again, you have the steps-to-reproduce the failure in your log.
If your log still does not reproduce the failure, then you are dealing with a concurrency bug. In that case, you should take a look at how your application accesses shared variables. Do not attempt to record the accesses to shared variables, because you would be logging too much data, thereby causing severe slowdowns and large logs. Unfortunately, there is not much I can say that would help you to reproduce concurrency bugs, because research still has a long way to go in this subject. The best I can do is to provide a reference to the most recent advance (so far) in the topic of deterministic replay of concurrency bugs:
http://www.gsd.inesc-id.pt/~nmachado/software/Symbiosis_Tutorial.html
Best regards
Use an enhanced crash reporter. In the Delphi environment, we have EurekaLog and MadExcept. Other tools exist in other environments. Or you can diagnose the core dump. You're looking for the stack trace, which will show you where it's blowing up, how it got there, what's in memory, etc.. It's also useful to have a screenshot of the app, if it's a user-interaction thing. And info about the machine that it crashed on (OS version and patch, what else is running at the time, etc..) Both of the tools that I mentioned can do this.
If it's something that happens with a few users but you can't reproduce it, and they can, go sit with them and watch. If it's not apparent, switch seats - you "drive", and they tell you what to do. You'll uncover the subtle usability issues that way. double-clicks on a single-click button, for example, initiating re-entrancy in the OnClick event. That sort of thing. If the users are remote, use WebEx, Wink, etc., to record them crashing it, so you can analyze the playback.

Is midiOutPrepareHeader a quick call?

Does midiOutPrepareHeader, midiInPrepareHeader just setup some data fields, or does it do something that is more time intensive?
I am trying to decide whether to build and destroy the MIDIHDR's as needed, or to maintain a pool of them.
You really have only two ways to tell (without the Windows source):
1) Profile it. Depending on your findings for how long it takes, have a debug-only scoped timer that logs when it suddenly takes longer than what you think is acceptable for your application, or do your pool solution. Though the docs say not to modify the buffer once you call the prepare function, and it seems if you wanted to re-use it you may have to modify it. I'm not familiar enough with the docs to say one way or the other if your proposed solution would work.
2) Step through the assembly and see. Don't be afraid. Get the MSFT public symbols and see if it looks like it's just filling out fields or if it's doing something complicated.

Resources