What is a good workflow for detecting performance regressions in R packages? Ideally, I'm looking for something that integrates with R CMD check and alerts me when I have introduced a significant performance regression in my code.
What is a good workflow in general? What other languages provide good tools? Is it something that can be built on top of unit testing, or is it usually done separately?
This is a very challenging question, and one that I'm frequently dealing with, as I swap out different code in a package to speed things up. Sometimes a performance regression comes along with a change in algorithms or implementation, but it may also arise due to changes in the data structures used.
What is a good workflow for detecting performance regressions in R packages?
In my case, I tend to have very specific use cases that I'm trying to speed up, with different fixed data sets. As Spacedman wrote, it's important to have a fixed computing system, but that can be hard to guarantee in practice: even a shared computer that looks quite idle may have other processes that slow things down by 10-20%.
My steps:
Standardize the platform (e.g. one or a few machines, a particular virtual machine, or a virtual machine + specific infrastructure, a la Amazon's EC2 instance types).
Standardize the data set that will be used for speed testing.
Create scripts and fixed intermediate data output (i.e. saved to .rdat files) that involve very minimal data transformations. My focus is on some kind of modeling, rather than data manipulation or transformation. This means that I want to give exactly the same block of data to the modeling functions. If, however, data transformation is the goal, then be sure that the pre-transformed/manipulated data is as close as possible to standard across tests of different versions of the package. (See this question for examples of memoization, caching, etc., that can be used to standardize or speed up non-focal computations. It references several packages by the OP.)
Repeat tests multiple times.
Scale the results relative to fixed benchmarks, e.g. the time to perform a linear regression, to sort a matrix, etc. This can allow for "local" or transient variations in infrastructure, such as may be due to I/O, the memory system, dependent packages, etc. (See the sketch after this list.)
Examine the profiling output as vigorously as possible (see this question for some insights, also referencing tools from the OP).
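To make the repetition and scaling steps concrete, here is a minimal sketch using the microbenchmark package; my_model_fit and benchmark_data.rds are hypothetical stand-ins for your focal function and fixed data set:

library(microbenchmark)

test_data <- readRDS("benchmark_data.rds")  # fixed, pre-saved input data

timings <- microbenchmark(
  focal     = my_model_fit(test_data),               # the code under test
  reference = sort(matrix(runif(1e6), nrow = 1e3)),  # fixed reference task
  times     = 50                                     # repeat tests multiple times
)

# Scale the focal time by the reference time so that transient machine load
# cancels out of version-to-version comparisons.
with(timings, median(time[expr == "focal"]) / median(time[expr == "reference"]))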
Ideally, I'm looking for something that integrates with R CMD check and alerts me when I have introduced a significant performance regression in my code.
Unfortunately, I don't have an answer for this.
What is a good workflow in general?
For me, it's quite similar to general dynamic code testing: is the output (execution time in this case) reproducible, optimal, and transparent? Transparency comes from understanding what affects the overall time. This is where Mike Dunlavey's suggestions are important, but I prefer to go further, with a line profiler.
Regarding a line profiler, see my previous question, which points to options in Python and Matlab as examples. It's most important to examine clock time, but also very important to track memory allocation, the number of times each line is executed, and call stack depth.
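In R, a minimal sketch of the dynamic side with the base sampling profiler (my_model_fit and test_data are hypothetical; Rprof samples the call stack rather than individual lines, but it is the natural starting point):

Rprof("profile.out", interval = 0.01)  # sample the call stack every 10 ms
result <- my_model_fit(test_data)      # the focal computation
Rprof(NULL)                            # stop profiling
summaryRprof("profile.out")$by.self    # self vs. total time, per function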
What other languages provide good tools?
Almost all other languages have better tools. :) Interpreted languages like Python and Matlab offer good, and possibly familiar, examples of tools that can be adapted for this purpose. Although dynamic analysis is very important, static analysis can help identify where there may be some serious problems. Matlab has a great static analyzer that can report, for instance, when objects (e.g. vectors, matrices) are growing inside of loops. It is terrible to find this only via dynamic analysis - you've already wasted execution time to discover it, and it's not always discernible if your execution context is pretty simple (e.g. just a few iterations, or small objects).
As far as language-agnostic methods go, you can look at:
Valgrind & cachegrind
Monitoring of disk I/O, dirty buffers, etc.
Monitoring of RAM (Cachegrind is helpful, but you could also just monitor RAM allocation and other details of RAM usage)
Usage of multiple cores
Is it something that can be built on top of unit testing, or is it usually done separately?
This is hard to answer. For static analysis, it can occur before unit testing. For dynamic analysis, one may want to add more tests. Think of it as sequential design (i.e. from an experimental design framework): if the execution costs appear to be, within some statistical allowances for variation, the same, then no further tests are needed. If, however, method B seems to have an average execution cost greater than method A, then one should perform more intensive tests.
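A minimal sketch of that sequential idea (method_a, method_b, and test_data are hypothetical):

times_a <- replicate(30, system.time(method_a(test_data))[["elapsed"]])
times_b <- replicate(30, system.time(method_b(test_data))[["elapsed"]])

# Escalate to more intensive testing only if the difference looks real.
if (t.test(times_a, times_b)$p.value < 0.05) {
  message("Execution costs appear to differ; run the heavier benchmark suite.")
}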
Update 1: If I may be so bold, there's another question that I'd recommend including, which is: "What are some gotchas in comparing the execution time of two versions of a package?" This is analogous to assuming that two programs that implement the same algorithm should have the same intermediate objects. That's not exactly true (see this question - not that I'm promoting my own questions, here - it's just hard work to make things better and faster...leading to multiple SO questions on this topic :)). In a similar way, two executions of the same code can differ in time consumed due to factors other than the implementation.
So, some gotchas that can occur, either within the same language or across languages, within the same execution instance or across "identical" instances, which can affect runtime:
Garbage collection - different implementations or languages can hit garbage collection under different circumstances. This can make two executions appear different, though it can be very dependent on context, parameters, data sets, etc. The GC-obsessive execution will look slower.
Caching at the level of the disk, motherboard (e.g. L1, L2, L3 caches), or other levels (e.g. memoization). Often, the first execution will pay a penalty.
Dynamic voltage scaling - This one sucks. When there is a problem, this may be one of the hardest beasties to find, since it can go away quickly. It looks like cacheing, but it isn't.
Any job priority manager that you don't know about.
One method may use multiple cores, or be clever about how work is parceled among cores or CPUs. For instance, getting a process locked to a core can be useful in some scenarios. One execution of an R package may be luckier in this regard; another package may be very clever...
Unused variables, excessive data transfer, dirty caches, unflushed buffers, ... the list goes on.
The key question is: ideally, how should we test for differences in expected values, subject to the randomness created by order effects? Well, pretty simple: go back to experimental design. :)
When the empirical differences in execution times are different from the "expected" differences, it's great to have enabled additional system and execution monitoring so that we don't have to re-run the experiments until we're blue in the face.
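One cheap design trick against order effects is to randomize the run order across repetitions; a sketch (method_a, method_b, and test_data are hypothetical; the microbenchmark package does this for you by default, if I recall correctly):

methods   <- list(A = function() method_a(test_data),
                  B = function() method_b(test_data))
run_order <- sample(rep(names(methods), each = 30))  # interleave the runs randomly
elapsed   <- vapply(run_order,
                    function(m) system.time(methods[[m]]())[["elapsed"]],
                    numeric(1))
tapply(elapsed, run_order, median)  # per-method medians, order effects averaged out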
The only way to do anything here is to make some assumptions. So let us assume an unchanged machine, or else require a 'recalibration'.
Then use a unit-test-like framework, and treat 'has to be done in X units of time' as just another testing criterion to be fulfilled. In other words, do something like
stopifnot(system.time(someExpression)["elapsed"] < savedValue + fudge)
so we would have to associate prior timings with given expressions. Equality-testing comparisons from any one of the three existing unit testing packages could be used as well.
Nothing that Hadley couldn't handle, so I think we can almost expect a new package timr after the next long academic break :). Of course, this either has to be optional, because on an "unknown" machine (think: CRAN testing the package) we have no reference point, or else the fudge factor has to "go to 11" to automatically accept on a new machine.
A recent change announced on the R-devel feed could give a crude measure for this.
CHANGES IN R-devel UTILITIES
‘R CMD check’ can optionally report timings on various parts of the check: this is controlled by environment variables documented in ‘Writing R Extensions’.
See http://developer.r-project.org/blosxom.cgi/R-devel/2011/12/13#n2011-12-13
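From within R, enabling the timings looks something like this (assuming a sufficiently recent R-devel; per 'Writing R Extensions', _R_CHECK_TIMINGS_ is a threshold, in seconds, above which timings are reported):

Sys.setenv("_R_CHECK_TIMINGS_" = "0")       # "0" asks for timings on everything
system("R CMD check myPackage_0.1.tar.gz")  # hypothetical package tarball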
The overall time spent running the tests could be checked and compared to previous values. Of course, adding new tests will increase the time, but dramatic performance regressions could still be seen, albeit manually.
This is not as fine grained as timing support within individual test suites, but it also does not depend on any one specific test suite.
I am searching for good (preferably plug-and-play) solutions for performing diagnostics on software I am developing. The software I am working on has several components that require extensive computing resources, and so we're attempting to capture the performance of these components for two reasons: 1) estimate required computing resources and thus the costs of running the software, and 2) quantify what an "improvement" is for the component (i.e. if we modify the code and speed increases, then it's an improvement). Our application is composed of a search engine plus many other components, and understanding the speed of the search engine is also critical to the end-user.
It seems to be hard to search for a solution, since I'm not sure how to properly define my problem, and what I've found so far seems to be basic error-logging techniques. A solution whose purpose is to run statistics (e.g. statistical regressions) on the data would be best. Maybe unit testing frameworks have built-in test timers, but we need to capture data from live runs of our application to account for the numerous different scenarios.
So really there are two questions:
1) Is there a predefined solution for these sorts of tests?
2) Is there any good reference for running statistical regressions on this kind of data? Let's say we captured the execution time of the script and the size of the input data (e.g. query). We can regress time on data size to understand the effect of changing the data size on the execution time. But these sorts of regressions are tricky, since it's not clear what all of the relevant variables are. Any reference on analyzing performance data would be excellent, and would benefit many people, I believe!
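For concreteness, the simplest version of what I have in mind, in R, say (the file and column names are made up):

perf <- read.csv("timings.csv")  # columns: size, seconds
fit  <- lm(log(seconds) ~ log(size), data = perf)
coef(fit)["log(size)"]           # the slope estimates the scaling exponent

A log-log fit turns a power law (time roughly proportional to size^k) into a straight line whose slope estimates k. But, as I said, it's not clear what other variables belong in the model.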
Thanks
Matt
Big apps like these are going to be doing a lot of non-CPU processing, so to find optimization points you're going to need wall-clock-based, not CPU-based, sampling. gprof and some others only sample on CPU time, so they cannot see needless I/O or other system calls. And if you do manage to find and remove CPU-intensive performance problems, the I/O-intensive ones will only become a larger fraction of the time.
Take a look at Zoom. It's a stack sampler that reports, by line of code, the percent of wall-clock time that line is on the stack, and any code point worth optimizing will probably be such a line. It also has a nice butterfly view for browsing the call graph. (You don't want the call graph as a whole; it would be a meaningless rat's nest.)
Is it possible to write a program that can extract a melody/beat/rhythm provided by a specific instrument in a wave (or other music format) file made up of multiple instruments?
Which algorithms could be used for this and what programming language would be best suited to it?
This is a fascinating area. The basic mathematical tool here is the Fourier Transform. To get an idea of how it works, and how challenging it can be, take a look at the analysis of the opening chord to A Hard Day's Night.
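As a toy illustration (in R, though any language with an FFT will do) of what the Fourier transform gives you:

sr <- 44100                                      # sample rate, Hz
t  <- seq(0, 0.1, by = 1 / sr)
x  <- sin(2 * pi * 440 * t) + 0.5 * sin(2 * pi * 880 * t)  # A4 plus its octave
spec  <- Mod(fft(x))[1:floor(length(x) / 2)]     # magnitude spectrum
freqs <- (seq_along(spec) - 1) * sr / length(x)
freqs[which.max(spec)]                           # close to 440 Hz, the dominant partial

Real recordings are the hard part: every instrument contributes many overlapping partials, which is exactly why that one chord merited a whole analysis.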
An instrument produces a sound signature, just the same way our voices do. There are algorithms that can pick a single voice out of a crowd and identify it from its signature in a database; this is used in forensics. In the exact same way, the sound signature of a single instrument can be picked out of a soundscape (such as your mixed wave) and used to pick out a beat, or to make a copy of that instrument on its own track.
Obviously, if you're thinking about making copies of tracks, i.e. breaking the mixed wave down into a single track per instrument, you're going to be looking at a lot of work. My understanding is that, because of the frequency overlaps between instruments, this isn't going to be straightforward by any means... not impossible, though, as you've already been told.
There's quite an interesting blog post by Comparisonics about sound matching technologies which might be useful as a start for your quest for information: http://www.comparisonics.com/SearchingForSounds.html
To extract the beat or rhythm, you might not need perfect isolation of the instrument you're targeting. A general solution may be hard, but if you're trying to solve it for a particular piece, it may be possible. Try implementing a band-pass filter and see if you can tune it to select the instrument you're after.
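A rough sketch of such a filter in R, using the CRAN tuneR and signal packages (the file name and the 200-1000 Hz band are assumptions you would tune by ear for your target instrument):

library(tuneR)
library(signal)

wav <- readWave("mix.wav")  # hypothetical input mix
sr  <- wav@samp.rate
x   <- wav@left / 32768     # normalize 16-bit samples to [-1, 1]

# 4th-order Butterworth band-pass; W is given as fractions of the Nyquist frequency
bp <- butter(4, c(200, 1000) / (sr / 2), type = "pass")
y  <- as.numeric(signal::filter(bp, x))

writeWave(Wave(round(y * 32767), samp.rate = sr, bit = 16), "filtered.wav")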
Also, I just found this Mac product called PhotoSounder. They have a blog showing different ways it can be used, including isolating an individual instrument (with manual intervention).
Look into Karaoke machine algorithms. If they can remove voice from a song, I'm sure the same principles can be applied to extract a single instrument.
Most instruments make sound within certain frequency ranges.
If you write a tunable bandpass filter - a filter that only lets a certain frequency range through - it'll be about as close as you're likely to get. It will not be anywhere near perfect; you're asking for black magic. The only way to perfectly extract a single instrument from a track is to have an audio sample of the track without that instrument, and do a difference of the two waveforms.
C, C++, Java, C#, Python, Perl should all be able to do all of this with the right libraries. Which one is "best" depends on what you already know.
It's possible in principle, but very difficult - an open area of research, even. You may be interested in the project paper for Dancing Monkeys, a step generation program for StepMania. It does some fairly sophisticated beat detection and music analysis, which is detailed in the paper (linked near the bottom of that page).
I want to answer questions about data in Erlang: count things, correlate messages, provide arbitrary statistics. I had thought about resorting to Hadoop for this, but is it possible to build a solution in raw Erlang to do rather arbitrary data analysis, not necessarily via map/reduce, but somehow? I have seen hints of people doing this, but no explicit blog posts or examples. I know that Powerset's natural language capabilities are written in Erlang. I also know about CouchDB, but was looking for some other solutions.
Yes.
For general-purpose computation and statistics, Erlang works just fine. It isn't heavily optimized for such work, so it will have trouble keeping up with similar numeric code in, say, MATLAB, Fortran, or any of the major C packages for this work -- but for most uses it will do just fine. And of course, if your code parallelizes neatly and you have multiple CPUs available, Erlang will catch up more easily.
(You also mentioned the map/reduce pattern; it is relatively trivial given the Erlang/OTP runtime and libraries.)
My colleagues and I have written plenty of "raw" Erlang to do counting, statistics, and so on. We have found it to be more than sufficient for most tasks.
I need to display profiling information pulled from a deeply embedded CPU, presenting it in a way which other developers on my team will be able to act upon. The profiling data is a snapshot of a cycle counter at the entry and exit of every function, so we have a call graph annotated with sub-microsecond timing accuracy. I'd prefer not to just dump out function names and timing like gprof, I'm looking for something easier to understand and act upon.
Has anyone worked with a particularly good profiling tool (on any platform), which made it easy to identify areas of the code to drill into? I'm looking for an inspirational example to follow for how to display the call graph, but if there is good tool with an input format I can massage my data to I'll use it. I could use Windows, Linux, or MacOS X to run the visualization tool.
A profiling article on IBM DeveloperWorks led me to GraphViz, with a profiling example on their site. Barring another suggestion here, I'll use GraphViz and mimic their profiling example.
Another neat tool to visualize profiling data is the gprof2dot.py python script.
It can be used to visualize several different formats: "This is a Python script to convert the output from prof, gprof, oprofile, Shark, AQtime, and python profilers into a dot graph." This is what the output looks like:
[example gprof2dot output: a call graph rendered with dot; image omitted]
I use Kprof (http://kprof.sourceforge.net/). It is old, but I have never found a better tool to inspect the results from gprof.
How about "GTKWave"?
But you have to insert the probe in your code.
Valgrind does profiling (and more), and there are GUIs for visualization.
I suggest you drop gprof+graphviz for OProfileUI, unless you don't have a choice.
JetBrains dotTrace (it has a trial demo you can play with). It organizes the call stacks and makes it easy to find the trouble spots. It has a lot of filtering capabilities as well, and is very easy to navigate to find what you're looking for.
IE 8b2 offers a simple display of the call tree for javascript that I believe is much more useful than the GraphViz chart.
The GraphViz chart is wonderful for visualizing the call tree but makes it very difficult to visualize timing issues (IMHO the more important data).
*Edit: I thought it worth pointing out that all of the tools suggested use a grid-based tree to visualize the call tree. This allows you to see the calling structure without downplaying the timing data, as I believe you do with the GraphViz chart.*
You can use Senseo, a plugin for Eclipse. It shows you performance, memory allocation, objects created, and time spent, plus the actual methods invoked when you hover over method signatures or calls, a call-context tree, a package explorer, and more.
I've written a browser-based visualization tool, profile_eye, which operates on the output of gprof2dot.
gprof2dot is great at grokking many profiling-tool outputs, and does a great job at graph-element placement. The final rendering is a static graphic, which is often very cluttered.
Using d3.js it's possible to remove much of that clutter, through relative fading of unfocused elements, tooltips, and a fisheye distortion.
For comparison, see profile_eye's visualization of the canonical example used by gprof2dot.