Related
Since I am a Lone Developer, I have to think about every aspect of the systems I am working on. Lately I've been thinking about performance of my two websites, and ways to improve it. Sites like StackOverflow proclaim, "performance is a feature." However, "premature optimization is the root of all evil," and none of my customers have complained yet about the sites' performance.
My question is, is performance always important? Should performance always be a feature?
Note: I don't think this question is the same as this one, as that poster is asking when to consider performance and I am asking if the answer to that question is always, and if so, why. I also don't think this question should be CW, as I believe there is an answer and reasoning for that answer.
Adequate performance is always important.
Absolute fastest possible performance is almost never important.
It's always worth keeping an eye on performance and being aware of anything outrageously non-optimal that you're doing (particularly at a design/architecture level) but that's not the same as micro-optimising every line of code.
Performance != Optimization.
Performance is a feature indeed, but premature optimization will cost you time and will not yield the same result as when you optimize the parts that need optimization. And you can't really know which parts need optimization until you can actually profile something.
Performance is the feature that your clients will not tell you about if it's missing, unless it's really painfully slow and they're forced to use your product. Existing customers may report it in the end, but new customers will simply not bother if the performance is required.
You need to know what performance you need, and formulate it as a requirement. Then, you have to meet your own requirement.
That 'root of all evil' quote is almost always misused and misunderstood.
Designing your application to perform well can be mostly be done with just good design. Good design != premature optimization, and it's utterly ridiculous to go off writing crap code and blowing off doing a better job on the design as an 'evil' waste. Now, I'm not specifically talking about you here... but I see people do this a lot.
It usually saves you time to do a good job on the design. If you emphasize that, you'll get better at it... and get faster and faster at writing systems that perform well from the start.
Understanding what kinds of structures and access methods work best in certain situations is key here.
Sure, if you're app becomes truly massive or has insane speed requirements you may find yourself doing tricked out optimizations that make your code uglier or harder to maintain... and it would be wrong to do those things before you need to.
But that is absolutely NOT the same thing as making an effort to understand and use the right algorithms or data patterns or whatever in the first place.
Your users are probably not going to complain about bad performance if it's bearable. They possibly wouldn't even know it could be faster. Reacting to complaints as a primary driver is a bad way to operate. Sure, you need to address complaints you receive... but a lack of them does not mean there isn't a problem. The fact that you are considering improving performance is a bit of an indicator right there. Was it just a whim, or is some part of you telling you it should be better? Why did you consider improving it?
Just don't go crazy doing unnecessary stuff.
Keep performance in mind but given your situation it would be unwise to spend too much time up front on it.
Performance is important but it's often hard to know where your bottleneck will be. Therefore I'd suggest planning to dedicate some time to this feature once you've got something to work with.
Thus you need to set up metrics that are important to your clients and you. Keep and analyse these measurements. Then estimate how long and how much each would take to implement. Now you can aim on getting as much bang for you buck/time.
If it's web it would be wise to note your page size and performance using Firebug + yslow and/or google page speed. Again, know what applies to a small site like yours and things that only apply to yahoo and google.
Jackson’s Rules of Optimization:
Rule 1. Don’t do it.
Rule 2 (for experts only). Don’t do it
yet— that is, not until you have a
perfectly clear and unoptimized
solution.
—M. A. Jackson
Extracted from Code Complete 2nd edition.
To give a generalized answer to a general question:
First make it work, then make it right, then make it fast.
http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast
This puts a more constructive perspective on "premature optimization is the root of all evil".
So to parallel Jon Skeet's answer, adequate performance (as part of making something work, and making it right) is always important. Even then it can often be addressed after other functionality.
Jon Skeets 'adequate' nails it, with the additional provision that for a library you don't know yet what's adequate, so it's better to err on the safe side.
It is one of the many stakes you must not get wrong, but the quality of your app is largely determined by the weakest link.
Performance is definitely always important in a certain sense - maybe not the one you mean: namely in all phases of development.
In Big O notation, what's inside the parantheses is largely decided by design - both components isolation and data storage. Choice of algorithm will usually only best/worst case behavior (unless you start with decidedly substandard algorithms). Code optimizations will mostly affect the constant factor - which shouldn't be neglected, either.
But that's true for all aspects of code: in any stage, you have a good chance to fail any aspect - stability, maintainability, compatibility etc. Performance needs to be balanced, so that no aspect is left behind.
In most applications 90% or more of execution time is spend in 10% or less of the code. Usually there is little use in optimizing other code than these 10%.
performance is only important to the extent that developing the performance improvement takes less time than the total amount of time that will be saved for the user(s).
the result is that if you're developing something for millions... yeah it's important to save them time. if you're coding up a tool for your own use... it might be more trouble than it's worth to save a minute or even an hour or more.
(this is clearly not a rule set in stone... there are times when performance is truly critical no matter how much development time it takes)
There should be a balance to everything. Cost (or time to develop) vs Performance for instance. More performance = more cost. If a requirement of the system being built is high performance then the cost should not matter, but if cost is a factor then you optimize within reason. After a while, your return on investment suffers in that more performance does not bring in more returns.
The importance of performance is IMHO highly correlated to your problem set. If you are creating a site with an expectation of a heavy load and lot of server side processing, then you might want to put some more time into performance (otherwise your site might end up being unusable). However, for most applications the the time put into optimizing your perfomance on a website is not going to pay off - users won't notice the difference.
So I guess it breaks down to this:
Will users notice the improvements?
How does this improvement compare to competing sites?
If users will notice AND the improvement would be enough to differentiate you from the competition - performance is an important feature - otherwise not so much. (To a point - I don't recommend ignoring it entirely - you don't want your site to turtle along after all).
No. Fast enough is generally good enough.
It's not necessarily true, however, that your client's ideas about "fast enough" should trump your own. If you think it's fast enough and your client doesn't then yes, you need to accommodate your ideas to theirs. But if you're client thinks it's fast enough and you don't you should seriously consider going with your opinion, no theirs (since you may be more knowledgeable about performance standards in the wider world).
How important performance is depends largely and foremost on what you do.
For example, if you write a library that can be used in any environment, this can hardly ever have too much performance. In some environments, a 10% performance advantage can be a major feature for a library.
If you, OTOH, write an application, there's always a point where it is fast enough. Users won't neither realize nor care whether a button pressed reacts within 0.05 or 0.2 seconds - even though that's a factor of 4.
However, it is always easier to get working code faster, than it is to get fast code working.
No. Performance is not important.
Lack of performance is important.
Performance is something to be designed in from the outset, not tacked on at the end. For the past 15 years I have been working in the performance engineering space and the cause of most project failures that I work on is a lack of requirements on performance. A couple of posts have noted "fast enough" as an observation and whether your expectation matches that of your clients, but what about when you have a situation of your client, your architectural team, your platform engineering team, your functional test team, your performance test team and your operations team all have different expectations on performance, none of which have been committed to stone and measured against. Bad Magic to be certain.
Capture those expectations on the part of your clients. Commit them to a specific, objective, measurable requirement that you can evaluate at each stage of production of your software. Expectations may not be uniform, with one section of your app/code needing to be faster than others, nor will each customer have the same expectations on what is considered acceptable. Having this information will force you to confront decisions in the design and implementation that you may have overlooked in the past and it will result in a product which is a better match to your clients expectations.
I am currently working for a client who are petrified of changing lousy un-testable and un-maintainable code because of "performance reasons". It is clear that there are many misconceptions running rife and reasons are not understood, but merely followed with blind faith.
One such anti-pattern I have come across is the need to mark as many classes as possible as sealed internal...
*RE-Edit: I see marking everything as sealed internal (in C#) as a premature optimisation.*
I am wondering what are some of the other performance anti-patterns people may be aware of or come across?
The biggest performance anti-pattern I have come across is:
Not measuring performance before and
after the changes.
Collecting performance data will show if a certain technique was successful or not. Not doing so will result in pretty useless activities, because someone has the "feeling" of increased performance when nothing at all has changed.
The elephant in the room: Focusing on implementation-level micro-optimization instead of on better algorithms.
Variable re-use.
I used to do this all the time figuring I was saving a few cycles on the declaration and lowering memory footprint. These savings were of minuscule value compared with how unruly it made the code to debug, especially if I ended up moving a code block around and the assumptions about starting values changed.
Premature performance optimizations comes to mind. I tend to avoid performance optimizations at all costs and when I decide I do need them I pass the issue around to my collegues several rounds trying to make sure we put the obfu... eh optimization in the right place.
One that I've run into was throwing hardware at seriously broken code, in an attempt to make it fast enough, sort of the converse of Jeff Atwood's article mentioned in Rulas' comment. I'm not talking about the difference between speeding up a sort that uses a basic, correct algorithm by running it on faster hardware vs. using an optimized algorithm. I'm talking about using a not obviously correct home brewed O(n^3) algorithm when a O(n log n) algorithm is in the standard library. There's also things like hand coding routines because the programmer doesn't know what's in the standard library. That one's very frustrating.
Using design patterns just to have them used.
Using #defines instead of functions to avoid the penalty of a function call.
I've seen code where expansions of defines turned out to generate huge and really slow code. Of course it was impossible to debug as well. Inline functions is the way to do this, but they should be used with care as well.
I've seen code where independent tests has been converted into bits in a word that can be used in a switch statement. Switch can be really fast, but when people turn a series of independent tests into a bitmask and starts writing some 256 optimized special cases they'd better have a very good benchmark proving that this gives a performance gain. It's really a pain from maintenance point of view and treating the different tests independently makes the code much smaller which is also important for performance.
Lack of clear program structure is the biggest code-sin of them all. Convoluted logic that is believed to be fast almost never is.
Do not refactor or optimize while writing your code. It is extremely important not to try to optimize your code before you finish it.
Julian Birch once told me:
"Yes but how many years of running the application does it actually take to make up for the time spent by developers doing it?"
He was referring to the cumulative amount of time saved during each transaction by an optimisation that would take a given amount of time to implement.
Wise words from the old sage... I often think of this advice when considering doing a funky optimisation. You can extend the same notion a little further by considering how much developer time is being spent dealing with the code in its present state versus how much time is saved by the users. You could even weight the time by hourly rate of the developer versus the user if you wanted.
Of course, sometimes its impossible to measure, for example, if an e-commerce application takes 1 second longer to respond you will loose some small % money from users getting bored during that 1 second. To make up that one second you need to implement and maintain optimised code. The optimisation impacts gross profit positively, and net profit negatively, so its much harder to balance. You could try - with good stats.
Exploiting your programming language. Things like using exception handling instead of if/else just because in PLSnakish 1.4 it's faster. Guess what? Chances are it's not faster at all and that two years from now someone maintaining your code will get really angry with you because you obfuscated the code and made it run much slower, because in PLSnakish 1.8 the language maintainers fixed the problem and now if/else is 10 times faster than using exception handling tricks. Work with your programming language and framework!
Changing more than one variable at a time. This drives me absolutely bonkers! How can you determine the impact of a change on a system when more than one thing's been changed?
Related to this, making changes that are not warranted by observations. Why add faster/more CPUs if the process isn't CPU bound?
General solutions.
Just because a given pattern/technology performs better in one circumstance does not mean it does in another.
StringBuilder overuse in .Net is a frequent example of this one.
Once I had a former client call me asking for any advice I had on speeding up their apps.
He seemed to expect me to say things like "check X, then check Y, then check Z", in other words, to provide expert guesses.
I replied that you have to diagnose the problem. My guesses might be wrong less often than someone else's, but they would still be wrong, and therefore disappointing.
I don't think he understood.
Some developers believe a fast-but-incorrect solution is sometimes preferable to a slow-but-correct one. So they will ignore various boundary conditions or situations that "will never happen" or "won't matter" in production.
This is never a good idea. Solutions always need to be "correct".
You may need to adjust your definition of "correct" depending upon the situation. What is important is that you know/define exactly what you want the result to be for any condition, and that the code gives those results.
Michael A Jackson gives two rules for optimizing performance:
Don't do it.
(experts only) Don't do it yet.
If people are worried about performance, tell 'em to make it real - what is good performance and how do you test for it? Then if your code doesn't perform up to their standards, at least it's something the code writer and the application user agree on.
If people are worried about non-performance costs of rewriting ossified code (for example, the time sink) then present your estimates and demonstrate that it can be done in the schedule. Assuming it can.
I believe it is a common myth that super lean code "close to the metal" is more performant than an elegant domain model.
This was apparently de-bunked by the creator/lead developer of DirectX, who re-wrote the c++ version in C# with massive improvements. [source required]
Appending to an array using (for example) push_back() in C++ STL, ~= in D, etc. when you know how big the array is supposed to be ahead of time and can pre-allocate it.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
If something is making a single-thread program take, say, 10 times as long as it should, you could run a profiler on it. You could also just halt it with a "pause" button, and you'll see exactly what it's doing.
Even if it's only 10% slower than it should be, if you halt it more times, before long you'll see it repeatedly doing the unnecessary thing. Usually the problem is a function call somewhere in the middle of the stack that isn't really needed. This doesn't measure the problem, but it sure does find it.
Edit: The objections mostly assume that you only take 1 sample. If you're serious, take 10. Any line of code causing some percentage of wastage, like 40%, will appear on the stack on that fraction of samples, on average. Bottlenecks (in single-thread code) can't hide from it.
EDIT: To show what I mean, many objections are of the form "there aren't enough samples, so what you see could be entirely spurious" - vague ideas about chance. But if something of any recognizable description, not just being in a routine or the routine being active, is in effect for 30% of the time, then the probability of seeing it on any given sample is 30%.
Then suppose only 10 samples are taken. The number of times the problem will be seen in 10 samples follows a binomial distribution, and the probability of seeing it 0 times is .028. The probability of seeing it 1 time is .121. For 2 times, the probability is .233, and for 3 times it is .267, after which it falls off. Since the probability of seeing it less than two times is .028 + .121 = .139, that means the probability of seeing it two or more times is 1 - .139 = .861. The general rule is if you see something you could fix on two or more samples, it is worth fixing.
In this case, the chance of seeing it in 10 samples is 86%. If you're in the 14% who don't see it, just take more samples until you do. (If the number of samples is increased to 20, the chance of seeing it two or more times increases to more than 99%.) So it hasn't been precisely measured, but it has been precisely found, and it's important to understand that it could easily be something that a profiler could not actually find, such as something involving the state of the data, not the program counter.
On Java servers it's always been a neat trick to do 2-3 quick Ctrl-Breakss in a row and get 2-3 threaddumps of all running threads. Simply looking at where all the threads "are" may extremely quickly pinpoint where your performance problems are.
This technique can reveal more performance problems in 2 minutes than any other technique I know of.
Because sometimes it works, and sometimes it gives you completely wrong answers. A profiler has a far better record of finding the right answer, and it usually gets there faster.
Doing this manually can't really be called "quick" or "effective", but there are several profiling tools which do this automatically; also known as statistical profiling.
Callstack sampling is a very useful technique for profiling, especially when looking at a large, complicated codebase that could be spending its time in any number of places. It has the advantage of measuring the CPU's usage by wall-clock time, which is what matters for interactivity, and getting callstacks with each sample lets you see why a function is being called. I use it a lot, but I use automated tools for it, such as Luke Stackwalker and OProfile and various hardware-vendor-supplied things.
The reason I prefer automated tools over manual sampling for the work I do is statistical power. Grabbing ten samples by hand is fine when you've got one function taking up 40% of runtime, because on average you'll get four samples in it, and always at least one. But you need more samples when you have a flat profile, with hundreds of leaf functions, none taking more than 1.5% of the runtime.
Say you have a lake with many different kinds of fish. If 40% of the fish in the lake are salmon (and 60% "everything else"), then you only need to catch ten fish to know there's a lot of salmon in the lake. But if you have hundreds of different species of fish, and each species is individually no more than 1%, you'll need to catch a lot more than ten fish to be able to say "this lake is 0.8% salmon and 0.6% trout."
Similarly in the games I work on, there are several major systems each of which call dozens of functions in hundreds of different entities, and all of this happens 60 times a second. Some of those functions' time funnels into common operations (like malloc), but most of it doesn't, and in any case there's no single leaf that occupies more than 1000 μs per frame.
I can look at the trunk functions and see, "we're spending 10% of our time on collision", but that's not very helpful: I need to know exactly where in collision, so I know which functions to squeeze. Just "do less collision" only gets you so far, especially when it means throwing out features. I'd rather know "we're spending an average 600 μs/frame on cache misses in the narrow phase of the octree because the magic missile moves so fast and touches lots of cells," because then I can track down the exact fix: either a better tree, or slower missiles.
Manual sampling would be fine if there were a big 20% lump in, say, stricmp, but with our profiles that's not the case. Instead I have hundreds of functions that I need to get from, say, 0.6% of frame to 0.4% of frame. I need to shave 10 μs off every 50 μs function that is called 300 times per second. To get that kind of precision, I need more samples.
But at heart what Luke Stackwalker does is what you describe: every millisecond or so, it halts the program and records the callstack (including the precise instruction and line number of the IP). Some programs just need tens of thousands of samples to be usefully profiled.
(We've talked about this before, of course, but I figured this was a good place to summarize the debate.)
There's a difference between things that programmers actually do, and things that they recommend others do.
I know of lots of programmers (myself included) that actually use this method. It only really helps to find the most obvious of performance problems, but it's quick and dirty and it works.
But I wouldn't really tell other programmers to do it, because it would take me too long to explain all the caveats. It's far too easy to make an inaccurate conclusion based on this method, and there are many areas where it just doesn't work at all. (for example, that method doesn't reveal any code that is triggered by user input).
So just like using lie detectors in court, or the "goto" statement, we just don't recommend that you do it, even though they all have their uses.
I'm surprised by the religous tone on both sides.
Profiling is great, and certainly is a more refined and precise when you can do it. Sometimes you can't, and it's nice to have a trusty back-up. The pause technique is like the manual screwdriver you use when your power tool is too far away or the bateries have run-down.
Here is a short true story. An application (kind of a batch proccessing task) had been running fine in production for six months, suddenly the operators are calling developers because it is going "too slow". They aren't going to let us attach a sampling profiler in production! You have to work with the tools already installed. Without stopping the production process, just using Process Explorer, (which operators had already installed on the machine) we could see a snapshot of a thread's stack. You can glance at the top of the stack, dismiss it with the enter key and get another snapshot with another mouse click. You can easily get a sample every second or so.
It doesn't take long to see if the top of the stack is most often in the database client library DLL (waiting on the database), or in another system DLL (waiting for a system operation), or actually in some method of the application itself. In this case, if I remember right, we quickly noticed that 8 times out of 10 the application was in a system DLL file call reading or writing a network file. Sure enough recent "upgrades" had changed the performance characteristics of a file share. Without a quick and dirty and (system administrator sanctioned) approach to see what the application was doing in production, we would have spent far more time trying to measure the issue, than correcting the issue.
On the other hand, when performance requirements move beyond "good enough" to really pushing the envelope, a profiler becomes essential so that you can try to shave cycles from all of your closely-tied top-ten or twenty hot spots. Even if you are just trying to hold to a moderate performance requirement durring a project, when you can get the right tools lined-up to help you measure and test, and even get them integrated into your automated test process it can be fantasticly helpful.
But when the power is out (so to speak) and the batteries are dead, it's nice know how to use that manual screwdriver.
So the direct answer: Know what you can learn from halting the program, but don't be afraid of precision tools either. Most importantly know which jobs call for which tools.
Hitting the pause button during the execution of a program in "debug" mode might not provide the right data to perform any performance optimizations. To put it bluntly, it is a crude form of profiling.
If you must avoid using a profiler, a better bet is to use a logger, and then apply a slowdown factor to "guesstimate" where the real problem is. Profilers however, are better tools for guesstimating.
The reason why hitting the pause button in debug mode, may not give a real picture of application behavior is because debuggers introduce additional executable code that can slowdown certain parts of the application. One can refer to Mike Stall's blog post on possible reasons for application slowdown in a debugging environment. The post sheds light on certain reasons like too many breakpoints,creation of exception objects, unoptimized code etc. The part about unoptimized code is important - the "debug" mode will result in a lot of optimizations (usually code in-lining and re-ordering) being thrown out of the window, to enable the debug host (the process running your code) and the IDE to synchronize code execution. Therefore, hitting pause repeatedly in "debug" mode might be a bad idea.
If we take the question "Why isn't it better known?" then the answer is going to be subjective. Presumably the reason why it is not better known is because profiling provides a long term solution rather than a current problem solution. It isn't effective for multi-threaded applications and isn't effective for applications like games which spend a significant portion of its time rendering.
Furthermore, in single threaded applications if you have a method that you expect to consume the most run time, and you want to reduce the run-time of all other methods then it is going to be harder to determine which secondary methods to focus your efforts upon first.
Your process for profiling is an acceptable method that can and does work, but profiling provides you with more information and has the benefit of showing you more detailed performance improvements and regressions.
If you have well instrumented code then you can examine more than just the how long a particular method; you can see all the methods.
With profiling:
You can then rerun your scenario after each change to determine the degree of performance improvement/regression.
You can profile the code on different hardware configurations to determine if your production hardware is going to be sufficient.
You can profile the code under load and stress testing scenarios to determine how the volume of information impacts performance
You can make it easier for junior developers to visualise the impacts of their changes to your code because they can re-profile the code in six months time while you're off at the beach or the pub, or both. Beach-pub, ftw.
Profiling is given more weight because enterprise code should always have some degree of profiling because of the benefits it gives to the organisation of an extended period of time. The more important the code the more profiling and testing you do.
Your approach is valid and is another item is the toolbox of the developer. It just gets outweighed by profiling.
Sampling profilers are only useful when
You are monitoring a runtime with a small number of threads. Preferably one.
The call stack depth of each thread is relatively small (to reduce the incredible overhead in collecting a sample).
You are only concerned about wall clock time and not other meters or resource bottlenecks.
You have not instrumented the code for management and monitoring purposes (hence the stack dump requests)
You mistakenly believe removing a stack frame is an effective performance improvement strategy whether the inherent costs (excluding callees) are practically zero or not
You can't be bothered to learn how to apply software performance engineering day-to-day in your job
....
Stack trace snapshots only allow you to see stroboscopic x-rays of your application. You may require more accumulated knowledge which a profiler may give you.
The trick is knowing your tools well and choose the best for the job at hand.
These must be some trivial examples that you are working with to get useful results with your method. I can't think of a project where profiling was useful (by whatever method) that would have gotten decent results with your "quick and effective" method. The time it takes to start and stop some applications already puts your assertion of "quick" in question.
Again, with non-trivial programs the method you advocate is useless.
EDIT:
Regarding "why isn't it better known"?
In my experience code reviews avoid poor quality code and algorithms, and profiling would find these as well. If you wish to continue with your method that is great - but I think for most of the professional community this is so far down on the list of things to try that it will never get positive reinforcement as a good use of time.
It appears to be quite inaccurate with small sample sets and to get large sample sets would take lots of time that would have been better spent with other useful activities.
What if the program is in production and being used at the same time by paying clients or colleagues. A profiler allows you to observe without interferring (as much, because of course it will have a little hit too as per the Heisenberg principle).
Profiling can also give you much richer and more detailed accurate reports. This will be quicker in the long run.
EDIT 2008/11/25: OK, Vineet's response has finally made me see what the issue is here. Better late than never.
Somehow the idea got loose in the land that performance problems are found by measuring performance. That is confusing means with ends. Somehow I avoided this by single-stepping entire programs long ago. I did not berate myself for slowing it down to human speed. I was trying to see if it was doing wrong or unnecessary things. That's how to make software fast - find and remove unnecessary operations.
Nobody has the patience for single-stepping these days, but the next best thing is to pick a number of cycles at random and ask what their reasons are. (That's what the call stack can often tell you.) If a good percentage of them don't have good reasons, you can do something about it.
It's harder these days, what with threading and asynchrony, but that's how I tune software - by finding unnecessary cycles. Not by seeing how fast it is - I do that at the end.
Here's why sampling the call stack cannot give a wrong answer, and why not many samples are needed.
During the interval of interest, when the program is taking more time than you would like, the call stack exists continuously, even when you're not sampling it.
If an instruction I is on the call stack for fraction P(I) of that time, removing it from the program, if you could, would save exactly that much. If this isn't obvious, give it a bit of thought.
If the instruction shows up on M = 2 or more samples, out of N, its P(I) is approximately M/N, and is definitely significant.
The only way you can fail to see the instruction is to magically time all your samples for when the instruction is not on the call stack. The simple fact that it is present for a fraction of the time is what exposes it to your probes.
So the process of performance tuning is a simple matter of picking off instructions (mostly function call instructions) that raise their heads by turning up on multiple samples of the call stack. Those are the tall trees in the forest.
Notice that we don't have to care about the call graph, or how long functions take, or how many times they are called, or recursion.
I'm against obfuscation, not against profilers. They give you lots of statistics, but most don't give P(I), and most users don't realize that that's what matters.
You can talk about forests and trees, but for any performance problem that you can fix by modifying code, you need to modify instructions, specifically instructions with high P(I). So you need to know where those are, preferably without playing Sherlock Holmes. Stack sampling tells you exactly where they are.
This technique is harder to employ in multi-thread, event-driven, or systems in production. That's where profilers, if they would report P(I), could really help.
Stepping through code is great for seeing the nitty-gritty details and troubleshooting algorithms. It's like looking at a tree really up close and following each vein of bark and branch individually.
Profiling lets you see the big picture, and quickly identify trouble points -- like taking a step backwards and looking at the whole forest and noticing the tallest trees. By sorting your function calls by length of execution time, you can quickly identify the areas that are the trouble points.
I used this method for Commodore 64 BASIC many years ago. It is surprising how well it works.
I've typically used it on real-time programs that were overrunning their timeslice. You can't manually stop and restart code that has to run 60 times every second.
I've also used it to track down the bottleneck in a compiler I had written. You wouldn't want to try to break such a program manually, because you really have no way of knowing if you are breaking at the spot where the bottlenck is, or just at the spot after the bottleneck when the OS is allowed back in to stop it. Also, what if the major bottleneck is something you can't do anything about, but you'd like to get rid of all the other largeish bottlenecks in the system? How to you prioritize which bottlenecks to attack first, when you don't have good data on where they all are, and what their relative impact each is?
The larger your program gets, the more useful a profiler will be. If you need to optimize a program which contains thousands of conditional branches, a profiler can be indispensible. Feed in your largest sample of test data, and when it's done import the profiling data into Excel. Then you check your assumptions about likely hot spots against the actual data. There are always surprises.
Let's just assume for now that you have narrowed down where the typical bottlenecks in your app are. For all you know, it might be the batch process you run to reindex your tables; it could be the SQL queries that runs over your effective-dated trees; it could be the XML marshalling of a few hundred composite objects. In other words, you might have something like this:
public Result takeAnAnnoyingLongTime(Input in) {
// impl of above
}
Unfortunately, even after you've identified your bottleneck, all you can do is chip away at it. No simple solution is available.
How do you measure the performance of your bottleneck so that you know your fixes are headed in the right direction?
Two points:
Beware of the infamous "optimizing the idle loop" problem. (E.g. see the optimization story under the heading "Porsche-in-the-parking-lot".) That is, just because a routine is taking a significant amount of time (as shown by your profiling), don't assume that it's responsible for slow performance as perceived by the user.
The biggest performance gains often come not from that clever tweak or optimization to the implementation of the algorithm, but from realising that there's a better algorithm altogether. Some improvements are relatively obvious, while others require more detailed analysis of the algorithms, and possibly a major change to the data structures involved. This may include trading off processor time for I/O time, in which case you need to make sure that you're not optimizing only one of those measures.
Bringing it back to the question asked, make sure that whatever you're measuring represents what the user actually experiences, otherwise your efforts could be a complete waste of time.
Profile it
Find the top line in the profiler, attempt to make it faster.
Profile it
If it worked, go to 1. If it didn't work, go to 2.
I'd measure them using the same tools / methods that allowed me to find them in the first place.
Namely, sticking timing and logging calls all over the place. If the numbers start going down, then you just might be doing the right thing.
As mentioned in this msdn column, performance tuning is compared to the job of painting Golden Gate Bridge: once you finish painting the entire thing, it's time to go back to the beginning and start again.
This is not a hard problem. The first thing you need to understand is that measuring performance is not how you find performance problems. Knowing how slow something is doesn't help you find out why. You need a diagnostic tool, and a good one. I've had a lot of experience doing this, and this is the best method. It is not automatic, but it runs rings around most profilers.
It's an interesting question. I don't think anyone knows the answer. I believe that significant part of the problem is that for more complicated programs, no one can predict their complexity. Therefore, even if you have profiler results, it's very complicated to interpret it in terms of changes that should be made to the program, because you have no theoretical basis for what the optimal solution is.
I think this is a reason why we have so bloated software. We optimize only so that quite simple cases would work on our fast machines. But once you put such pieces together into a large system, or you use order of magnitude larger input, wrong algorithms used (which were until then invisible both theoretically and practically) will start showing their true complexity.
Example: You create a string class, which handles Unicode. You use it somewhere like computer-generated XML processing where it really doesn't matter. But Unicode processing is in there, taking part of the resources. By itself, the string class can be very fast, but call it million times, and the program will be slow.
I believe that most of the current software bloat is of this nature. There is a way to reduce it, but it contradicts OOP. There is an interesting book There is an interesting book about various techniques, it's memory oriented but most of them could be reverted to get more speed.
I'd identify two things:
1) what complexity is it? The easiest way is to graph time-taken verses size of input.
2) how is it bound? Is it memory, or disk, or IPC with other processes or machines, or..
Now point (2) is the easier to tackle and explain: Lots of things go faster if you have more RAM or a faster machine or faster disks or move over to gig ethernet or such. If you identify your pain, you can put some money into hardware to make it tolerable.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Oftentimes a developer will be faced with a choice between two possible ways to solve a problem -- one that is idiomatic and readable, and another that is less intuitive, but may perform better. For example, in C-based languages, there are two ways to multiply a number by 2:
int SimpleMultiplyBy2(int x)
{
return x * 2;
}
and
int FastMultiplyBy2(int x)
{
return x << 1;
}
The first version is simpler to pick up for both technical and non-technical readers, but the second one may perform better, since bit shifting is a simpler operation than multiplication. (For now, let's assume that the compiler's optimizer would not detect this and optimize it, though that is also a consideration).
As a developer, which would be better as an initial attempt?
You missed one.
First code for correctness, then for clarity (the two are often connected, of course!). Finally, and only if you have real empirical evidence that you actually need to, you can look at optimizing. Premature optimization really is evil. Optimization almost always costs you time, clarity, maintainability. You'd better be sure you're buying something worthwhile with that.
Note that good algorithms almost always beat localized tuning. There is no reason you can't have code that is correct, clear, and fast. You'll be unreasonably lucky to get there starting off focusing on `fast' though.
IMO the obvious readable version first, until performance is measured and a faster version is required.
Take it from Don Knuth
Premature optimization is the root of all evil (or at least most of it) in programming.
Readability 100%
If your compiler can't do the "x*2" => "x <<1" optimization for you -- get a new compiler!
Also remember that 99.9% of your program's time is spent waiting for user input, waiting for database queries and waiting for network responses. Unless you are doing the multiple 20 bajillion times, it's not going to be noticeable.
Readability for sure. Don't worry about the speed unless someone complains
In your given example, 99.9999% of the compilers out there will generate the same code for both cases. Which illustrates my general rule - write for readability and maintainability first, and optimize only when you need to.
Readability.
Coding for performance has it's own set of challenges. Joseph M. Newcomer said it well
Optimization matters only when it
matters. When it matters, it matters a
lot, but until you know that it
matters, don't waste a lot of time
doing it. Even if you know it matters,
you need to know where it matters.
Without performance data, you won't
know what to optimize, and you'll
probably optimize the wrong thing.
The result will be obscure, hard to
write, hard to debug, and hard to
maintain code that doesn't solve your
problem. Thus it has the dual
disadvantage of (a) increasing
software development and software
maintenance costs, and (b) having no
performance effect at all.
I would go for readability first. Considering the fact that with the kind of optimized languages and hugely loaded machines we have in these days, most of the code we write in readable way will perform decently.
In some very rare scenarios, where you are pretty sure you are going to have some performance bottle neck (may be from some past bad experiences), and you managed to find some weird trick which can give you huge performance advantage, you can go for that. But you should comment that code snippet very well, which will help to make it more readable.
Readability. The time to optimize is when you get to beta testing. Otherwise you never really know what you need to spend the time on.
A often overlooked factor in this debate is the extra time it takes for a programmer to navigate, understand and modify less readible code. Considering a programmer's time goes for a hundred dollars an hour or more, this is a very real cost.
Any performance gain is countered by this direct extra cost in development.
Putting a comment there with an explanation would make it readable and fast.
It really depends on the type of project, and how important performance is. If you're building a 3D game, then there are usually a lot of common optimizations that you'll want to throw in there along the way, and there's no reason not to (just don't get too carried away early). But if you're doing something tricky, comment it so anybody looking at it will know how and why you're being tricky.
The answer depends on the context. In device driver programming or game development for example, the second form is an acceptable idiom. In business applications, not so much.
Your best bet is to look around the code (or in similar successful applications) to check how other developers do it.
If you're worried about readability of your code, don't hesitate to add a comment to remind yourself what and why you're doing this.
using << would by a micro optimization.
So Hoare's (not Knuts) rule:
Premature optimization is the root of all evil.
applies and you should just use the more readable version in the first place.
This is rule is IMHO often misused as an excuse to design software that can never scale, or perform well.
Both. Your code should balance both; readability and performance. Because ignoring either one will screw the ROI of the project, which in the end of the day is all that matters to your boss.
Bad readability results in decreased maintainability, which results in more resources spent on maintenance, which results in a lower ROI.
Bad performance results in decreased investment and client base, which results in a lower ROI.
Readability is the FIRST target.
In the 1970's the army tested some of the then "new" techniques of software development (top down design, structured programming, chief programmer teams, to name a few) to determine which of these made a statistically significant difference.
THe ONLY technique that made a statistically significant difference in development was...
ADDING BLANK LINES to program code.
The improvement in readability in those pre-structured, pre-object oriented code was the only technique in these studies that improved productivity.
==============
Optimization should only be addressed when the entire project is unit tested and ready for instrumentation. You never know WHERE you need to optimize the code.
In their landmark books Kernigan and Plauger in the late 1970's SOFTWARE TOOLS (1976) and SOFTWARE TOOLS IN PASCAL (1981) showed ways to create structured programs using top down design. They created text processing programs: editors, search tools, code pre-processors.
When the completed text formating function was INSTRUMENTED they discovered that most of the processing time was spent in three routines that performed text input and output ( In the original book, the i-o functions took 89% of the time. In the pascal book, these functions consumed 55%!)
They were able to optimize these THREE routines and produced the results of increased performance with reasonable, manageable development time and cost.
The larger the codebase, the more readability is crucial. Trying to understand some tiny function isn't so bad. (Especially since the Method Name in the example gives you a clue.) Not so great for some epic piece of uber code written by the loner genius who just quit coding because he has finally seen the top of his ability's complexity and it's what he just wrote for you and you'll never ever understand it.
As almost everyone said in their answers, I favor readability. 99 out of 100 projects I run have no hard response time requirements, so it's an easy choice.
Before you even start coding you should already know the answer. Some projects have certain performance requirements, like 'need to be able to run task X in Y (milli)seconds'. If that's the case, you have a goal to work towards and you know when you have to optimize or not. (hopefully) this is determined at the requirements stage of your project, not when writing the code.
Good readability and the ability to optimize later on are a result of proper software design. If your software is of sound design, you should be able to isolate parts of your software and rewrite them if needed, without breaking other parts of the system. Besides, most true optimization cases I've encountered (ignoring some real low level tricks, those are incidental) have been in changing from one algorithm to another, or caching data to memory instead of disk/network.
If there is no readability , it will be very hard to get performance improvement when you really need it.
Performance should be only improved when it is a problem in your program, there are many places would be a bottle neck rather than this syntax. Say you are squishing 1ns improvement on a << but ignored that 10 mins IO time.
Also, regarding readability, a professional programmer should be able to read/understand computer science terms. For example we can name a method enqueue rather than we have to say putThisJobInWorkQueue.
The bitshift versus the multiplication is a trivial optimization that gains next to nothing. And, as has been pointed out, your compiler should do that for you. Other than that, the gain is neglectable anyhow as is the CPU this instruction runs on.
On the other hand, if you need to perform serious computation, you will require the right data structures. But if your problem is complex, finding out about that is part of the solution. As an illustration, consider searching for an ID number in an array of 1000000 unsorted objects. Then reconsider using a binary tree or a hash map.
But optimizations like n << C are usually neglectible and trivial to change to at any point. Making code readable is not.
It depends on the task needed to be solved. Usually readability is more importrant, but there are still some tasks when you shoul think of performance in the first place. And you can't just spend a day or to for profiling and optimization after everything works perfectly, because optimization itself may require rewriting sufficiant part of a code from scratch. But it is not common nowadays.
I'd say go for readability.
But in the given example, I think that the second version is already readable enough, since the name of the function exactly states, what is going on in the function.
If we just always had functions that told us, what they do ...
You should always maximally optimize, performance always counts. The reason we have bloatware today, is that most programmers don't want to do the work of optimization.
Having said that, you can always put comments in where slick coding needs clarification.
There is no point in optimizing if you don't know your bottlenecks. You may have made a function incredible efficient (usually at the expense of readability to some degree) only to find that portion of code hardly ever runs, or it's spending more time hitting the disk or database than you'll ever save twiddling bits.
So you can't micro-optimize until you have something to measure, and then you might as well start off for readability.
However, you should be mindful of both speed and understandability when designing the overall architecture, as both can have a massive impact and be difficult to change (depending on coding style and methedologies).
It is estimated that about 70% of the cost of software is in maintenance. Readability makes a system easier to maintain and therefore brings down cost of the software over its life.
There are cases where performance is more important the readability, that said they are few and far between.
Before sacrifing readability, think "Am I (or your company) prepared to deal with the extra cost I am adding to the system by doing this?"
I don't work at google so I'd go for the evil option. (optimization)
In Chapter 6 of Jon Bentley's "Programming Pearls", he describes how one system had a 400 times speed up by optimizing at 6 different design levels. I believe, that by not caring about performance at these 6 design levels, modern implementors can easily achieve 2-3 orders of magnitude of slow down in their programs.
Readability first. But even more than readability is simplicity, especially in terms of data structure.
I'm reminded of a student doing a vision analysis program, who couldn't understand why it was so slow. He merely followed good programming practice - each pixel was an object, and it worked by sending messages to its neighbors...
check this out
Write for readability first, but expect the readers to be programmers. Any programmer worth his or her salt should know the difference between a multiply and a bitshift, or be able to read the ternary operator where it is used appropriately, be able to look up and understand a complex algorithm (you are commenting your code right?), etc.
Early over-optimization is, of course, quite bad at getting you into trouble later on when you need to refactor, but that doesn't really apply to the optimization of individual methods, code blocks, or statements.
How much does an hour of processor time cost?
How much does an hour of programmer time cost?
IMHO both things have nothing to do. You should first go for code that works, as this is more important than performance or how well it reads. Regarding readability: your code should always be readable in any case.
However I fail to see why code can't be readable and offer good performance at the same time. In your example, the second version is as readable as the first one to me. What is less readable about it? If a programmer doesn't know that shifting left is the same as multiplying by a power of two and shifting right is the same as dividing by a power of two... well, then you have much more basic problems than general readability.