Do you find cyclomatic complexity a useful measure? - refactoring

I've been playing around with measuring the cyclomatic complexity of a big code base.
Cyclomatic complexity is the number of linearly independent paths through a program's source code and there are lots of free tools for your language of choice.
The results are interesting but not surprising. That is, the parts I know to be the hairiest were in fact the most complex (with a rating of > 50). But what I am finding useful is that a concrete "badness" number is assigned to each method as something I can point to when deciding where to start refactoring.
Do you use cyclomatic complexity? What's the most complex bit of code you found?

We refactor mercilessly, and use Cyclomatic complexity as one of the metrics that gets code on our 'hit list'. 1-6 we don't flag for complexity (although it could get questioned for other reasons), 7-9 is questionable, and any method over 10 is assumed to be bad unless proven otherwise.
The worst we've seen was 87 from a monstrous if-else-if chain in some legacy code we had to take over.

Actually, cyclomatic complexity can be put to use beyond just method level thresholds. For starters, one big method with high complexity may be broken into several small methods with lower complexity. But has it really improved the codebase? Granted, you may get somewhat better readability by all those method names. But the total conditional logic hasn't changed. And the total conditional logic can often be reduced by replacing conditionals with polymorphism.
We need a metric that doesn't turn green by mere method decomposition. I call this CC100.
CC100 = 100 * (Total cyclomatic complexity of codebase) / (Total lines of code)

It's useful to me in the same way that big-O is useful: I know what it is, and can use it to get a gut feeling for whether a method is good or bad, but I don't need to compute it for every function I've written.
I think simpler metrics, like LOC, are at least as good in most cases. If a function doesn't fit on one screen, it almost doesn't matter how simple it is. If a function takes 20 parameters and makes 40 local variables, it doesn't matter if its cyclomatic complexity is 1.

Until there is a tool that can work well with C++ templates, and meta-programming techniques, it's not much help in my situation. Anyways just remember that
"not all things that count can be
measured, and not all things that can
be measured count"
Einstein
So remember to pass any information of this type through human filtering too.

We recently started to use it. We use NDepend to do some static code analysis, and it measures cyclomatic complexity. I agree, it's a decent way to identify methods for refactoring.
Sadly, we have seen #'s above 200 for some methods created by our developers offshore.

You'll know complexity when you see it. The main thing this kind of tool is useful for is flagging the parts of the code that were escaping your attention.

I frequently measure the cyclomatic complexity of my code. I've found it helps me spot areas of code that are doing too much. Having a tool point out the hot-spots in my code is much less time consuming than having to read through thousands of lines of code trying to figure out which methods are not following the SRP.
However, I've found that when I do a cyclomatic complexity analysis on other people's code it usually leads to feelings of frustration, angst, and general anger when I find code with cyclomatic complexity in the 100's. What compels people to write methods that have several thousand lines of code in them?!

It's great for help identifying candidates for refactoring, but it's important to keep your judgment around. I'd support kenj0418's ranges for pruning guides.

There's a Java metric called CRAP4J that empirically combines cyclomatic complexity and JUnit test coverage to come up with a single metric. He's been doing research to try and improve his empirical formula. I'm not sure how widespread it is.

Cyclomatic Complexity is just one composant of what could be called Fabricated Complexity. A while back, I wrote an article to summarize several dimensions of code complexity:
Fighting Fabricated Complexity
Tooling is needed to be efficient at handling code complexity. The tool NDepend for .NET code will let you analyze many dimensions of the code complexity including code metrics like:
Cyclomatic Complexity, Nesting Depth, Lack Of Cohesion of Methods, Coverage by Tests...
including dependencies analysis and including a language (Code Query Language) dedicated to ask, what is complex in my code, and to write rule?

Yes, we use it and I have found it useful too. We have a big legacy code base to tame and we found alarming high cyclomatic complexity. (387 in one method!). CC points you directly to areas that are worth to refactor. We use CCCC on C++ code.

I haven't used it in a while, but on a previous project it really helped identify potential trouble spots in someone elses code (wouldn't be mine of course!)
Upon finding the area's to check out, i quickly found numerious problems (also lots of GOTOS would you believe!) with logic and some really strange WTF code.
Cyclomatic complexity is great for showing areas which probably are doing to much and therefore breaking the single responsibilty prinicpal. These's ideally should be broken up into mulitple functions

I'm afraid that for the language of the project for which I would most like metrics like this, LPC, there are not, in fact, lots of free tools for producing it available. So no, not so useful to me.

+1 for kenj0418's hit list values.
The worst I've seen was a 275. There were a couple others over 200 that we were able to refactor down to much smaller CCs; they were still high but it got them pushed further back in line. We didn't have much luck with the 275 beast -- it was (probably still is) a web of if- and switch-statements that was just way too complex. It's only real value is as a step-through when they decide to rebuild the system.
The exceptions to high CC that I was comfortable with were factories; IMO, they are supposed to have a high CC but only if they are only doing simple object creation and returning.

After understanding what it means, I now have started to use it on a "trial" basis. So far I have found it to be useful, because usually high CC goes hand in hand with the Arrow Anti-Pattern, which makes code harder to read and understand. I do not have a fixed number yet, but NDepend is alerting for everything above 5, which looks like a good start to investigate methods.

Related

Comparing algorithmic performance to old methods

I have written a new algorithm for something. Now I need to compare it with existing methods, some of which are old about 10 years.
The idea I had is to look at benchmarks of different processors over the years in order to establish how much faster my processor (i7-920) is than average processor from 2003. Then I would simply divide old methods' execution time by the speedup factor and use those numbers to compare with my own algorithm.
Has something like this been done? So I don't redo the existing work.
Can such a comparison be done some other way?
Are there some scientific papers written about such comparisons which I can reference?
I don't know which of these are possible for you, but here's a list of options I can think of:
Run their implementation side-by-side on your machine against yours.
This is the best option.
Rewrite their implementations and do (1).
You preferably need to compare it against their test to ensure you get vaguely similar results.
Find a library that implements their algorithm (or multiple libraries) and do (1).
I suggest multiple libraries, if possible, since a single one may not have implemented the algorithm efficiently. You may also want to compare these against their test.
Compare the algorithms mathematically.
This may be difficult, but it's not impossible.
Do what you presented.
(a) I would not recommend this as there are other determining factors in your computer other than the processor speed that affect the speed of an algorithm. Getting an equation that perfectly balances these will likely be very difficult.
(b) There is a massive difference between top and bottom of the line computers, so using the average is not a particularly good idea. If the author didn't provide details regarding this, I'm afraid your benchmark is not likely to be too accurate.
Go out and buy a machine of similar specs to the one used by the desired test to benchmark on.
A 10-year-old machine should be pretty cheap, if you can find one. Also, see (5.b).
Contact the author to allow for any of the other options.
Papers often provide contact details of the authors, or you should be able to find them elsewhere if they have any sort of online presence and you're half-decent at using Google.
If I were reviewing your results, I would be annoyed if you attempted to demonstrate less than an order of magnitude speedup this way. There are a lot of variables determining algorithm performance, and I would be skeptical that a generic benchmark could capture the right ones. My gold standard is old and new algorithms implemented by the same programmer, with similar effort made to optimize, running on the same hardware. Using the previous authors' implementation instead of making a new one is commonplace in the experimental algorithms literature, but using different hardware isn't.
Algorithmic performance is usually measured in big-O terms, for which it is better to count basic operations, like comparisons, and do it for a range of input sizes.
If you must measure overall time, at least eliminate other sources of difference.
As #larsmans said, do it on the same processor.
Also, if there is existing work, there's no harm in repeating it.
Generally, in science, that's a good thing.
You should attempt to reduce the amount of differing factors between the two runs. I think just run-timing the two algorithms side by side on the same machine and/or comparing their Big O times are both equally valid and important. You should also attempt to use updated libraries and other external functions; using outdated ones my also be the cause of timing results.

Most hazardous performance bottleneck misconceptions

The guys who wrote Bespin (cloud-based canvas-based code editor [and more]) recently spoke about how they re-factored and optimize a portion of the Bespin code because of a misconception that JavaScript was slow. It turned out that when all was said and done, their optimization produced no significant improvements.
I'm sure many of us go out of our way to write "optimized" code based on misconceptions similar to that of the Bespin team.
What are some common performance bottleneck misconceptions developers commonly subscribe to?
In no particular order:
"Ready, Fire, Aim" - thinking you know what needs to be optimized without proving it (i.e. guessing) and then acting on that, and since it doesn't help much, therefore assuming the code must have been optimal to begin with.
"Penny Wise, Pound Foolish" - thinking that optimization is all about compiler optimization, fussing about ++i vs. i++ while mountains of time are being spent needlessly in overblown designs, especially of data structures and databases.
"Swat Flies With a Bazooka" - being so enamored of the fanciest ideas heard in classrooms that they are just used for everything, regardless of scale.
"Fuzzy Thinking about Performance" - throwing around terms like "hotspot" and "bottleneck" and "profiler" and "measure" as if these things were well understood and / or relevant. (I bet I get clobbered for that!) OK, one at a time:
hotspot - What's the definition? I have one: it is a region of physical addresses where the PC register is found a significant fraction of time. It is the kind of thing PC samplers are good at finding. Many performance problems exhibit hotspots, but only in the simplest programs is the problem in the same place as the hotspot is.
bottleneck - A catch-all term used for performance problems, it implies a limited channel constraining how fast work can be accomplished. The unstated assumption is that the work is necessary. In my decades of performance tuning, I have in fact found a few problems like that - very few. Nearly all are of a very different nature. Rather than taking the shortest route from point A to B, little detours are taken, in the form of function calls which take little code, but not little time. Then those detours take further nested detours, sometimes 30 levels deep. The more the detours are nested, the more likely it is that some of them are less than necessary - wasteful, in fact - and it nearly always arises from galloping generality - unquestioning over-indulgence in "abstraction".
profiler - a universal good thing, right? All you gotta do is get a profiler and do some profiling, right? Ever think about how easy it is to fool a profiler into telling you a lot of nothing, when your goal is to find out what you need to fix to get better performance? Somewhere in your call tree, tuck a little file I/O, or a little call to some system routine, or have your evil twin do it without your knowledge. At some point, that will be your problem, and most profilers will miss it completely because the only inefficiency they contemplate is algorithmic inefficiency. Or, not all your routines will be little, and they may not call another routine in a small number of places, so your call graph says there's a link between the two routines, but which call? Or suppose you can figure out that some big percent of time is spent in a path A calls B calls C. You can look at that and think there's not much you can do about it, when if you could also look at the data being passed in those calls, you could see if it's necesssary. Here's a fun project - pick your favorite profiler, and then see how many ways you could fool it. That is, find ways to make the program take longer without the profiler being able to tell what you did, because if you can do it intentionally, you can also do it without intending to.
measure - (i.e. measure time) that's what profilers have done for decades, and they take pride in the accuracy and precision with which they measure. But measure what time? and why with accuracy? Remember, the goal is to precisely locate performance problems, such that you could fruitfully optimize them to gain a speedup. When you get that speedup, it is what it is, regardless of how precisely you estimated it beforehand. If that precision of measurement is bought at the expense of precision of location, then you've just bought apples when what you needed was oranges.
Here's a list of myths about performance.
And this is what happens when one optimizes without a valid profile in hand. All you are doing without a profile is guessing and probably wasting time and money. I could list a bunch of other misconceptions, but many come down to the fact that if the code in question isn't a top resource consumer, it is probably fine as is. Kinda like unrolling a for loop that is doing disk I/O...
If I convert the whole code base over to [Insert xxx latest technology here], it'll be much faster.
Relational databases are slow.
I'm smarter than the optimizer.
This should be optimized.
Java is slow
And, unrelated:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-jwz
Optimizing the WRONG part of the code (people, use your profiler!).
The optimizer in my compiler is smart, so I don't have to help it.
Java is fast (LOL)
Relational databases are fast (ROTFL LOL LMAO)
"This has to be as fast as possible."
If you don't have a performance problem, you don't need to worry about optimizing performance (beyond just paying attention to using good algorithms).
This misconception also manifests in attempts to optimize performance for every single aspect of your program. I see this most often with people trying to shave every last millisecond off of a low-volume web application's execution time while failing to take into account that network latency will take orders of magnitude longer than their code's execution time, making any reduction in execution time irrelevant anyhow.
My rules of optimization.
Don't optimize
Don't optimize now.
Profile to identify the problem.
Optimize the component that is taking at least 80% of the time.
Find an optimization that is 10 times faster.
My best optimization has been reducing a report from 3 days to 9 minutes. The optimized code was sped up from three days to three minutes.
In my carreer I have met three people who had been tasked with producing a faster sort on VAX than the native sort. They invariably had been able to produce sorts that took only three times longer.
The rules are simple:
Try to use standard library functions first.
Try to use brute-force and ignorance second.
Prove you've got a problem before trying to do any optimization.

Do you use Big-O complexity evaluation in the 'real world'?

Recently in an interview I was asked several questions related to the Big-O of various algorithms that came up in the course of the technical questions. I don't think I did very well on this... In the ten years since I took programming courses where we were asked to calculate the Big-O of algorithms I have not have one discussion about the 'Big-O' of anything I have worked on or designed. I have been involved in many discussions with other team members and with the architects I have worked with about the complexity and speed of code, but I have never been part of a team that actually used Big-O calculations on a real project. The discussions are always "is there a better or more efficient way to do this given our understanding of out data?" Never "what is the complexity of this algorithm"?
I was wondering if people actually have discussions about the "Big-O" of their code in the real word?
It's not so much using it, it's more that you understand the implications.
There are programmers who do not realise the consequence of using an O(N^2) sorting algorithm.
I doubt many apart from those working in academia would use Big-O Complexity Analysis in anger day-to-day.
No needless n-squared
In my experience you don't have many discussions about it, because it doesn't need discussing. In practice, in my experience, all that ever happens is you discover something is slow and see that it's O(n^2) when in fact it could be O(n log n) or O(n), and then you go and change it. There's no discussion other than "that's n-squared, go fix it".
So yes, in my experience you do use it pretty commonly, but only in the basest sense of "decrease the order of the polynomial", and not in some highly tuned analysis of "yes, but if we switch to this crazy algorithm, we'll increase from logN down to the inverse of Ackerman's function" or some such nonsense. Anything less than a polynomial, and the theory goes out the window and you switch to profiling (e.g. even to decide between O(n) and O(n log n), measure real data).
Big-O notation is rather theoretical, while in practice, you are more interested in actual profiling results which give you a hard number as to how your performance is.
You might have two sorting algorithms which by the book have O(n^2) and O(nlogn) upper bounds, but profiling results might show that the more efficient one might have some overhead (which is not reflected in the theoretical bound you found for it) and for the specific problem set you are dealing with, you might choose the theoretically-less-efficient sorting algorithm.
Bottom line: in real life, profiling results usually take precedence over theoretical runtime bounds.
I do, all the time. When you have to deal with "large" numbers, typically in my case: users, rows in database, promotion codes, etc., you have to know and take into account the Big-O of your algorithms.
For example, an algorithm that generates random promotion codes for distribution could be used to generate billions of codes... Using a O(N^2) algorithm to generate unique codes means weeks of CPU time, whereas a O(N) means hours.
Another typical example is queries in code (bad!). People look up a table then perform a query for each row... this brings up the order to N^2. You can usually change the code to use SQL properly and get orders of N or NlogN.
So, in my experience, profiling is useful ONLY AFTER the correct class of algorithms is used. I use profiling to catch bad behaviours like understanding why a "small" number bound application under-performs.
The answer from my personal experience is - No. Probably the reason is that I use only simple, well understood algorithms and data structures. Their complexity analysis is already done and published, decades ago. Why we should avoid fancy algorithms is better explained by Rob Pike here. In short, a practitioner almost never have to invent new algorithms and as a consequence almost never have to use Big-O.
Well that doesn't mean that you should not be proficient in Big-O. A project might demand the design and analysis of an altogether new algorithm. For some real-world examples, please read the "war stories" in Skiena's The Algorithm Design Manual.
To the extent that I know that three nested for-loops are probably worse than one nested for-loop. In other words, I use it as a reference gut feeling.
I have never calculated an algorithm's Big-O outside of academia. If I have two ways to approach a certain problem, if my gut feeling says that one will have a lower Big-O than the other one, I'll probably instinctively take the smaller one, without further analysis.
On the other hand, if I know for certain the size of n that comes into my algorithm, and I know for certain it to be relatively small (say, under 100 elements), I might take the most legible one (I like to know what my code does even one month after it has been written). After all, the difference between 100^2 and 100^3 executions is hardly noticeable by the user with today's computers (until proven otherwise).
But, as others have pointed out, the profiler has the last and definite word: If the code I write executes slowly, I trust the profiler more than any theoretical rule, and fix accordingly.
I try to hold off optimizations until profiling data proves they are needed. Unless, of course, it is blatently obvious at design time that one algorithm will be more efficient than the other options (without adding too much complexity to the project).
Yes, I use it. And no, it's not often "discussed", just like we don't often discuss whether "orderCount" or "xyz" is a better variable name.
Usually, you don't sit down and analyze it, but you develop a gut feeling based on what you know, and can pretty much estimate the O-complexity on the fly in most cases.
I typically give it a moment's thought when I have to perform a lot of list operations. Am I doing any needless O(n^2) complexity stuff, that could have been done in linear time? How many passes am I making over the list? It's not something you need to make a formal analysis of, but without knowledge of big-O notation, it becomes a lot harder to do accurately.
If you want your software to perform acceptably on larger input sizes, then you need to consider the big-O complexity of your algorithms, formally or informally. Profiling is great for telling you how the program performs now, but if you're using a O(2^n) algorithm, your profiler will tell you that everything is just fine as long as your input size is tiny. And then your input size grows, and runtime explodes.
People often dismiss big-O notation as "theoretical", or "useless", or "less important than profiling". Which just indicates that they don't understand what big-O complexity is for. It solves a different problem than a profiler does. Both are essential in writing software with good performance. But profiling is ultimately a reactive tool. It tells you where your problem is, once the problem exists.
Big-O complexity proactively tells you which parts of your code are going to blow up if you run it on larger inputs. A profiler can not tell you that.
No. I don't use Big-O complexity in 'real world' situations.
My view on the whole issue is this - (maybe wrong.. but its just my take.)
The Big-O complexity stuff is ultimately to understand how efficient an algorithm is. If from experience or by other means, you understand the algorithms you are dealing with, and are able to use the right algo in the right place, thats all that matters.
If you know this Big-O stuff and are able to use it properly, well and good.
If you don't know to talk about algos and their efficiencies in the mathematical way - Big-O stuff, but you know what really matters - the best algo to use in a situation - thats perfectly fine.
If you don't know either, its bad.
Although you rarely need to do deep big-o analysis of a piece of code, it's important to know what it means and to be able to quickly evaluate the complexity of the code you're writing and the consequences it might have.
At development time you often feel like it's "good enough". Eh, no-one will ever put more than 100 elements in this array right ? Then, one day, someone will put 1000 elements in the array (trust users on that: if the code allows it, one of them will do it). And that n^2 algorithm that was good enough now is a big performance problem.
It's sometimes usefull the other way around: if you know that you functionaly have to make n^2 operations and the complexity of your algorithm happens to be n^3, there might be something you can do about it to make it n^2. Once it's n^2, you'll have to work on smaller optimizations.
In the contrary, if you just wrote a sorting algorithm and find out it has a linear complexity, you can be sure that there's a problem with it. (Of course, in real life, occasions were you have to write your own sorting algorithm are rare, but I once saw someone in an interview who was plainly satisfied with his one single for loop sorting algorithm).
Yes, for server-side code, one bottle-neck can mean you can't scale, because you get diminishing returns no matter how much hardware you throw at a problem.
That being said, there are often other reasons for scalability problems, such as blocking on file- and network-access, which are much slower than any internal computation you'll see, which is why profiling is more important than BigO.

Performance anti patterns

I am currently working for a client who are petrified of changing lousy un-testable and un-maintainable code because of "performance reasons". It is clear that there are many misconceptions running rife and reasons are not understood, but merely followed with blind faith.
One such anti-pattern I have come across is the need to mark as many classes as possible as sealed internal...
*RE-Edit: I see marking everything as sealed internal (in C#) as a premature optimisation.*
I am wondering what are some of the other performance anti-patterns people may be aware of or come across?
The biggest performance anti-pattern I have come across is:
Not measuring performance before and
after the changes.
Collecting performance data will show if a certain technique was successful or not. Not doing so will result in pretty useless activities, because someone has the "feeling" of increased performance when nothing at all has changed.
The elephant in the room: Focusing on implementation-level micro-optimization instead of on better algorithms.
Variable re-use.
I used to do this all the time figuring I was saving a few cycles on the declaration and lowering memory footprint. These savings were of minuscule value compared with how unruly it made the code to debug, especially if I ended up moving a code block around and the assumptions about starting values changed.
Premature performance optimizations comes to mind. I tend to avoid performance optimizations at all costs and when I decide I do need them I pass the issue around to my collegues several rounds trying to make sure we put the obfu... eh optimization in the right place.
One that I've run into was throwing hardware at seriously broken code, in an attempt to make it fast enough, sort of the converse of Jeff Atwood's article mentioned in Rulas' comment. I'm not talking about the difference between speeding up a sort that uses a basic, correct algorithm by running it on faster hardware vs. using an optimized algorithm. I'm talking about using a not obviously correct home brewed O(n^3) algorithm when a O(n log n) algorithm is in the standard library. There's also things like hand coding routines because the programmer doesn't know what's in the standard library. That one's very frustrating.
Using design patterns just to have them used.
Using #defines instead of functions to avoid the penalty of a function call.
I've seen code where expansions of defines turned out to generate huge and really slow code. Of course it was impossible to debug as well. Inline functions is the way to do this, but they should be used with care as well.
I've seen code where independent tests has been converted into bits in a word that can be used in a switch statement. Switch can be really fast, but when people turn a series of independent tests into a bitmask and starts writing some 256 optimized special cases they'd better have a very good benchmark proving that this gives a performance gain. It's really a pain from maintenance point of view and treating the different tests independently makes the code much smaller which is also important for performance.
Lack of clear program structure is the biggest code-sin of them all. Convoluted logic that is believed to be fast almost never is.
Do not refactor or optimize while writing your code. It is extremely important not to try to optimize your code before you finish it.
Julian Birch once told me:
"Yes but how many years of running the application does it actually take to make up for the time spent by developers doing it?"
He was referring to the cumulative amount of time saved during each transaction by an optimisation that would take a given amount of time to implement.
Wise words from the old sage... I often think of this advice when considering doing a funky optimisation. You can extend the same notion a little further by considering how much developer time is being spent dealing with the code in its present state versus how much time is saved by the users. You could even weight the time by hourly rate of the developer versus the user if you wanted.
Of course, sometimes its impossible to measure, for example, if an e-commerce application takes 1 second longer to respond you will loose some small % money from users getting bored during that 1 second. To make up that one second you need to implement and maintain optimised code. The optimisation impacts gross profit positively, and net profit negatively, so its much harder to balance. You could try - with good stats.
Exploiting your programming language. Things like using exception handling instead of if/else just because in PLSnakish 1.4 it's faster. Guess what? Chances are it's not faster at all and that two years from now someone maintaining your code will get really angry with you because you obfuscated the code and made it run much slower, because in PLSnakish 1.8 the language maintainers fixed the problem and now if/else is 10 times faster than using exception handling tricks. Work with your programming language and framework!
Changing more than one variable at a time. This drives me absolutely bonkers! How can you determine the impact of a change on a system when more than one thing's been changed?
Related to this, making changes that are not warranted by observations. Why add faster/more CPUs if the process isn't CPU bound?
General solutions.
Just because a given pattern/technology performs better in one circumstance does not mean it does in another.
StringBuilder overuse in .Net is a frequent example of this one.
Once I had a former client call me asking for any advice I had on speeding up their apps.
He seemed to expect me to say things like "check X, then check Y, then check Z", in other words, to provide expert guesses.
I replied that you have to diagnose the problem. My guesses might be wrong less often than someone else's, but they would still be wrong, and therefore disappointing.
I don't think he understood.
Some developers believe a fast-but-incorrect solution is sometimes preferable to a slow-but-correct one. So they will ignore various boundary conditions or situations that "will never happen" or "won't matter" in production.
This is never a good idea. Solutions always need to be "correct".
You may need to adjust your definition of "correct" depending upon the situation. What is important is that you know/define exactly what you want the result to be for any condition, and that the code gives those results.
Michael A Jackson gives two rules for optimizing performance:
Don't do it.
(experts only) Don't do it yet.
If people are worried about performance, tell 'em to make it real - what is good performance and how do you test for it? Then if your code doesn't perform up to their standards, at least it's something the code writer and the application user agree on.
If people are worried about non-performance costs of rewriting ossified code (for example, the time sink) then present your estimates and demonstrate that it can be done in the schedule. Assuming it can.
I believe it is a common myth that super lean code "close to the metal" is more performant than an elegant domain model.
This was apparently de-bunked by the creator/lead developer of DirectX, who re-wrote the c++ version in C# with massive improvements. [source required]
Appending to an array using (for example) push_back() in C++ STL, ~= in D, etc. when you know how big the array is supposed to be ahead of time and can pre-allocate it.

Should a developer aim for readability or performance first? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Oftentimes a developer will be faced with a choice between two possible ways to solve a problem -- one that is idiomatic and readable, and another that is less intuitive, but may perform better. For example, in C-based languages, there are two ways to multiply a number by 2:
int SimpleMultiplyBy2(int x)
{
return x * 2;
}
and
int FastMultiplyBy2(int x)
{
return x << 1;
}
The first version is simpler to pick up for both technical and non-technical readers, but the second one may perform better, since bit shifting is a simpler operation than multiplication. (For now, let's assume that the compiler's optimizer would not detect this and optimize it, though that is also a consideration).
As a developer, which would be better as an initial attempt?
You missed one.
First code for correctness, then for clarity (the two are often connected, of course!). Finally, and only if you have real empirical evidence that you actually need to, you can look at optimizing. Premature optimization really is evil. Optimization almost always costs you time, clarity, maintainability. You'd better be sure you're buying something worthwhile with that.
Note that good algorithms almost always beat localized tuning. There is no reason you can't have code that is correct, clear, and fast. You'll be unreasonably lucky to get there starting off focusing on `fast' though.
IMO the obvious readable version first, until performance is measured and a faster version is required.
Take it from Don Knuth
Premature optimization is the root of all evil (or at least most of it) in programming.
Readability 100%
If your compiler can't do the "x*2" => "x <<1" optimization for you -- get a new compiler!
Also remember that 99.9% of your program's time is spent waiting for user input, waiting for database queries and waiting for network responses. Unless you are doing the multiple 20 bajillion times, it's not going to be noticeable.
Readability for sure. Don't worry about the speed unless someone complains
In your given example, 99.9999% of the compilers out there will generate the same code for both cases. Which illustrates my general rule - write for readability and maintainability first, and optimize only when you need to.
Readability.
Coding for performance has it's own set of challenges. Joseph M. Newcomer said it well
Optimization matters only when it
matters. When it matters, it matters a
lot, but until you know that it
matters, don't waste a lot of time
doing it. Even if you know it matters,
you need to know where it matters.
Without performance data, you won't
know what to optimize, and you'll
probably optimize the wrong thing.
The result will be obscure, hard to
write, hard to debug, and hard to
maintain code that doesn't solve your
problem. Thus it has the dual
disadvantage of (a) increasing
software development and software
maintenance costs, and (b) having no
performance effect at all.
I would go for readability first. Considering the fact that with the kind of optimized languages and hugely loaded machines we have in these days, most of the code we write in readable way will perform decently.
In some very rare scenarios, where you are pretty sure you are going to have some performance bottle neck (may be from some past bad experiences), and you managed to find some weird trick which can give you huge performance advantage, you can go for that. But you should comment that code snippet very well, which will help to make it more readable.
Readability. The time to optimize is when you get to beta testing. Otherwise you never really know what you need to spend the time on.
A often overlooked factor in this debate is the extra time it takes for a programmer to navigate, understand and modify less readible code. Considering a programmer's time goes for a hundred dollars an hour or more, this is a very real cost.
Any performance gain is countered by this direct extra cost in development.
Putting a comment there with an explanation would make it readable and fast.
It really depends on the type of project, and how important performance is. If you're building a 3D game, then there are usually a lot of common optimizations that you'll want to throw in there along the way, and there's no reason not to (just don't get too carried away early). But if you're doing something tricky, comment it so anybody looking at it will know how and why you're being tricky.
The answer depends on the context. In device driver programming or game development for example, the second form is an acceptable idiom. In business applications, not so much.
Your best bet is to look around the code (or in similar successful applications) to check how other developers do it.
If you're worried about readability of your code, don't hesitate to add a comment to remind yourself what and why you're doing this.
using << would by a micro optimization.
So Hoare's (not Knuts) rule:
Premature optimization is the root of all evil.
applies and you should just use the more readable version in the first place.
This is rule is IMHO often misused as an excuse to design software that can never scale, or perform well.
Both. Your code should balance both; readability and performance. Because ignoring either one will screw the ROI of the project, which in the end of the day is all that matters to your boss.
Bad readability results in decreased maintainability, which results in more resources spent on maintenance, which results in a lower ROI.
Bad performance results in decreased investment and client base, which results in a lower ROI.
Readability is the FIRST target.
In the 1970's the army tested some of the then "new" techniques of software development (top down design, structured programming, chief programmer teams, to name a few) to determine which of these made a statistically significant difference.
THe ONLY technique that made a statistically significant difference in development was...
ADDING BLANK LINES to program code.
The improvement in readability in those pre-structured, pre-object oriented code was the only technique in these studies that improved productivity.
==============
Optimization should only be addressed when the entire project is unit tested and ready for instrumentation. You never know WHERE you need to optimize the code.
In their landmark books Kernigan and Plauger in the late 1970's SOFTWARE TOOLS (1976) and SOFTWARE TOOLS IN PASCAL (1981) showed ways to create structured programs using top down design. They created text processing programs: editors, search tools, code pre-processors.
When the completed text formating function was INSTRUMENTED they discovered that most of the processing time was spent in three routines that performed text input and output ( In the original book, the i-o functions took 89% of the time. In the pascal book, these functions consumed 55%!)
They were able to optimize these THREE routines and produced the results of increased performance with reasonable, manageable development time and cost.
The larger the codebase, the more readability is crucial. Trying to understand some tiny function isn't so bad. (Especially since the Method Name in the example gives you a clue.) Not so great for some epic piece of uber code written by the loner genius who just quit coding because he has finally seen the top of his ability's complexity and it's what he just wrote for you and you'll never ever understand it.
As almost everyone said in their answers, I favor readability. 99 out of 100 projects I run have no hard response time requirements, so it's an easy choice.
Before you even start coding you should already know the answer. Some projects have certain performance requirements, like 'need to be able to run task X in Y (milli)seconds'. If that's the case, you have a goal to work towards and you know when you have to optimize or not. (hopefully) this is determined at the requirements stage of your project, not when writing the code.
Good readability and the ability to optimize later on are a result of proper software design. If your software is of sound design, you should be able to isolate parts of your software and rewrite them if needed, without breaking other parts of the system. Besides, most true optimization cases I've encountered (ignoring some real low level tricks, those are incidental) have been in changing from one algorithm to another, or caching data to memory instead of disk/network.
If there is no readability , it will be very hard to get performance improvement when you really need it.
Performance should be only improved when it is a problem in your program, there are many places would be a bottle neck rather than this syntax. Say you are squishing 1ns improvement on a << but ignored that 10 mins IO time.
Also, regarding readability, a professional programmer should be able to read/understand computer science terms. For example we can name a method enqueue rather than we have to say putThisJobInWorkQueue.
The bitshift versus the multiplication is a trivial optimization that gains next to nothing. And, as has been pointed out, your compiler should do that for you. Other than that, the gain is neglectable anyhow as is the CPU this instruction runs on.
On the other hand, if you need to perform serious computation, you will require the right data structures. But if your problem is complex, finding out about that is part of the solution. As an illustration, consider searching for an ID number in an array of 1000000 unsorted objects. Then reconsider using a binary tree or a hash map.
But optimizations like n << C are usually neglectible and trivial to change to at any point. Making code readable is not.
It depends on the task needed to be solved. Usually readability is more importrant, but there are still some tasks when you shoul think of performance in the first place. And you can't just spend a day or to for profiling and optimization after everything works perfectly, because optimization itself may require rewriting sufficiant part of a code from scratch. But it is not common nowadays.
I'd say go for readability.
But in the given example, I think that the second version is already readable enough, since the name of the function exactly states, what is going on in the function.
If we just always had functions that told us, what they do ...
You should always maximally optimize, performance always counts. The reason we have bloatware today, is that most programmers don't want to do the work of optimization.
Having said that, you can always put comments in where slick coding needs clarification.
There is no point in optimizing if you don't know your bottlenecks. You may have made a function incredible efficient (usually at the expense of readability to some degree) only to find that portion of code hardly ever runs, or it's spending more time hitting the disk or database than you'll ever save twiddling bits.
So you can't micro-optimize until you have something to measure, and then you might as well start off for readability.
However, you should be mindful of both speed and understandability when designing the overall architecture, as both can have a massive impact and be difficult to change (depending on coding style and methedologies).
It is estimated that about 70% of the cost of software is in maintenance. Readability makes a system easier to maintain and therefore brings down cost of the software over its life.
There are cases where performance is more important the readability, that said they are few and far between.
Before sacrifing readability, think "Am I (or your company) prepared to deal with the extra cost I am adding to the system by doing this?"
I don't work at google so I'd go for the evil option. (optimization)
In Chapter 6 of Jon Bentley's "Programming Pearls", he describes how one system had a 400 times speed up by optimizing at 6 different design levels. I believe, that by not caring about performance at these 6 design levels, modern implementors can easily achieve 2-3 orders of magnitude of slow down in their programs.
Readability first. But even more than readability is simplicity, especially in terms of data structure.
I'm reminded of a student doing a vision analysis program, who couldn't understand why it was so slow. He merely followed good programming practice - each pixel was an object, and it worked by sending messages to its neighbors...
check this out
Write for readability first, but expect the readers to be programmers. Any programmer worth his or her salt should know the difference between a multiply and a bitshift, or be able to read the ternary operator where it is used appropriately, be able to look up and understand a complex algorithm (you are commenting your code right?), etc.
Early over-optimization is, of course, quite bad at getting you into trouble later on when you need to refactor, but that doesn't really apply to the optimization of individual methods, code blocks, or statements.
How much does an hour of processor time cost?
How much does an hour of programmer time cost?
IMHO both things have nothing to do. You should first go for code that works, as this is more important than performance or how well it reads. Regarding readability: your code should always be readable in any case.
However I fail to see why code can't be readable and offer good performance at the same time. In your example, the second version is as readable as the first one to me. What is less readable about it? If a programmer doesn't know that shifting left is the same as multiplying by a power of two and shifting right is the same as dividing by a power of two... well, then you have much more basic problems than general readability.

Resources