Integer arithmetic errors in modern CPUs [closed] - cpu

Do I need to plan for possible miscalculations in modern CPUs, where for example an addition of two integers 1 and 1 results in 3 once?
(How often) Do such errors in the ALU occur?
Is there any built-in protection against this nowadays?
Is there a realistic chance that arithmetic errors like the one in the example above are the reason behind most "heisenbugs" out there?

CPU feature sizes have gotten small enough that errors like this in data can happen, but they're (much) more likely to happen on data being stored in memory than for an actual miscalculation to happen.
In some radiation-rich environments (e.g., on satellites) it's fairly common to have (for example) multiple CPUs that "vote" on an outcome, or repeat calculations when/if there's a disagreement. Other than that, about the only time it might be reasonable would be in something that was likely to affect human lives.
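To make the voting idea concrete, here is a minimal, hypothetical sketch in C; the three calls to add() stand in for three independent processors, and a real system would of course compare results computed on separate hardware:

#include <stdio.h>

/* Hypothetical sketch of triple modular redundancy: do the same work three
   times and take the majority result. */
static int add(int a, int b) { return a + b; }

static int vote3(int r1, int r2, int r3)
{
    if (r1 == r2 || r1 == r3) return r1;   /* at least two results agree with r1 */
    if (r2 == r3) return r2;               /* r1 is the odd one out */
    return r1;                             /* no majority: a real system would retry or fault */
}

int main(void)
{
    int r = vote3(add(1, 1), add(1, 1), add(1, 1));
    printf("voted result: %d\n", r);       /* prints 2 unless something went very wrong */
    return 0;
}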
While it's possible that there's a Heisenbug that's really a result of something like a single-bit upset, it's extremely unlikely, at least IMO. I've seen quite a few bugs, some of which were hard to track down -- but when they were, they turned out to be real mistakes in the code.

You should never see errors with integer math. Even with floating-point arithmetic it's exceedingly rare, unless someone is using a much older processor, or you're trying to do something with irrational numbers or incredible precision without a specialized math library.
Are you doing something where you see integer errors? I'd be interested if you were.

Do I need to plan for possible miscalculations in modern CPUs
Yes. You also need to plan for spontaneous formation of black holes which could suddenly absorb all nearby matter, including you.
Do such errors in the ALU occur?
Well, if engineers use error-correcting codes, the odds are very, very small. What would have to happen is that a combination of error bits that happened to look valid would spontaneously arise in the circuitry. The odds aren't zero, but they're small.
Is there any built-in protection against this nowadays?
Error-correcting codes are hardly a forgotten art. Remember, "Parity is for farmers".
http://en.wikipedia.org/wiki/Error_detection_and_correction
http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction
http://en.wikipedia.org/wiki/SECDED#Hamming_codes_with_additional_parity_.28SECDED.29
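To give a feel for how such codes catch a flipped bit, here is a toy Hamming(7,4) sketch in C. It only illustrates the principle; it is not how a real memory controller implements SECDED:

#include <stdio.h>
#include <stdint.h>

/* Toy Hamming(7,4): 4 data bits are encoded into 7 bits with parity bits at
   positions 1, 2 and 4. The syndrome of a received word is 0 if it is clean,
   otherwise it is the position of the single flipped bit. */
static uint8_t hamming74_encode(uint8_t d)
{
    uint8_t d1 = (d >> 0) & 1, d2 = (d >> 1) & 1,
            d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */
    /* codeword bit positions 1..7 are: p1 p2 d1 p3 d2 d3 d4 */
    return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6);
}

static int hamming74_syndrome(uint8_t c)
{
    int s = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((c >> (pos - 1)) & 1)
            s ^= pos;            /* XOR together the positions of all set bits */
    return s;
}

int main(void)
{
    uint8_t cw  = hamming74_encode(0xB);    /* encode data bits 1011 */
    uint8_t bad = cw ^ (1 << 4);            /* flip the bit at position 5 */
    printf("syndrome of clean word:     %d\n", hamming74_syndrome(cw));   /* 0 */
    printf("syndrome of corrupted word: %d\n", hamming74_syndrome(bad));  /* 5 */
    return 0;
}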
Is there a realistic chance that arithmetic errors like the one in the example above are behind most "heisenbugs"?
Yes. If you define "realistic" as non-zero, but really, really small.
Recent tests give widely varying error rates with over 7 orders of magnitude difference, ranging from 10^−10 to 10^−17 error/bit·h, roughly one bit error per hour per gigabyte of memory to one bit error per century per gigabyte of memory.

Related

The best way to predict performance without actually porting the code? [closed]

I believe there are people who have had the same experience as me: having to give an (estimated) performance report for porting a program from sequential to parallel code on some designated multicore hardware, with very little time to do it.
For instance, if a 10K LoC sequential program executes on an Intel i7-3770K (not vectorized) in 100 ms, how long would it take to run if the code were parallelized for a Tesla C2075 with NVIDIA CUDA, assuming every applicable parallel optimization technique were applied, you were only given 2-4 days to report the performance, and you didn't know the algorithm at all? (Or perhaps it's safer to just assume it's an impossible situation and the job can't be finished.)
Therefore, I'm wondering: what would most likely be the fastest way to give such a performance report? Is it safe to calculate solely from the hardware's capabilities, such as peak GFLOPS and memory bandwidth? Is there a mathematical way to calculate it? If there is, please back up your method with the corresponding problem description, the algorithm, and the target hardware's specifications.
Or does a tool already exist to (roughly) estimate the performance of ported code?
(Please don't answer: 'kill yourself is the fastest way.')
OK, I'll bite, here's a rule of thumb I just made up:
First calculate the number of Gflops (billions of floating-point operations per second) that your current architecture and your target architecture can deliver. Next compute the number of Gflop (billions of floating-point operations) that your benchmark code performs and measure how long it takes to execute. Now calculate the ratio of the Gflops your code actually achieved (Gflop divided by runtime) to the Gflops your computer can deliver; it's probably around 10% for any long-running, numerically intensive code (the kind that it might be worthwhile porting to a GPU). Finally, apply that ratio to the target computer's Gflops and see how much faster the program might be on the new architecture.
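A minimal sketch of that arithmetic in C, with made-up numbers standing in for the measured and published figures:

#include <stdio.h>

/* Back-of-the-envelope speedup estimate following the rule of thumb above.
   Every number here is a hypothetical placeholder, not a real spec. */
int main(void)
{
    double host_peak_gflops   = 112.0;   /* assumed peak of the current CPU */
    double target_peak_gflops = 1030.0;  /* assumed peak of the target GPU  */
    double benchmark_gflop    = 1.2;     /* measured work in the benchmark  */
    double benchmark_seconds  = 0.100;   /* measured runtime on the CPU     */

    /* Fraction of peak the code actually achieved (often around 10%). */
    double achieved_gflops = benchmark_gflop / benchmark_seconds;
    double efficiency      = achieved_gflops / host_peak_gflops;

    /* Apply the same efficiency to the target machine's peak. */
    double projected_gflops  = efficiency * target_peak_gflops;
    double projected_seconds = benchmark_gflop / projected_gflops;

    printf("efficiency on host: %.1f%%\n", efficiency * 100.0);
    printf("projected runtime:  %.1f ms (speedup about %.1fx)\n",
           projected_seconds * 1000.0, benchmark_seconds / projected_seconds);
    return 0;
}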
Next, and this is the most important step, throw away all the material you used in making the calculations; under no circumstances must you ever reveal a measurement of a hypothetical speed-up to management, customers, or even your closest relations. If you do, you will have to TWEP them.
I've done a lot of code optimisation-for-performance and am currently managing a team of parallel compute experts improving the performance of a large scientific code. The only commitment I have ever made to management (etc), and the only one you can ever make, is that at the end of the project the code will not be any slower than at the start -- so always build in to your project plan a day at the end to roll back all the changes made if the new version of the code is actually slower.
There are simply too many variables at play to be able to make supportable predictions about improving the performance of a program by moving it to a different platform; the only reliable guide is to port it and measure. For scientific codes, where 80% of the run time is consumed by 20% of the code, you might be able to port only that 20% relatively easily and derive useful measurements from that.
As @BenC has already noted, porting to a GPU may, to get the best performance, require a complete rewrite of the code and this leads to my final point -- your question ignores the costs of porting. It's only when you can estimate these that you can start to make informed decisions about whether or not to port. At some stage, though, you're going to have to convince someone that a 3-month effort (say) to port (part of) a code to a new architecture, with no promise of benefits at the end of the work, is a leap in the dark worth taking.

Tips and tricks on improving Fortran code performance [closed]

As part of my Ph.D. research, I am working on development of numerical models of atmosphere and ocean circulation. These involve numerically solving systems of PDE's on the order of ~10^6 grid points, over ~10^4 time steps. Thus, a typical model simulation takes hours to a few days to complete when run in MPI on dozens of CPUs. Naturally, improving model efficiency as much as possible is important, while making sure the results are byte-to-byte identical.
While I feel quite comfortable with my Fortran programming, and am aware of quite some tricks to make code more efficient, I feel like there is still space to improve, and tricks that I am not aware of.
Currently, I make sure I use as few divisions as possible, and try not to use literal constants (I was taught to do this from very early on, e.g. use half=0.5 instead of 0.5 in actual computations), use as few transcendental functions as possible etc.
What other performance sensitive factors are there? At the moment, I am wondering about a few:
1) Does the order of mathematical operations matter? For example if I have:
a=1E-7 ; b=2E4 ; c=3E13
d=a*b*c
would d evaluate with different efficiency based on the order of multiplication? Nowadays this must be compiler specific, but is there a straight answer? I notice d getting a (slightly) different value based on the order (precision limit), but will this impact the efficiency or not?
2) Passing lots (e.g. dozens) of arrays as arguments to a subroutine versus accessing these arrays from a module within the subroutine?
3) Fortran 95 constructs (FORALL and WHERE) versus DO and IF? I know that these mattered back in the 90's when code vectorization was a big thing, but is there any difference now with modern compilers being able to vectorize explicit DO loops? (I am using PGI, Intel, and IBM compilers in my work)
4) Raising a number to an integer power versus multiplication? E.g.:
b=a**4
or
b=a*a*a*a
I have been taught to always use the latter where possible. Does this affect efficiency and/or precision? (probably compiler dependent as well)
Please discuss and/or add any tricks and tips that you know about improving Fortran code efficiency. What else is out there? If you know anything specific to what each of the compilers above do related to this question, please include that as well.
Added: Note that I do not have any bottlenecks or performance issues per se. I am asking if there are any general rules for optimizing the code at the level of individual operations.
Thanks!
Sorry but all the tricks you mentioned are simply ... ridiculous. More exactly, they have no meaning in practice. For instance:
what could be the advantage of using half(=0.5) instead of 0.5?
idem for computing a**4 or a*a*a*a. (a*a)**2 would be another possibility too. My personal taste is a**4, because a good compiler will choose the best way automatically.
For **, the only point which could matter is the difference between a ** 4 and a ** 4., the latter being much more CPU-time consuming. But even this point makes no sense without a measurement in an actual simulation.
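To see the integer-versus-real exponent point outside Fortran, a toy micro-benchmark along these lines can be used. It is only a sketch: results vary by compiler and flags, and a good optimizer may turn the constant-exponent pow into multiplications anyway.

#include <math.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical micro-benchmark: repeated multiplication versus pow() with a
   real exponent. A Fortran a**4 with an integer exponent is typically turned
   into multiplications by the compiler, while a**4. goes through the general
   power routine. */
int main(void)
{
    const long n = 50000000;
    volatile double sink = 0.0;          /* keeps the loops from being optimized away */

    clock_t t0 = clock();
    for (long i = 0; i < n; i++)
    {
        double x = 1.0 + i * 1e-9;
        sink += x * x * x * x;           /* integer power via multiplications */
    }
    clock_t t1 = clock();
    for (long i = 0; i < n; i++)
    {
        double x = 1.0 + i * 1e-9;
        sink += pow(x, 4.0);             /* general real power */
    }
    clock_t t2 = clock();

    printf("x*x*x*x:    %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("pow(x,4.0): %.2f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    (void)sink;
    return 0;
}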
In fact, your approach is wrong. Develop your code as well as possible. After that, measure objectively the cost of the different parts of your code. Optimizing without measuring first is simply nonsense.
If a part accounts for a high percentage of the CPU time, 50% for instance, don't forget that optimizing that part alone cannot divide the cost of the overall code by a factor greater than two. Anyway, start the optimization work with the most expensive part (the bottleneck).
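That bound is just Amdahl's law. A quick sketch of the arithmetic:

#include <stdio.h>

/* Amdahl's law: if a fraction p of the runtime is sped up by a factor s,
   the whole program speeds up by 1 / ((1 - p) + p / s). */
int main(void)
{
    double p = 0.50;    /* the part being optimized takes 50% of the runtime */
    double s = 1e9;     /* even a practically infinite speedup of that part  */
    printf("overall speedup bound: %.3f\n", 1.0 / ((1.0 - p) + p / s));
    return 0;           /* prints about 2.000: the runtime can at best halve */
}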
Don't forget also that the main improvements are generally coming from better algorithms.
I second the advice that these tricks that you have been taught are silly in this era. Compilers do this for you now; such micro-optimizations are unlikely to make a significant difference and may not be portable. Write clear and understandable code. Carefully select your algorithm. One thing that can make a difference is using the indices of multi-dimensional arrays in the correct order: recasting an M x N array to N x M can help, depending on the pattern of data access by your program. After this, if your program is too slow, measure where the CPU time is consumed and improve only those parts. Experience shows that guessing is frequently wrong and leads to writing more opaque code for no reason. If you make a code section in which your program spends 1% of its time twice as fast, it won't make any difference.
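A minimal sketch of the index-order point, written in C for brevity (C is row-major, Fortran is column-major, so the favourable loop order is the reverse in Fortran):

#include <stdio.h>
#include <time.h>

#define N 2048

static double a[N][N];   /* 32 MB, statically allocated for simplicity */

int main(void)
{
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)          /* cache-friendly: walks memory contiguously */
        for (int j = 0; j < N; j++)
            a[i][j] += 1.0;
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)          /* cache-hostile: strides of N doubles */
        for (int i = 0; i < N; i++)
            a[i][j] += 1.0;
    clock_t t2 = clock();

    printf("last index innermost:  %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("first index innermost: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}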
Here are previous answers on FORALL and WHERE: How can I ensure that my Fortran FORALL construct is being parallelized? and Do Fortran 95 constructs such as WHERE, FORALL and SPREAD generally result in faster parallel code?
You've got a-priori ideas about what to do, and some of them might actually help,
but the biggest payoff is in a-posteriori analysis.
(Added: In other words, getting a*b*c into a different order might save a couple cycles (which I doubt), while at the same time you don't know you're not getting blind-sided by something spending 1000 cycles for no good reason.)
No matter how carefully you code it, there will be opportunities for speedup that you didn't foresee. Here's how I find them. (Some people consider this method controversial).
It's best to start with optimization flags OFF when you do this, so the code isn't all scrambled.
Later you can turn them on and let the compiler do its thing.
Get it running under a debugger with enough of a workload so it runs for a reasonable length of time.
While it's running, manually interrupt it, and take a good hard look at what it's doing and why.
Do this several times, like 10, so you don't draw erroneous conclusions about what it's spending time at.
Here are examples of things you might find:
It could be spending a large fraction of time calling math library functions unnecessarily due to the way some expressions were coded, or with the same argument values as in prior calls.
It could be spending a large fraction of time doing some file I/O, or opening/closing a file, deep inside some routine that seemed harmless to call.
It could be in a general-purpose library function, calling a subordinate subroutine, for the purpose of checking argument flags to the upper function. In such a case, much of that time might be eliminated by writing a special-purpose function and calling that instead.
If you do this entire operation two or three times, you will have removed the stupid stuff that finds its way into any software when it's first written.
After that, you can turn on the optimization, parallelism, or whatever, and be confident no time is being spent on silly stuff.

Proactively using 'lines of code' (LOC) metric in your software-development process? [closed]

Codebase size has a lot to do with the complexity of a software system (the larger the codebase, the higher the costs for maintenance and extensions). A simple way to measure codebase size is the 'lines of code' (LOC) metric (see also the blog entry 'implications of codebase-size').
I wondered how many of you out there are using this metric as part of a retrospective to create awareness (for removing unused functionality or dead code). I think creating awareness that more lines of code mean more complexity in maintenance and extension can be valuable.
I am not taking LOC as a fine-grained metric (at the method or function level), but at the subcomponent or complete-product level.
I find it a bit useless. Some kinds of functions, user input handling for example, are going to be a bit long-winded no matter what. I'd much rather use some form of complexity metric. Of course, you can combine the two, and/or any other metrics that take your fancy. All you need is a good tool. I use Source Monitor (with whom I have no relationship other than satisfied user), which is free and can give you both LOC and complexity metrics.
I use SM when writing code to make me notice methods that have got too complex. I then go back and take a look at them. About half the time I say, OK, that NEEDS to be that complicated. What I'd really like is a (free) tool as good as SM but which also supports a tag list of some sort which says "ignore methods X, Y & Z; they need to be complicated". But I guess that could be dangerous, which is why I have so far not suggested the feature to SM's author.
I'm thinking it could be used to reward the team when the LOC decreases (assuming they are still producing valuable software and readable code...).
Not always true. While it is usually preferable to have a low LOC, it doesn't mean the code is any less complex. In fact, it's usually more so. Code that's been optimized to get the minimal number of cycles can be completely unreadable, even by the person who wrote it, a week later.
As an example from a recent project, imagine setting individual color values (RGBA) from a PNG file. You can do this a bunch of ways, the most compact being just one line using bitshifts. This is a lot less readable and maintainable than another approach, such as using bitfields, which would take a structure definition and many more lines.
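A rough sketch of the two styles described above, using a plain struct rather than bitfields to keep the sketch portable (the 0xRRGGBBAA byte layout is an assumption):

#include <stdint.h>

/* Compact: unpack a 32-bit RGBA pixel with shifts and masks, one line per channel. */
uint8_t red_of(uint32_t px)   { return (px >> 24) & 0xFF; }
uint8_t green_of(uint32_t px) { return (px >> 16) & 0xFF; }
uint8_t blue_of(uint32_t px)  { return (px >>  8) & 0xFF; }
uint8_t alpha_of(uint32_t px) { return  px        & 0xFF; }

/* More lines, but arguably easier to read and maintain: named fields. */
struct rgba
{
    uint8_t r, g, b, a;
};

struct rgba unpack(uint32_t px)
{
    struct rgba c;
    c.r = (px >> 24) & 0xFF;
    c.g = (px >> 16) & 0xFF;
    c.b = (px >>  8) & 0xFF;
    c.a =  px        & 0xFF;
    return c;
}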
It also depends on the tool doing the LOC calculations. Does it consider lines with just a single symbol on them as code (Ex: { and } in C-style languages)? That definitely doesn't make it more complex, but does make it more readable.
Just my two cents.
LOCs are easy to obtain and deliver reasonable information within one non-trivial project. My first step in a new project is always counting LOCs.

How often is the performance of a programming language a significant issue? [closed]

It seems that I often hear people criticize certain programming languages because they "have poor performance", or because some other language is "faster" in general (not necessarily for a specific application). However, my experience and education have taught me that anytime you have a performance problem, at least one of the following is probably happening:
The bottleneck isn't in the CPU, it's in some other device, such as the network or the hard drive.
The poor performance is caused by your algorithms, not by the language you're using.
My general impression is that the speed of a programming language itself is all but irrelevant in the vast majority of cases, with exceptions for serious data processing problems. Even in those cases, I believe you could use a hybrid approach and use a lower-level language only for the CPU-intensive pieces so that you wouldn't lose the benefits of the more abstract language altogether.
Do you agree? Is programming language speed insignificant most of the time, or do the critics have a right to point out language performance issues?
I hope this question isn't too subjective, but it seems to me that there should be a relatively objective answer to this.
Performance can be a serious concern in libraries, operating systems, and the like. However, I believe that upwards of 90% of the time raw performance is irrelevant.
What is more important in many cases is TIMING. Any garbage collected language is going to have some unpredictability in this regard, which makes them unsuited to embedded and realtime design spaces.
The overlap of GC'd and "slow" languages is considerable, and so you may see a language discounted for speed reasons when the real problem is inconsistent timing.
There are some allocation/threading/etc. schemes that allow for garbage collection while also guaranteeing the runtime of parts of the system, such as Realtime Java, though I haven't personally seen it in use anywhere.
Short answer: most of the time the speed of the language is irrelevant (within reason); language choices are made based on familiarity and available libraries.
Amazingly, the performance of a system is a combination of the programming language, the system it's executing on, the operations that system is performing and the external resources (network, disk, slow line printers, etc.) that it relies upon.
If your system is slow, rather than guessing, test it.
If there is any "Rule" in computing, it's "Test your assumptions". Everything else is gross guideline.
This is impossible to answer so broadly. It's like asking if big engines are a waste in cars. Well, for some people, yes. For others, not at all. And all sorts in between.
There are a myriad of factors that come in to play. What is your target environment? End-user deployment or servers? Let's suppose we're talking about web development and coding for a server. RoR is well-known to be (relatively) slow. .NET is pretty fast by comparison. But RoR also has RAD qualities that .NET can't compete with.
Is getting your app up-and-running yesterday more of a priority than scalability?
Does your business model live or die on the milliseconds you serve a page, or the time you went to market?
Does your TCO and application architecture support scaling out or scaling up? Do you even expect to need to scale up?
Those are just a tiny handful of the questions an architect has to answer when making platform/language decisions. Does speed matter? Sometimes. If I am planning to write a LoB service that will eventually need to scale to thousands of transactions per second, and it will be deployed in an enterprise environment, I will probably go with .NET. If I have an idea for a Web 2.0 business like selling Twitter teeshirts, I need to capitalize on that idea yesterday, and I know I probably won't get slammed with enough business to bring the site down before I can prepare for it.
This is honestly over-simplifying a very complex issue, but hopefully it illustrates the point that it's impossible to simply "say" whether it matters or not.
I think it's a good question. To answer it requires having a general framework for thinking about performance, so let me try to provide one. (Some of this is going to sound really obvious, but bear with me.)
To keep things simple,
let's just consider the simple case of applications that have a specific job to do, and that start, and then finish, and what you care about is wall-clock time. Let's assume a standard CPU cycle rate, and a mono-processor.
The time duration consists of a stream of time-slices (nanoseconds, say). To do that job, there is a minimum amount of time required, and it is usually greater than zero. There is no maximum amount of time required. If a program spends longer than the minimum number of nanoseconds, then some of those nanoseconds are being spent, strictly speaking, unnecessarily (i.e. for poor reasons).
So, to optimize a program's execution time, it is necessary to find the nanoseconds it is spending that do not have to be spent (i.e. that do not have good reasons) and remove them.
One way to do this is to, if possible, step through the program and keep track at each step of why it is doing that step. If the reason is not good, there is an opportunity for removing steps.
Another way to do this is to select nanoseconds at random from the program's execution, and inquire their reasons. For example, the program counter can tell you what the program is doing, but the call stack can tell you why. In order for the nanosecond to be spent for a good reason, every call instruction on the call stack has to have a good reason. If any instruction on the call stack does not have a good reason, then there is an opportunity to optimize. In fact, the amount of time that instruction is on the call stack is the amount of time that would be saved by its removal.
In some kinds of software that are highly asynchronous, message-driven, or interpreted, the call stack may not provide enough information. In that case, to answer why a given nanosecond is being spent may be more difficult. It may require examining more state information than just the call stack. For example, in an interpreter, the stack of the program being interpreted may also need to be examined. However, often the hardware call stack does provide sufficient information, so it is a useful thing to examine.
Now, to try to answer your question.
There is such a thing as a "hot spot". This is a small set of addresses that are often at the bottom of the call stack. Nanoseconds spent in that code may or may not have good reasons.
There is such a thing as a "performance problem". This is an instruction that often accounts for why nanoseconds are being spent, but that does not have a good reason. Such an instruction may be in a hot spot. It may also be a subroutine call instruction. (It cannot be both.) It may be an instruction that sends a message to be processed later, for which there is no good reason. To optimize software, such instructions (not functions) are what you look for.
Languages, loosely speaking, are either compiled into machine language or interpreted. Interpreted languages are usually 1 or 2 orders of magnitude slower than compiled, because they are constantly re-determining what they need to do. However, roughly speaking, this is only a performance problem if it occurs in a hot spot. If a program spends all its time calling compiled library functions, or waiting for I/O completions, then its speed of execution probably doesn't matter, because most of the nanoseconds are being spent for other reasons.
Now, certainly, any language or program can in principle be highly non-optimal, but in terms of compilers, for hotspot code, they are mostly pretty good, give or take maybe 30%. If there is a background process involved, like garbage collection, that adds an overhead, but it depends on the rate at which the program generates garbage.
So to sum up, the speed of a language matters in hotspot code, but not much elsewhere. When a program has been optimized by removal of all other performance problems, and if the hotspot code is actually seen by the compiler/interpreter, then speed of language matters.
Your question is framed very broadly, so I'll try to give a somewhat narrower answer:
Unless there is some good reason not to do so, the language for a project should always be chosen from among those languages that will help the project team be productive and produce reliable software that can easily be adapted for future needs. The tradeoffs generally favor high-level languages with automatic memory management.
N.B. There are plenty of good reasons to make other choices, such as compatibility with current products and libraries.
It sometimes happens that when a program is too slow, the quickest and easiest way to speed it up is to rewrite the program (or a critical part) in a new language. This happens most often when the implementation language is interpreted and the new language is compiled.
Example: I got about a 4x speedup out of the OSBF-Lua spam filter by rewriting the lexical analysis of the mail headers. By rewriting from Lua to C I not only went from interpreted to compiled but was able to eliminate an array-bounds check for every input character.
To answer your question as stated, it is not very often that language performance per se is an issue.

Should a developer aim for readability or performance first? [closed]

Oftentimes a developer will be faced with a choice between two possible ways to solve a problem -- one that is idiomatic and readable, and another that is less intuitive, but may perform better. For example, in C-based languages, there are two ways to multiply a number by 2:
int SimpleMultiplyBy2(int x)
{
return x * 2;
}
and
int FastMultiplyBy2(int x)
{
return x << 1;
}
The first version is simpler to pick up for both technical and non-technical readers, but the second one may perform better, since bit shifting is a simpler operation than multiplication. (For now, let's assume that the compiler's optimizer would not detect this and optimize it, though that is also a consideration).
As a developer, which would be better as an initial attempt?
You missed one.
First code for correctness, then for clarity (the two are often connected, of course!). Finally, and only if you have real empirical evidence that you actually need to, you can look at optimizing. Premature optimization really is evil. Optimization almost always costs you time, clarity, maintainability. You'd better be sure you're buying something worthwhile with that.
Note that good algorithms almost always beat localized tuning. There is no reason you can't have code that is correct, clear, and fast. You'll be unreasonably lucky to get there starting off focusing on `fast' though.
IMO the obvious readable version first, until performance is measured and a faster version is required.
Take it from Don Knuth
Premature optimization is the root of all evil (or at least most of it) in programming.
Readability 100%
If your compiler can't do the "x*2" => "x <<1" optimization for you -- get a new compiler!
Also remember that 99.9% of your program's time is spent waiting for user input, waiting for database queries and waiting for network responses. Unless you are doing the multiply 20 bajillion times, it's not going to be noticeable.
Readability for sure. Don't worry about the speed unless someone complains
In your given example, 99.9999% of the compilers out there will generate the same code for both cases. Which illustrates my general rule - write for readability and maintainability first, and optimize only when you need to.
Readability.
Coding for performance has its own set of challenges. Joseph M. Newcomer said it well:
Optimization matters only when it matters. When it matters, it matters a lot, but until you know that it matters, don't waste a lot of time doing it. Even if you know it matters, you need to know where it matters. Without performance data, you won't know what to optimize, and you'll probably optimize the wrong thing. The result will be obscure, hard to write, hard to debug, and hard to maintain code that doesn't solve your problem. Thus it has the dual disadvantage of (a) increasing software development and software maintenance costs, and (b) having no performance effect at all.
I would go for readability first. Considering the kind of optimized languages and hugely powerful machines we have these days, most of the code we write in a readable way will perform decently.
In some very rare scenarios, where you are pretty sure you are going to have a performance bottleneck (maybe from some past bad experience), and you manage to find some weird trick which can give you a huge performance advantage, you can go for that. But you should comment that code snippet very well, which will help make it more readable.
Readability. The time to optimize is when you get to beta testing. Otherwise you never really know what you need to spend the time on.
An often-overlooked factor in this debate is the extra time it takes for a programmer to navigate, understand and modify less readable code. Considering that a programmer's time goes for a hundred dollars an hour or more, this is a very real cost.
Any performance gain is countered by this direct extra cost in development.
Putting a comment there with an explanation would make it readable and fast.
It really depends on the type of project, and how important performance is. If you're building a 3D game, then there are usually a lot of common optimizations that you'll want to throw in there along the way, and there's no reason not to (just don't get too carried away early). But if you're doing something tricky, comment it so anybody looking at it will know how and why you're being tricky.
The answer depends on the context. In device driver programming or game development for example, the second form is an acceptable idiom. In business applications, not so much.
Your best bet is to look around the code (or in similar successful applications) to check how other developers do it.
If you're worried about readability of your code, don't hesitate to add a comment to remind yourself what and why you're doing this.
Using << here would be a micro-optimization.
So Hoare's (not Knuth's) rule:
Premature optimization is the root of all evil.
applies and you should just use the more readable version in the first place.
This rule is IMHO often misused as an excuse to design software that can never scale, or perform well.
Both. Your code should balance the two: readability and performance. Ignoring either one will screw the ROI of the project, which at the end of the day is all that matters to your boss.
Bad readability results in decreased maintainability, which results in more resources spent on maintenance, which results in a lower ROI.
Bad performance results in decreased investment and client base, which results in a lower ROI.
Readability is the FIRST target.
In the 1970's the army tested some of the then "new" techniques of software development (top down design, structured programming, chief programmer teams, to name a few) to determine which of these made a statistically significant difference.
The ONLY technique that made a statistically significant difference in development was...
ADDING BLANK LINES to program code.
The improvement in readability in that pre-structured, pre-object-oriented code was the only technique in these studies that improved productivity.
==============
Optimization should only be addressed when the entire project is unit tested and ready for instrumentation. You never know WHERE you need to optimize the code.
In their landmark books Software Tools (1976) and Software Tools in Pascal (1981), Kernighan and Plauger showed ways to create structured programs using top-down design. They created text processing programs: editors, search tools, code pre-processors.
When the completed text formatting function was INSTRUMENTED, they discovered that most of the processing time was spent in three routines that performed text input and output (in the original book, the I/O functions took 89% of the time; in the Pascal book, these functions consumed 55%!).
They were able to optimize these THREE routines and produced the results of increased performance with reasonable, manageable development time and cost.
The larger the codebase, the more crucial readability is. Trying to understand some tiny function isn't so bad. (Especially since the method name in the example gives you a clue.) Not so great for some epic piece of uber-code written by the lone genius who just quit coding because he finally hit the ceiling of the complexity he could handle, and it's what he just wrote for you, and you'll never ever understand it.
As almost everyone said in their answers, I favor readability. 99 out of 100 projects I run have no hard response time requirements, so it's an easy choice.
Before you even start coding you should already know the answer. Some projects have certain performance requirements, like 'needs to be able to run task X in Y (milli)seconds'. If that's the case, you have a goal to work towards and you know when you have to optimize or not. (Hopefully) this is determined at the requirements stage of your project, not when writing the code.
Good readability and the ability to optimize later on are a result of proper software design. If your software is of sound design, you should be able to isolate parts of your software and rewrite them if needed, without breaking other parts of the system. Besides, most true optimization cases I've encountered (ignoring some real low-level tricks; those are incidental) have been in changing from one algorithm to another, or caching data in memory instead of on disk/the network.
If there is no readability, it will be very hard to get a performance improvement when you really need it.
Performance should only be improved when it is an actual problem in your program; many other places are far more likely to be the bottleneck than this bit of syntax. Say you are squeezing a 1 ns improvement out of a << while ignoring 10 minutes of I/O time.
Also, regarding readability, a professional programmer should be able to read and understand computer science terms. For example, we can name a method enqueue rather than having to say putThisJobInWorkQueue.
The bitshift versus the multiplication is a trivial optimization that gains next to nothing. And, as has been pointed out, your compiler should do that for you. Other than that, the gain is negligible anyhow, whatever CPU this instruction runs on.
On the other hand, if you need to perform serious computation, you will require the right data structures. But if your problem is complex, finding out about that is part of the solution. As an illustration, consider searching for an ID number in an array of 1000000 unsorted objects. Then reconsider using a binary tree or a hash map.
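A minimal C sketch of that search example, contrasting a linear scan with sorting once and then using the standard library's binary search (the struct and field names are made up):

#include <stdlib.h>

struct object
{
    int id;
    /* ... other fields ... */
};

static int cmp_id(const void *a, const void *b)
{
    const struct object *x = a, *y = b;
    return (x->id > y->id) - (x->id < y->id);
}

/* O(n) per lookup on an unsorted array. */
struct object *find_linear(struct object *v, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].id == id)
            return &v[i];
    return NULL;
}

/* O(log n) per lookup, after sorting once with qsort(v, n, sizeof *v, cmp_id). */
struct object *find_sorted(struct object *v, size_t n, int id)
{
    struct object key = { id };
    return bsearch(&key, v, n, sizeof *v, cmp_id);
}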
But optimizations like n << C are usually negligible and trivial to switch to at any point. Making code readable is not.
It depends on the task that needs to be solved. Usually readability is more important, but there are still some tasks where you should think about performance in the first place. And you can't just spend a day or two on profiling and optimization after everything works perfectly, because the optimization itself may require rewriting a significant part of the code from scratch. But that is not common nowadays.
I'd say go for readability.
But in the given example, I think that the second version is already readable enough, since the name of the function exactly states, what is going on in the function.
If only we always had functions that told us what they do...
You should always optimize as much as you can; performance always counts. The reason we have bloatware today is that most programmers don't want to do the work of optimization.
Having said that, you can always put comments in where slick coding needs clarification.
There is no point in optimizing if you don't know your bottlenecks. You may have made a function incredibly efficient (usually at the expense of readability to some degree) only to find that portion of code hardly ever runs, or that it's spending more time hitting the disk or database than you'll ever save by twiddling bits.
So you can't micro-optimize until you have something to measure, and then you might as well start off for readability.
However, you should be mindful of both speed and understandability when designing the overall architecture, as both can have a massive impact and be difficult to change (depending on coding style and methodologies).
It is estimated that about 70% of the cost of software is in maintenance. Readability makes a system easier to maintain and therefore brings down cost of the software over its life.
There are cases where performance is more important than readability; that said, they are few and far between.
Before sacrificing readability, think: "Am I (or is my company) prepared to deal with the extra cost I am adding to the system by doing this?"
I don't work at Google, so I'd go for the evil option (optimization).
In Chapter 6 of Jon Bentley's "Programming Pearls", he describes how one system had a 400 times speed up by optimizing at 6 different design levels. I believe, that by not caring about performance at these 6 design levels, modern implementors can easily achieve 2-3 orders of magnitude of slow down in their programs.
Readability first. But even more than readability is simplicity, especially in terms of data structure.
I'm reminded of a student doing a vision analysis program, who couldn't understand why it was so slow. He merely followed good programming practice - each pixel was an object, and it worked by sending messages to its neighbors...
Write for readability first, but expect the readers to be programmers. Any programmer worth his or her salt should know the difference between a multiply and a bitshift, be able to read the ternary operator where it is used appropriately, and be able to look up and understand a complex algorithm (you are commenting your code, right?), etc.
Early over-optimization is, of course, quite bad at getting you into trouble later on when you need to refactor, but that doesn't really apply to the optimization of individual methods, code blocks, or statements.
How much does an hour of processor time cost?
How much does an hour of programmer time cost?
IMHO the two things have nothing to do with each other. You should first go for code that works, as this is more important than performance or how well it reads. Regarding readability: your code should always be readable in any case.
However I fail to see why code can't be readable and offer good performance at the same time. In your example, the second version is as readable as the first one to me. What is less readable about it? If a programmer doesn't know that shifting left is the same as multiplying by a power of two and shifting right is the same as dividing by a power of two... well, then you have much more basic problems than general readability.
