When should I break a function? - refactoring

It's prudent to break a long function into a chief function and helper functions.
I know that outside the module only the chief function will be called, but its length may prove intimidating.
Textbooks put a limit on the number of lines, but I feel that this is too rigid.
P.S. I am programming in Python and need to process incoming messages. The function returns a tuple containing the message, but converted to Python's internal data types.
So there is somewhat independent code for each message type.
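To make this concrete, here is a stripped-down sketch of the shape I have in mind (the message types, field layouts and names are all invented for the example):

import struct

def _parse_heartbeat(payload):
    # Helper: a heartbeat carries a single unsigned 32-bit counter.
    (counter,) = struct.unpack("!I", payload)
    return ("heartbeat", counter)

def _parse_position(payload):
    # Helper: a position carries two big-endian doubles.
    lat, lon = struct.unpack("!dd", payload)
    return ("position", lat, lon)

def parse_message(msg_type, payload):
    # Chief function: the only one called from outside the module.
    if msg_type == 1:
        return _parse_heartbeat(payload)
    if msg_type == 2:
        return _parse_position(payload)
    raise ValueError("unknown message type: %r" % msg_type)

print(parse_message(1, struct.pack("!I", 42)))  # ('heartbeat', 42)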
Duplicate Question
When is a function too long?

I think you need to go about this from the other end of the problem. Think bottom-up. Identify small units of work, as small as possible, and start composing your code that way. You will only run into spaghetti-code issues when you code top-down and don't keep a structured approach.
If you already have spaghetti code and need to refactor, you pretty much have to start over. It is probably more work to break up existing spaghetti code than to rewrite it, and the result may not be as good.
I don't think there should be a hard number for the lines of code in a method either, but well-written code does not have methods with more than 5 to 10 lines in the lower layers, and 20 to 30 lines in the business logic, to give you some kind of metric.

I'm not a big fan of breaking a function into multiple functions unnecessarily. It's not a hard and fast thing - if there are things that seem like distinct logical units, then by all means, break those out and think about them separately. But don't just break things out for the sake of some guideline like "one page per function" or "N lines per function".

One good rule of thumb is that if it doesn't fit on a single screen, it is worth thinking about splitting it up. But only if it makes sense to split it up; some long functions are perfectly readable, and it doesn't make any sense to slavishly split them into multiple functions just for the sake of it.

Never write a function that, when printed on fanfold paper, is taller than you are.

I like the rule of thumb that you should break out the subfunction if you can think of a good domain-relevant name for it.
When someone can understand the top-level function without necessarily having to look up the definition of the sub-function, you've likely made a net gain. (But when you break it down too far, your names start referring to your implementation artifacts rather than the domain)
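A tiny invented example of what I mean by a domain-relevant name:

from dataclasses import dataclass

@dataclass
class Order:
    total: float
    discount: float

@dataclass
class Customer:
    credit_limit: float

# Before: the reader has to decode the arithmetic to infer intent:
#     if order.total - order.discount > customer.credit_limit: ...
# After: the condition carries a name from the domain.
def exceeds_credit_limit(order: Order, customer: Customer) -> bool:
    return order.total - order.discount > customer.credit_limit

print(exceeds_credit_limit(Order(120.0, 10.0), Customer(100.0)))  # True

The top-level code now reads in domain terms, without the reader having to open the helper.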

I was recently discussing this with a friend. He suggested refactoring to separate concerns, and I have to agree. That is, one function should do one thing; if it does more than one thing, split it up. If it doesn't, leave it together; it makes no sense to split up a function only to obfuscate its meaning. After all, a function is a block of code that does one thing!

The limit in terms of number of lines is often impractical because it doesn't account for readability well. It's better to try to separate groups of lines of code that have just a few inputs and just a few outputs and make this a separate function. It's not always possible; then it's often wise to just leave the code as it is and not refactor for the sake of refactoring.

Well, since I am coding in Python, I have the liberty to write functions inside functions, unlike in C, C++ or Java. This, I feel, is a better choice.
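For example, a rough sketch (all names invented) of helpers nested inside the chief function:

def process_message(raw):
    # The helpers live inside the chief function, so the module's
    # public surface stays a single callable.
    def parse_header(data):
        return data[:4], data[4:]

    def decode_body(body):
        return body.decode("utf-8")

    header, body = parse_header(raw)
    return (header, decode_body(body))

print(process_message(b"HDR1hello"))  # (b'HDR1', 'hello')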

It's not specified, but the line count should be as low as possible. You may follow the Rule of 30. I follow this in my PHP scripts when needed.
Rule of 30:
“Rule of 30” in Refactoring in Large Software Projects by Martin Lippert and Stephen Roock:
Methods should not have more than an average of 30 code lines.
A class should contain an average of less than 30 methods.
A package/library shouldn’t contain more than 30 classes.
Subsystems should avoid more than 30 packages.
A system with more than 30 subsystems may create problems.
If an element consists of more than 30 subelements, it is highly probable that there is a serious problem.

Personally, I break out a function if it either saves total lines or total processing time.
If I only run the helper once per chief function, I don't bother.

The point is that in principle it's better to have specialized functions. But where one sets the limit depends very much on:
1) the "usual" programming style in certain languages (one can observe that object-oriented languages tend toward shorter procedures than, say, C and the like);
2) your own way of programming; every hard limit must be questioned, IMHO, and overall there will probably be some "natural" distribution of function sizes;
3) the principle that a function should do a certain task. Take, for example, a function for parsing: it is usually much longer than a function that just sets some field in a structure. Or consider how an event loop in the Windows API may look. All of that suggests there may be good reasons for long methods...

If there is independent code (in your case specifics for each message type) those areas should be broken out.

Size matters not. Judge me by my size, do you? - Yoda
Your main concerns are readability, simplicity and maintainability. A good indicator is if you need to write comments to explain a section of a function then that section is a good candidate for a separate function.
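A small invented sketch of that indicator, where the comment you would have written becomes the helper's name:

# Before: a comment flags a section that wants to be its own function.
def report(scores):
    # normalize scores to the 0..1 range
    lo, hi = min(scores), max(scores)
    normalized = [(s - lo) / (hi - lo) for s in scores]
    return sum(normalized) / len(normalized)

# After: the comment has turned into a name the caller can read.
def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def report_refactored(scores):
    return sum(normalize(scores)) / len(scores)

print(report([2, 4, 6]) == report_refactored([2, 4, 6]))  # True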

There are many reasons to break a long function into its constituent pieces. The most important are:
readability
maintainability
code clarity/intent
Some functions simply cannot be broken into smaller pieces without negatively impacting the listed goals, so there is no hard-and-fast rule.

If you didn't write it and it's already in production: NEVER!!! If you break it up, you're likely to break it, it's that simple.
If you are writing it and you're not sure, the on-screen rule applies, as others have said.

Related

How small should functions be? [duplicate]

This question already has answers here:
When is a function too long? [closed]
How small should I be making functions? For example, if I have a cake baking program.
bakeCake() {
    if (cakeType == "chocolate")
        fetchIngredients("chocolate")
    else if (cakeType == "plain")
        fetchIngredients("plain")
    else if (cakeType == "Red velvet")
        fetchIngredients("Red Velvet")
    // Rest of program
}
My question is, while this stuff is simple enough on its own, when I add much more stuff to the bakeCake function it becomes cluttered. But let's say that this program has to bake thousands of cakes per second. From what I've heard, it takes significantly longer (relative to computer time) to call another function compared to just doing the statements in the current function. So something simple like this should be very easy to read, and if efficiency is important, wouldn't I want to keep it in there?
Basically, at what point do I sacrifice readability for efficiency? And a quick bonus question: at what point does having too many functions decrease readability? Here's an example from Apple's Swift tutorial.
func isCandyAmountAcceptable(bandMemberCount: Int, candyCount: Int) -> Bool {
    return candyCount % bandMemberCount == 0
}
They said that because the function name isCandyAmountAcceptable was easier to read than candyCount % bandMemberCount == 0, it'd be good to make a function for that. But from my perspective, while it may take a few seconds to figure out what the second option is saying, it's also more readable when it comes to knowing how it works.
Sorry about being all over the place and kinda asking 2 questions in one. Just to summarize my questions:
Does using functions extraneously make efficiency (speed) suffer? If it does, how can I figure out what the cutoff between readability and efficiency is?
How small and simple should I make functions? Obviously I'd make them if I ever have to repeat the function, but what about one-time-use functions?
Thanks guys, sorry if these questions are ignorant or anything but I'd really appreciate an answer.
Does using functions extraneously make efficiency (speed) suffer? If it does, how can I figure out what the cutoff between readability and efficiency is?
For performance I would generally not factor in any overhead of direct function calls against any decent optimizer, since it can even make those come free of charge. When it doesn't, it's still a negligible overhead in, say, 99.9% of scenarios. That applies even for performance-critical fields. I work in areas like raytracing, mesh processing, and image processing and still the cost of a function call is typically on the bottom of the priority list as opposed to, say, locality of reference, efficient data structures, parallelization, and vectorization. Even when you're micro-optimizing, there are much bigger priorities than the cost of a direct function call, and even when you're micro-optimizing, you often want to leave a lot of the optimization for your optimizer to perform instead of trying to fight against it and do it all by hand (unless you're actually writing assembly code).
Of course with some compilers you might deal with ones that never inline function calls and have a bit of an overhead to every function call. But in that case I'd still say it's relatively negligible since you probably shouldn't be worrying about such micro-level optimizations when using those languages and interpreters/compilers. Even then it will probably often be bottom on the priority list, relatively speaking, as opposed to more impactful things like improving locality of reference and thread efficiency.
It's like if you're using a compiler with very simplistic register allocation that has a stack spill for every single variable you use, that doesn't mean you should be trying to use and reuse as few variables as possible to work around its tendencies. It means reach for a new compiler in those cases where that's a non-negligible overhead (ex: write some C code into a dylib and use that for the most performance-critical parts), or focus on higher-level optimizations like making everything run in parallel.
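If you'd rather measure than guess, a quick sketch with Python's timeit module (the numbers vary by machine and interpreter; CPython does not inline, so the per-call overhead is visible, but it is measured in nanoseconds):

import timeit

def is_acceptable(candy, members):
    return candy % members == 0

# Time the inline expression against the equivalent function call.
inline = timeit.timeit("c % m == 0", globals={"c": 1000, "m": 7}, number=1_000_000)
called = timeit.timeit("f(c, m)", globals={"f": is_acceptable, "c": 1000, "m": 7}, number=1_000_000)
print(f"inline: {inline:.3f}s  via function: {called:.3f}s")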
How small and simple should I make functions? Obviously I'd make them if I ever have to repeat the function, but what about one-time-use functions?
This is where I'm going to go slightly off-kilter and actually suggest you consider avoiding the teeniest of functions for maintainability reasons. This is admittedly a controversial opinion although at least John Carmack seems to somewhat agree (specifically in respect to inlining code and avoiding excess function calls for cases where side effects occur to make the side effects easier to comprehend).
However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing.
The reason I believe it can sometimes be good to err on the side of meatier functions is that making a change or fixing a problem often requires comprehending more information than a single simple function contains.
Which is simpler to comprehend, a function whose logic consists of 80 lines of inlined code, or one distributed across a couple dozen functions and possibly ones that lead to disparate places throughout the codebase?
The answer is not so clear cut. Naturally if the teeny functions are used widely, like say sqrt or abs, then the reader can simply skim over the function call, knowing full well what it does like the back of his hand. But if there are a lot of teeny exotic functions that are only used one time, then the ability to comprehend the operation as a whole requires looking them up and understanding what they all individually do before you can get a proper comprehension of what's going on in terms of the big picture.
I actually disagree with that Apple Swift tutorial somewhat about that one-liner function, because while it is easier to understand than figuring out what the arithmetic and comparison are supposed to do, in exchange it might require looking up the definition in scenarios where "isCandyAmountAcceptable is enough information for me" doesn't hold and you need to figure out exactly what makes an amount acceptable. Instead I would actually prefer a simple comment:
// Determine if candy amount is acceptable.
if (candyCount % bandMemberCount == 0)
...
... because then you don't have to jump to disparate places in code (the analogy of a book referring its reader to other pages in the book, causing the readers to constantly have to flip back and forth between pages) to figure that out. Of course the idea behind this isCandyAmountAcceptable kind of function is that you shouldn't have to be concerned with such details about what makes a candy amount acceptable, but too often in practice, we do end up having to understand the details more often than we optimally should to debug the code or make changes to it. If the code never needs to be debugged or changed, then it doesn't really matter how it's written. It could even be written in binary code for all we care. But if it's written to be maintained, as in debugged and changed in the future, then sometimes it is helpful to avoid making the readers have to jump through lots of hoops. The details do often matter in those scenarios.
So sometimes it doesn't help to understand the big picture by fragmenting it into the teeniest of puzzle pieces. It's a balancing act, but certain types of developers can err on the side of overly dicing up their systems into the most granular bits and pieces and finding maintenance problems that way. Those types are still often promising engineers -- they just have to find their balance. The other extreme is the one that writes 500-line functions and doesn't even consider refactoring -- those are kinda hopeless. But I think you fit in the former category, and for you, I'd actually suggest erring on the side of meatier functions ever-so-slightly just to keep the puzzle pieces a healthy size (not too small, not too big).
There's even a balancing act I see between code duplication and minimizing dependencies. An image library doesn't necessarily become easier to comprehend by shaving off a few dozen lines of duplicated math code if the exchange is a dependency to a complex math library with 800,000 lines of code and an epic manual on how to use it. In such cases, the image library might very well be easier to comprehend as well as use and deploy in new projects if it chooses instead to duplicate a few math functions here and there to avoid external dependencies, isolating its complexity instead of distributing it elsewhere.
Basically, at what point do I sacrifice readability for efficiency?
As stated above, I don't think readability of the small picture and comprehensibility of the big picture are synonymous. It can be really easy to read a two-line function and know what it does and still be miles away from understanding what you need to understand to make the necessary changes. Having many of those teeny one-shot two-liners can even delay the ability to comprehend the big picture.
But if I use "comprehensibility vs. efficiency" instead, I'd say upfront at the design-level for cases where you anticipate processing huge inputs. As an example, a video processing application with custom filters knows it's going to be looping over millions of pixels many times per frame. That knowledge should be utilized to come up with an efficient design for looping over millions of pixels repeatedly. But that's with respect to design -- towards the central aspects of the system that many other places will depend upon because big central design changes are too costly to apply late in hindsight.
That doesn't mean it has to start applying hard-to-understand SIMD code right off the bat. That's an implementation detail provided the design leaves enough breathing room to explore such an optimization in hindsight. Such a design would imply abstracting at the Image level, at the level of a million+ pixels, not at the level of a single IPixel. That's the worthy thing to take into consideration upfront.
Then later on, you can optimize hotspots and potentially use some difficult-to-understand algorithms and micro-optimizations here and there for those truly critical cases where there's a strong perceived business need for the operation to go faster, and hopefully with good tools (profilers, i.e.) in hand. The user cases guide you about what operations to optimize based on what the users do most often and find a strong desire to spend less time waiting. The profiler guides you about precisely what parts of the code involved in that operation need to be optimized.
Readability, performance and maintainability are three different things. Readability will make your code look simple and understandable, but it is not necessarily the best way to go on its own. Performance is always going to be important, unless you are running the code in a non-production environment where the end result matters more than how it was achieved. Enter the world of enterprise applications, and maintainability suddenly gains a lot more importance. What you work on today will be handed over to somebody else after six months, and they will be fixing and changing your code. This is why standard design patterns suddenly become so important. In a way, readability is part of maintainability on a larger scale. If the cake-baking program above is more complex than it looks, the first thing that stands out as a code smell is the if-else chain. It has got to be replaced with polymorphism (a sketch follows). The same goes for switch-case constructs.
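A hedged sketch (class and function names invented) of what that replacement might look like for the cake example:

class Cake:
    def fetch_ingredients(self):
        raise NotImplementedError

class ChocolateCake(Cake):
    def fetch_ingredients(self):
        return ["flour", "sugar", "cocoa"]

class PlainCake(Cake):
    def fetch_ingredients(self):
        return ["flour", "sugar"]

# A registry replaces the if-else chain: adding a new cake type
# no longer means editing bake_cake itself.
CAKES = {"chocolate": ChocolateCake, "plain": PlainCake}

def bake_cake(cake_type):
    return CAKES[cake_type]().fetch_ingredients()

print(bake_cake("chocolate"))  # ['flour', 'sugar', 'cocoa']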
At what point do you decide to sacrifice one for the other? That depends purely upon what business purpose your code is achieving. Is it academic? Then it has to be the perfect solution, even if it means 90% of devs struggle to figure out at first glance what the hell is happening. Is it a retail store's website maintained by a distributed team of 50 devs working from two or more geographic locations? Follow the conventional design patterns.
A rule of thumb I have seen followed in almost all situations is that if a function grows beyond half the screen, it's a candidate for refactoring. Do you have functions that leave your editor with long scroll bars? Refactor!

Code structure: should I use lots of functions to increase readability?

My question has Bash and PowerShell scripts in mind, but I suppose it applies to other languages as well.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times. This decreases the amount of code in the script and it also makes it easier to maintain.
With that in mind, if you discover that your script only calls a function one time then there's no reason for that function to exist as a function. Instead, you should take the function's code and place it in the location where that function is being called.
Having said all that, here's my question:
If I have a complicated script, should I move each section of code into its own function even though each function will only be called once? This would greatly increase the script's readability because its logic (the functions) would all be at the top of the script and the flow of execution would be at the bottom of the script. Since 50 lines of code would be represented by just 1 line, it would be much easier to understand what the script is doing.
Do other people do this? Are there disadvantages to this approach?
Having functions also increases readability. So a bash script might look better and be easier to follow if it reads:
# implement the functions first (bash requires a definition before the call), then:
getParams
startTask
doSomethingElse
finishTask
even if the function implementations are simple, it reads better.
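The same idea in Python, as a minimal sketch (the function names are invented):

def get_params(): ...
def start_task(): ...
def do_something_else(): ...
def finish_task(): ...

def main():
    # The flow of execution reads like a summary of the script.
    get_params()
    start_task()
    do_something_else()
    finish_task()

if __name__ == "__main__":
    main()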
Code readability is indeed a major concern, usually (nowadays) more important than sheer amount of code or performance. Not to mention that inlining function calls may not necessarily have noticeable performance benefits (very language specific).
So lots of developers (I venture to say the better of the breed :-)) create small functions/methods like you describe, to partition their code into logically cohesive parts.
A function does a well-defined task. If you have a mega function that does 5 different things, it strongly suggests it should be calling 5 smaller functions.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times.
Well, it is my understanding that a function is a discrete entity that performs a specific, well defined task.
With that in mind, if you discover that your script calls a given function AT LEAST ONCE, then it's doing its job.
Focus on being able to read and easily understand your code.
Having clear, readable code is definitely more of a payoff than being afraid of function-call overhead. That's just premature optimisation.
Plus, the goal of a function is to accomplish a particular task. A task can be a sub-task, there's nothing wrong with that!
Read this book
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Here are some quotes from the book.
"Small!
The first rule of functions is that they should be small. The second rule of functions is that
they should be smaller than that."
"FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL.THEY SHOULD DO IT ONLY."
As far as my knowledge is concerned, a function represents a sequence of steps which become part of a larger program.
Coming to your question, I strongly agree that functions improve readability and reusability. But at the same time, breaking everything into pieces might not be a good practice.
Finally, I want to give one statement : "Anything In Excess Is Not Beneficial!"

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters; who is to say that once a function's length passes a certain number of lines it breaks a rule?
In general, just write clean functions that are easy to use and reuse.
A function should have a well-defined purpose. That is, try to create functions which do a single thing, either by doing the thing itself or by delegating work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: The compiler usually does a good job at deciding if a function call should really be one or if it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is a McCabe metric of Cyclomatic Complexity which you might read about at this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity while over 11 becomes more fault prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note the Complexity metric is usually proportional to the lines of code. It would provide you a measure on complexity rather than lines of code.
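If you want a rough feel for the metric, here is a hedged approximation in Python using the standard ast module; it counts only a subset of what a real McCabe tool counts:

import ast

def approx_complexity(source):
    # Roughly: 1 plus one point per branch, loop or boolean operator.
    decisions = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(node, decisions) for node in ast.walk(ast.parse(source)))

code = """
def classify(x):
    if x < 0:
        return "negative"
    for i in range(x):
        if i % 2 == 0 and i > 2:
            return "found"
    return "none"
"""
print(approx_complexity(code))  # 1 + if + for + if + and = 5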
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only a return pointer stack], or with aggressive inlining), there is no good reason not to refactor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that had one function of 13 pages, but my record is some VB6 code I have just picked up that is approximately 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, thus not reducing the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y : the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you are trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if before splitting the method would have been over 250 lines, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.) When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation, than to consider whether the method is "too long". Some 20-lines routine might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Is writing shorter code/algorithms more efficient (performance)?

After coming across the code golf trivia around the site, it is obvious people try to find ways to write code and algorithms as short as they possibly can in terms of characters, lines and total size, even if that means writing something like:
//Code by: job
//Topic: Code Golf - Collatz Conjecture
n=input()
while n>1:n=(n/2,n*3+1)[n%2];print n
So as a beginner I start to wonder whether size actually matters :D
It is obviously a very subjective question highly dependent on the actual code being used, but what is the rule of thumb in the real world.
In the case that size doesn't matter, how come we don't focus more on performance rather than size?
I hope this does not become a flame war. Good code has many attributes, including:
Solving the use-case properly.
Readability.
Maintainability.
Performance.
Testability.
Low memory signature.
Good user interface.
Reusability.
The brevity of code is not that important in 21st century programming. It used to be more important when memory was really scarce. Please see this question, including my answer, for books referencing the attributes above.
A lot of good answers already about what's important versus what's not. In real life, (almost) nobody writes code like code golf, with shortened identifiers, minimal whitespace, and the fewest possible statements.
That said, "more code" does correlate with more bugs and complexity, and "less code" tends to correlate with better readability and performance. So all other things being equal, it's useful to strive for shorter code, but only in the sense of "these simple 30 lines of code do the same as that 100 complex lines of code".
Writing "code golf" solutions are often to do with showing how "clever" you are in getting the job done in the most succinct way even at the expense of readability. Quite often, however, more verbose code including, for example, memoization of function results, can be faster. Code size can matter for performance, smaller blocks of code can fit in the L1 CPU cache but this is an extreme case of optimization and a faster algorithm will most always be better. "Code Golf" code is not like production code - always write for clarity & readability of the solution rather than terseness if anyone, including yourself, ever intend to read that code again.
Whitespace has no effect on performance. So code like that is just silly (or perhaps the golf score was based on the character count?). The number of lines also has no effect, although the number of statements can have an effect. (Exception: Python, where whitespace is significant.)
The effect is complex, however. It's not at all uncommon to discover that you have to add statements to a function in order to improve its performance.
Still, without knowing anything else, bet that more statements means a larger object file and a slower program. But more lines doesn't do anything other than make code more readable, up to a point (after which adding more lines makes it less readable ;)
I don't believe that Code Golf has any practical significance. In practice, readable code is what counts. Which in itself is a conflicting requirement: readable code should be concise, but still easy to understood.
However, I would like to answer your question yet differently. Usually, there are fast and simple algorithms. However, if the speed is top priority, things can get complex real fast (and the resulting code will be longer). I don't believe that simplicity equals speed.
There are many aspects to performance. Performance can for example be measured by memory footprint, speed of execution, bandwith consumption, framerate, maintainability, supportability and so on. Performance usually means spending as little as possible of the most scarce resource.
When applied to networking, brevity IS performance. If your webserver serves a little javascript snippet on every page, it doesn't exactly hurt to keep the variable names short. Pull up www.google.com and view source!
Sometimes DRY does not help performance. An example is that Microsoft found they don't want to loop through an array unless it is bigger than 3 elements.
String.Format has signatures for one, two and three arguments, and then for array.
There are many ways of trading one aspect for another. This is usually called caching.
You can for example trade memory footprint for speed of execution. For example by doing lookup instead of execution. It is just a matter of replacing () with [] in most popular languages. If you plan it so that the spaceship in your game can only go in a fixed number of directions, you can save on trigonometric function calls.
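A minimal Python sketch of that trigonometry trade (the game-specific numbers are invented):

import math

NUM_DIRECTIONS = 16  # the ship can only face 16 ways (an assumption for the sketch)

# Pay the memory cost once at startup...
SIN = [math.sin(2 * math.pi * i / NUM_DIRECTIONS) for i in range(NUM_DIRECTIONS)]
COS = [math.cos(2 * math.pi * i / NUM_DIRECTIONS) for i in range(NUM_DIRECTIONS)]

def velocity(direction_index, speed):
    # ...and every frame afterwards is a list index, not a trig call:
    # () has literally become [].
    return (speed * COS[direction_index], speed * SIN[direction_index])

print(velocity(4, 10.0))  # facing "up": roughly (0.0, 10.0)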
Or you can use a proxy server with a cache for looking up things over a network. DNS servers do this all the time.
Finally, if development team availability is the most scarce resource, clarity of code is the best bet for maintainability performance, even if it doesn't run quite as fast or is quite as interesting or "elegant" in code.
Absolutely not. Code size and performance (however you measure it) are only very loosely connected. To make matters worse, what's a neat trick on one chip/compiler/OS may very well be the worst thing you can do in another architecture.
It's counter-intuitive, but a clear, well-written, as-simple-as-possible implementation is often far more efficient than a devious bag of tricks. Today's optimizing compilers like clear, uncomplicated code just as much as humans do, and complex trickery can cause them to abandon their best optimizing strategies.
Writing fewer lines of code tends to be better for a bunch of reasons. For example, the less code you have, the less chance for bugs. See for example Paul Graham's essay, "Succinctness is Power"
Notwithstanding that, the level reached by Code Golf is usually far beyond what makes sense. In Code Golf, people are trying to write code that is as short as possible, even if they know that it's less readable.
Efficiency is a much harder thing to decide. I'm guessing that less code is usually more efficient, but there are many cases where this isn't true.
So to answer the real question, why do we even have Code Golf competitions which aim at a low character count, if that's not a very important thing?
Two reasons:
Making code as short as possible means you have to be both clever, and know a language pretty well to find all kinds of tricks. This makes it a fun riddle.
Also, it's the easiest measure to use for a code competition. Efficiency, for example, is very hard to measure, especially using many different languages, especially since some solutions are more efficient in some cases, but less in others (big input vs small). Readability: that's a very personal thing, which often leads to heated debates.
In short, I don't think there is any way of doing Code Golf style competitions without using "shortness of code" as the criterion.
This is from "10 Commandments for Java Developers"
Keep in Mind - "Less is more" is not always better. - Code efficiency is a great thing, but in many situations writing less lines of code does not improve the efficiency of that code.
This is (probably) true for all programming languages (though in assembly it could be different).
It makes a difference if you're talking about little academic-style algorithms or real software, which can be thousands of lines of code. I'm talking about the latter.
Here's an example where a reasonably well-written program was speeded up by a factor of 43x, and its code size was reduced by 4x.
"Code golf" is just squeezing code, like cramming undergraduates into a phone booth. I'm talking about reducing code by rewriting it in a form that is declarative, like a domain-specific-language (DSL). Since it is declarative, it maps more directly onto its requirements, so it is not puffed up with code that exists only for implementation's sake. That link shows an example of doing that.
This link shows a way of reducing size of UI code in a similar way.
Good performance is achieved by avoiding doing things that don't really have to be done. Of course, when you write code, you're not intentionally making it do unnecessary work, but if you do aggressive performance tuning as in that example, you'd be amazed at what you can remove.
The point of code golf is to optimise for one thing (source length), at the potential expense of everything else (performance, comprehensibility, robustness). If you accidentally improve performance that's a fluke - if you could shave a character off by doubling the runtime, then you would.
You ask "how come then we don't focus more on performance rather than size", but the question is based on a false premise that programmers focus more on code size than on performance. They don't, "code golf" is a minority interest. It's challenging and fun, but it's not important. Look at the number of questions tagged "code-golf" against the number tagged "performance".
As other people point out, making code shorter often means making it simpler to understand, by removing duplication and opportunities for obscure errors. That's usually more important than running speed. But code golf is a completely different thing, where you remove whitespace, comments, descriptive names, etc. The purpose isn't to make the code more comprehensible.

Code replication and refactoring

I would like to hear opinions on small amounts of code replication within methods that
check for the same condition
e.g.
while (condition) {
    // ... do x
}
Normally, if there was any of this kind of replication, I would refactor the code, as it can make versioning a nightmare: if the condition changes, for example, you have to change every instance, which is not a nice job to do.
However, what if the condition is relatively simple and is only used within, say, 3 methods? Is it wise to refactor?
So in summary where do people draw the line at refactoring code?
Refactoring is not free. Any change to the code can introduce bugs. So every conscientious developer thinks about whether he has time to carefully examine the changes, and in many cases decides not to refactor.
It depends on 2 things, IMO - the size of the duplication (how many lines are involved), and the locality of the duplications - how 'close' are they in terms of context.
If you have duplicated code in several methods in the same class, then I would consider extracting even duplicates of a single line into a separate method (assuming that the line in question is an easily identifiable, easy-to-isolate, fairly uncoupled piece of code).
Alternatively, if I had some code in one part of a project, and an almost identical piece of code in an (almost) unrelated area, then I wouldn't factor that out, as the scope for future divergence would seem quite high.
The key to refactoring is to do it when you need to. In the above example, if you have 3 different while loops with the same condition, who is to say you won't want different conditions in the future? If you've already refactored, then you've introduced a potential error situation.
It's a matter of judgement: the same condition three times seems OK, the same condition 10 times is an obvious refactor, but where is the tipping point?
I am personally usually quite aggressive with refactorings. I believe that if you clean your code regularly you mostly need to do small and simple refactorings. If you leave it for a while, it gets so messy and difficult to maintain.
In your particular case, I would definitely refactor if the condition has a reasonable business meaning, because it will make the code more readable. But even if it is a technicality, I would consider refactoring, provided that it's just a matter of extracting a function or property.
Ideally you should have unit tests that make sure your refactoring is still correct, so the cost of doing it should really be a few minutes, often less than writing this response.
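For instance (an invented retry-policy example), extracting a condition shared by three methods into one named function:

from typing import Optional

MAX_RETRIES = 5

def should_keep_trying(attempts: int, last_error: Optional[Exception]) -> bool:
    # The shared condition, extracted once under a business-meaningful name.
    return attempts < MAX_RETRIES and last_error is None

# Each caller now reads the intent, and a change to the retry
# policy happens in exactly one place.
attempts, last_error = 0, None
while should_keep_trying(attempts, last_error):
    attempts += 1
print(attempts)  # 5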
A common rule of thumb is the Rule of Three introduced by Martin Fowler in the seminal work Refactoring. The rule says that two things that are basically the same can stay, but once you add a third you should refactor them.
Besides making future changes easier as you mention, refactoring helps with readability and can make intention more obvious.
