Code replication and refactoring - refactoring

I would like to hear opinions on small amounts of code replication within methods that
check for the same condition
e.g.
While(condition){
...... do x
}
Normally if there was any of this kind of replication I would refactor the code as it can make versioning a nightmare, what if the condition changes for example you have to change every instance, not a nice job to do.
However what if the condition is relatively simple and is only used within say 3 methods is it wise to refactor,
So in summary where do people draw the line at refactoring code?

Refactoring is not free. Any change to the code can introduce bugs. So every concious developer thinks about whether he has time to carefully examine the changes and in many cases decides not to refactor.

It depends on 2 things, IMO - the size of the duplication (how many lines are involved), and the locality of the duplications - how 'close' are they in terms of context.
If you have duplicated code in several methods in the same class then I would consider extracting even duplicates of a single line into a seperate method (assuming that the line in question was an easly identifiable, easy to isolate, fairly uncoupled piece of code).
Alternatively, if I had some code in one part of a project, and an almost identical piece of code in an (almost) unrelated area then I wouldnt factor that out, as the scope for future divergence would seem quitre high....

The key to refactoring is to do it when you need to. In the above example if you have 3 different while loops with the same condition, who is to say in the future you might want different conditions, if you've already refactored then you've introduced a potential error situation.
It's a matter of judgement, the same condition three times seems ok, the same condition 10 times is an obvious refactor, but where is the tipping point?

I am personally usually quite aggressive with refactorings. I believe that if you clean your code regularly you mostly need to do small and simple refactorings. If you leave it for a while, it gets so messy and difficult to maintain.
In your particular case, I would definitely refactor if the condition has a reasonable business meaning, because it will make the code more readable. But even if this is a technicality, I would consider refactoring provided that its just the matter of extracting a function or property.
Ideally you should have unit tests that will make sure that your refactoring is still correct, so the cost of doing it should be really a few minutes, often smaller than writing this response.

A common rule of thumb is the Rule of Three introduced by Martin Fowler in the seminal work Refactoring. The rule says that two things that are basically the same can stay, but once you add a third you should refactor them.
Besides making future changes easier as you mention, refactoring helps with readability and can make intention more obvious.

Related

When should you not refactor?

We all know that refactoring is good and I love it as much as the next guy, but do you have real cases where is better not to refactor ?
Something like time critical stuff or synchronization? Technical or human reasons are equally welcome. Real cases scenarios and experiences a plus.
Edit : from the answers thus far, it looks like the only reason not to refactor is money. My question is mostly relative to something like this: suppose you would like to perform "extract method", but if you add the additional function call, you will make the code slightly less faster and hinder a very strict synchronization. Just to give you an idea of what I mean.
Another reason I sometimes heard is that "others used to the current code layout will get annoyed by your changes". Of course, I doubt this is a good reason.
I'm a big fan of refactoring to keep code clean and maintainable. But you generally want to shy away from refactoring production modules that work fine and don't require change. However, when you do need to work on a module to fix bugs or introduce a new feature, some refactoring is usually worth it and won't cost much since you're already committed to doing a full set of tests and going through the release process. (Unit tests are very helpful, but are only part of the full test suite, as other posters noted.)
More significant refactorings may make it harder for others to find their way around the new code, and they may then react unfavorably to refactoring. To minimize this, bring other team members in on the process using an approach like pair programming.
Update (8/10): Another reason to not refactor is when you aren't approaching the existing code base with proper humility and respect. With these qualities you'll tend to be conservative and do only refactorings that really do make a difference. If you approach the code with too much arrogance, you may wind up just making changes instead of refactoring. Is that new method name really clearer, or did the old one have a name with a very specific meaning in your application domain? Did you really need to mechanically reformat that source file to your personal style, when the existing style met project guidelines? Again pair programming can help.
To reinforce the other answer (and touch on issues you mention): do not refactor a part of the code until it's well covered by all relevant kinds of testing. This doesn't mean "don't refactor it" -- the emphasis is on "add the necessary tests" (to do unit-tests properly may well require some refactoring, particularly the introduction of factory DPs and/or dependency injection DPs in code that's now solidly bolted to concrete dependencies).
Note that this does cover your second paragraph's issues: if a section of the code is time-critical it should be well covered by "load-tests" (which like the more usual kind, correctness-test, should cover both specific units [albeit performance-wise -- correctness-checking is other tests' business!-)] AND end-to-end operations -- the equivalent of unit tests and integration tests if one was talking about correctness rather than performance).
Multi-tasking code with subtle sync issues can be a nightmare as no test can really make you entirely confident about it -- no other refactoring (that might in any way affect any fragile sync that just appears to be working now) should be considered BEFORE one intended to make the synchronization much, MUCH more robust and sound (message-passing through guaranteed-threadsafe queues being BY FAR my favorite design pattern in this regard;-).
Hmmm - I disagree with the above (1st response). Given code with no tests, you may refactor it to to make it more testable.
You do not refactor code when you cannot test the resulting code in time to deliver it such that it is still valuable to the recipient.
You do not refactor code when your refactoring will not improve the quality of the code. Quality is not subjective, although at times, design may be.
You do not refactor code when there is no business justification for making an alteration.
There are probably more, but hopefully you get the idea...
As Martin Fowler writes, you shouldn't refactor if a deadline is near. That time in project is better suited to flush out bugs instead of improving design (refactoring). Do the refactoring omitted this time directly after the deadline is over.
Refactoring is not good in and of itself. Rather, its purpose is to improve code quality so that it can be maintained more cheaply and with less risk of adding defects. For actively developed code, the benefits of refactoring are worth the cost. For frozen code that there is no intention to do any further work on, refactoring yields no benefit.
Even for live code, refactoring has its own risks, which unit tests can minimize. It also has its own place in the development cycle, which is towards the front, where it's less disruptive. The best time and place for refactoring is just before you start to make major changes to some otherwise brittle code.
When it is not cost-effective. There's a guy at the place I work who loves refactoring. Making code perfect makes him very happy. He can check out a current project tree and go to town on it, moving functions and classes around and tightening things up so they look great, have better flow, and are more extensible in the future.
Unfortunately, it's not worth the money. If he spends a week refactoring some classes into more functional units that may be easier to work with in the future, that's a week's worth of salary lost to the company with no noticeable bottom-line improvements.
Code will never, ever be absolutely perfect. You learn to live with it, and keep your hands off something that could be done better, but perhaps isn't worth the time.
If the code seems very difficult to refactor without breaking, that's the most important code to refactor!
If there aren't any tests, write some as you refactor.
Honestly, the one case is where you are forbidden to touch some code by management/customer/SomeoneImportant, and when that happens I consider the project broken.
Here is my experience:
Don't refactor:
When you don't have test suite accompanying with the code you want to refactor. You might want to develop the test suit first instead.
When your manager doesn't really care about the maintainablity and extensibility of current code base, instead they care much about if they would be able to deliver the product on schedule, especially for the project with short and tight schedule.
If you stick to the principle that everything that you do should add value for the client/ business, then the times you should not refactor are the following:
Code that works and no new development is planned.
Code that is good enough / works and refactoring simply represents gold plating.
The cost of refactoring is higher than living with the existing code.
The cost of refactoring is higher that rewriting the code from scratch
Some of the other answsers say that you should not refactor code that does not have unit tests. If code needs refactoring, you should refactor it, you must however write tests first. If the code is written in a way that makes it difficult to test, it should be rewritten (in a perfect world).
When you've got other stuff to build. I always feel like refactoring an existing system when I'm supposed to be doing something else.
There's always a balance to be had between fixing or adding to code and refactoring. However, this balance is so far in favor of refactoring that I don't think I've ever been on a team that refactored too much. Chances are, if you think you're erring on the side of refactoring too much, you're right on the money.
Of course, the biggest determining factor is how close the deadline is. If a deadline is imminent, requirements come first.
Isn't the need to refactor code largely based on the propensity of people to cut and paste code rather than thinking the solution through, and doing the factoring in advance? In other words, whenever you feel the need to cut & paste some code, merely make that chunk of code a function, and document it.
I have had to maintain way too much code where people found it easier to cut and paste a whole function, only to make one or two trivial changes, which could easily have been parametrized. But like many other's experience, to try to refactor some of this code would have take a LOT of time and been very risky.
I have 4 projects wherein a 10K line collection of functions was merely copied and modified as needed. This is a horrid maintenance nightmare. Especially when the code has LOTS of problems, e.g. hard-wired endianness assumptions, tons of global variables, etc. I feel bile in my throat just thinking about it.
Don't refactor if you don't have the time to test the refactored code before release. Refactoring can introduce bugs. If you have well-tested and relatively bug-free code, why take the risk? Wait until the next development cycle.
If you're stuck maintaining an old flakey code base with no future beyond keeping it running until management can bite the bullet and do a rewrite then refactoring is a lose-lose situation. First the developer loses because refactoing bad flakey code is a nightmare and secondly the business loses because as the developer attempts to refactor the software breaks in unexpected and unforseen ways.
When you don't really know what the code is doing in the first place. And yes, I have seen people ignore that rule.
It's just a cost-benefit tradeoff. Estimate the cost to refactor, estimate the benefits, determine if you actually have the time to refactor given other tasks, determine if refactoring is the best time-benefit tradeoff. There may be other tasks more worth doing.

How do you decide which parts of the code shall be consolidated/refactored next?

Do you use any metrics to make a decision which parts of the code (classes, modules, libraries) shall be consolidated or refactored next?
I don't use any metrics which can be calculated automatically.
I use code smells and similar heuristics to detect bad code, and then I'll fix it as soon as I have noticed it. I don't have any checklist for looking problems - mostly it's a gut feeling that "this code looks messy" and then reasoning that why it is messy and figuring out a solution. Simple refactorings like giving a more descriptive name to a variable or extracting a method take only a few seconds. More intensive refactorings, such as extracting a class, might take up to a an hour or two (in which case I might leave a TODO comment and refactor it later).
One important heuristic that I use is Single Responsibility Principle. It makes the classes nicely cohesive. In some cases I use the size of the class in lines of code as a heuristic for looking more carefully, whether a class has multiple responsibilities. In my current project I've noticed that when writing Java, most of the classes will be less than 100 lines long, and often when the size approaches 200 lines, the class does many unrelated things and it is possible to split it up, so as to get more focused cohesive classes.
Each time I need to add new functionality I search for already existing code that does something similar. Once I find such code I think of refactoring it to solve both the original task and the new one. Surely I don't decide to refactor each time - most often I reuse the code as it is.
I generally only refactor "on-demand", i.e. if I see a concrete, immediate problem with the code.
Often when I need to implement a new feature or fix a bug, I find that the current structure of the code makes this difficult, such as:
too many places to change because of copy&paste
unsuitable data structures
things hardcoded that need to change
methods/classes too big to understand
Then I will refactor.
I sometimes see code that seems problematic and which I'd like to change, but I resist the urge if the area is not currently being worked on.
I see refactoring as a balance between future-proofing the code, and doing things which do not really generate any immediate value. Therefore I would not normally refactor unless I see a concrete need.
I'd like to hear about experiences from people who refactor as a matter of routine. How do you stop yourself from polishing so much you lose time for important features?
We use Cyclomatic_complexity to identify the code that needs to be refactored next.
I use Source Monitor and routinely refactor methods when the complexity metric goes aboove around 8.0.

When should I break a function?

Its prudent to break a long function into a chief function and helper functions.
I know that the outside the module only chief function will be called, but its long length may prove to be intimidating.
Textbooks put a limit on the number of lines, but I feel that this is too rigid.
P.S. I am programming in Python and need to process incoming, messages. The function returns a tuple containing the message but in Python's internal data types.
So you can see somewhat independent code for each message type.
Duplicate Question
When is a function too long?
I think you need to go about this from the other end of the problem. Think bottom-up. Identify small units of work, as small as possible, and start composing your code that way. You will only run into spaghetti-code issues when you code top-down and don't keep a structured approach.
If you already have spaghetti code and need to refactor, you pretty much have to start over. It is probably more work to break up existing spaghetti code than to rewrite it, and the result may not be as good.
I don't think there should be a hard number for the lines of code in a method either, but well written code does not have methods with more than 5 to 10 lines in the lower layers, and 20 to 30 lines in the business logic. To give you some kind of metric.
I'm not a big fan of breaking a function into multiple functions unnecessarily. It's not a hard and fast thing - if there are things that seem like distinct logical units, then by all means, break those out and think about them separately. But don't just break things out for the sake of some guideline like "one page per function" or "N lines per function".
One good rule of thumb is that if it doesn't fit on a single screen it is worth thinking about splitting it up. But only if it makes sense to split it up, some long functions are perfectly readable and it doesn't make any sense to slavishly split them into multiple functions just for the sake of it.
Never write a function that, when printed on fanfold paper, is taller than you are.
I like the rule of thumb that you should break out the subfunction if you can think of a good domain-relevant name for it.
When someone can understand the top-level function without necessarily having to look up the definition of the sub-function, you've likely made a net gain. (But when you break it down too far, your names start referring to your implementation artifacts rather than the domain)
I was recently discussing this with a friend. He suggested refactoring to separate concerns and I must say I have to agree. That is, one function should do one thing, if it does more than one thing, split it up. If not, let it be together, it makes no sense to split up a function, only to have it obfuscate the meaning. After all, a function is a block of code that does one thing!
The limit in term of number of lines is often impractical becuase it doesn't account for readability well. It's better to try to seperate groups of lines of code that have just a few inputs and just a few outputs and make this a separate functon. It's not always possible - then it's often wise to just leave the code as it is and not to refactor for the sake of refactoring.
Well since I am coding in Python so I have the liberty to write functions inside functions, unlike C, C++ or Java. This i feel is a better choice.
It's not specified. But line should be as low as possible. But you may follow the Role of 30. I follow this in my PHP scripts when needed.
Rule of 30:
“Rule of 30” in Refactoring in Large Software Projects by Martin Lippert and Stephen Roock:
Methods should not have more than an average of 30 code lines.
A class should contain an average of less than 30 methods.
A package/library shouldn’t contain more than 30 classes.
Subsystems should avoid more than 30 packages.
A system more than 30 subsystems may create problem.
If an element consists of more than 30 subelements, it is highly probable that there is a serious problem.
personally I break a function if it either saves total lines or total processing time.
if I only run the helper once per chief function I don't bother
The point is that in principal it's better to have specialiced functions. But where one sets the limit depends very much on
1) the "usual" programming style in certain languages. (one can observe that, object-oriented langauges tend to shorter procedureds than let's say C or the like
2) it depends on your way of programming. Every hard limit must be questioned. IMHO. Overall there will probably some "natural" distribution of programs
3) I think what one should keep on one's mind is that a function should do a certain task take for example some function for parsing it is usually much longer than a function just settin some field in a structure. Or getting back just consider how a event loop in the Windows API may look. So that all suggests that there may be good reasons for long methods...
If there is independent code (in your case specifics for each message type) those areas should be broken out.
Size matters not. Judge me by my size do you? - Yoda
Your main concerns are readability, simplicity and maintainability. A good indicator is if you need to write comments to explain a section of a function then that section is a good candidate for a separate function.
There are many reasons to break a long function into its constituent pieces. Most important is:
readability
maintainability
code clarity/intent
Some functions simple cannot be broken into smaller pieces without negatively impacting the listed goals, so there is no hard-and-fast rule.
If you didn't write it and it's already in production: NEVER!!! If you break it up, you're likely to break it, it's that simple.
If you are writing it and you're not sure, the on screen rule apples as others have said.

How to refactor rapidly evolving code?

I have some research code that's a real rat's nest, with code duplication everywhere, and clearly needs to be refactored. However, the code base is evolving as I come up with new variations on the theme and fit them into the codebase. The reason I've put off refactoring so long is because I feel like the minute I spend a few days coming up with good abstractions, seeing what design patterns fit where, etc., I'll want to try out some new unforeseen idea that makes my abstractions completely inadequate. In other words, because of the rate at which the code is evolving, I really have no idea where abstraction lines belong, even though there is no shortage of (approximate) duplication and the general messiness of the code makes adding stuff to it a real pain. What are some general best practices for coping with this kind of situation?
Don't spend so long refactoring!
When you're about make a change in a piece of code, consider refactoring it to make the change easier.
After making the change, refactor again to clean up the damage done by that change.
In both cases, make the refactorings small and do them quickly, and move on.
You don't have to keep your code pristine at all times, but remember that it's easier to go fast if you have well-factored code to work in (and if you have good unit tests, of course).
Test Driven Development:
Red, Green, Refactor. Rinse, repeat.
Since it's one of the steps in every single cycle, you'll notice that's a LOT of usually minor refactoring taking place. That's the way it should be.
Your situation is pretty familiar to me. While doing investigative coding often you have no idea what the "right" abstraction will be, and as you say it can change with every new idea.Other posters have suggested:
Continuous small refactoring, which helps to avoid getting into the rats-nest situation
Test-Driven Development, which helps to find good, re-usable abstractions. It's important to note that TDD is less about testing than about doing good designs!
However, for investigative research code there is another strategy: the prototype. This seems to be what you are currently doing: coding as quickly as possible to prove a concept. There's nothing wrong with that, but a prototype should always be throw-away. Tweak it until you have all the necessary input and knowledge, then throw away the code and start over with TDD and continuous refactoring, and all your other "doing the things right" strategies.
Don't keep any of the code. Don't copy-paste anything. Don't refer back to it. Just start over with your new knowledge.
Clean up the code a little bit at a time. Always when you touch a class, try to leave the class cleaner that it was before you touched it ("the boy scout rule"). Refactoring is best done in very small steps, but very often.
Things like renaming some variable, splitting a method etc. take only some seconds or minutes. Large refactorings such as splitting or joining classes, may take an hour or two (and you make it in small steps, so that all tests pass at least every five minutes - otherwise you have entered Refactoring Hell and you should revert to the last known working state). If it takes days or weeks for you to refactor something, then it's not anymore "refactoring" - it's more like rewriting.
An article about this topic:
http://blog.objectmentor.com/articles/2007/07/20/whats-your-unit-of-measure
Put it in Distributed SCM like Git at least, that way when you break something refactoring you can reverse time divisibly to find the commit prior to the change, as well as being able to work on changes and commit them in branches without interfering with others work.
Gits Branch merge is great for things like this and you'll know easily if 2 people made incompatible changes in parallel without having to worry about the rest of the code.
For the above reasons, I would also create a seperate branch in the repository just for re factoring code with, and keep it up-dated regularly. This way, not only will others not interfere with your progress, but they can keep an eye on it and see changes in it that will eventually hit the main branch so they can pre-emptively code around those changes.
If you already know where there is duplication, you don't need several days to refactor it away.
Sometimes a rewrite is the only choice. This seems to be the case.
The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by langauge syntax. It supports Java, C#, COBOL, C++, PHP and many other languages.
When it shows a parameterized abstraction of a set of found clones, it is essentially proposing that you refactor the code with that abstraction implemented (as a method, a function, a class, ...).
So running the CloneDR gets a list of potential abstractions to be added to your code, and replacing the clone instances by calls on the abstraction refactors your code thus cleaning it up (somewhat).
Even more remarkably, when it shows the parameter bindings used at each clone site needed to invoke the abstraction, it often shows a bungled clone instance, easily recognized when the bound paramters are conceptually inconsistent. If a parameer is bound to variables named YYYY-MM-DD, and one of them is YY-MM-DD, the "its a 4 digit-year" parameter type looks violated and in this this case there's a broken Y2K remediation. So examining the clone bindings often finds bugs.
This is a very common problem in scientific computing. Some of the most effective ideas for reducing the size and complexity of code require leveraging assumptions, and science demands that you constantly change those assumptions.
All you can do is try to refactor your code as you go, and try not to write yourself into any corners. Also work with good people who understand the value of not making a mess.

How often should you refactor?

I had a discussion a few weeks back with some co-workers on refactoring, and I seem to be in a minority that believes "Refactor early, refactor often" is a good approach that keeps code from getting messy and unmaintainable. A number of other people thought that it just belongs in the maintenance phases of a project.
If you have an opinion, please defend it.
Just like you said: refactor early, refactor often.
Refactoring early means the necessary changes are still fresh on my mind. Refactoring often means the changes tend to be smaller.
Delaying refactoring only ends up making a big mess which further makes it harder to refactor. Cleaning up as soon as I notice the mess prevents it from building up and becoming a problem later.
I refactor code as soon as it's functional (all the tests pass). This way I clean it up while it's still fresh in my mind, and before anyone else sees how ugly the first version was.
After the initial check-in I typically refactor every time I touch a piece of code. Refactoring isn't something you should set aside separate time for. It should be something you just do as you go.
You write code with two hats on. The just-get-the-thing-working hat and the I-need-to-understand-this-tomorrow hat. Obviously the second hat is the refactoring one. So you refactor every time you have made something work but have (inevitably) introduced smells like duplicated code, long methods, fragile error handling, bad variable names etc...
Refactoring whilst trying to get something working (i.e. wearing both hats) isn't practical for non-trivial tasks. But postponing refactoring till the next day/week/iteration is very bad because the context of the problem will be gone from your head. So switch between hats as often as possible but never combine them.
I refactor every chance I get because it lets me hone my code into the best it can be. I do this even when actively developing to prevent creating unmaintainable code in the first place. It also oftentimes lets me straighten out a poor design decision before it becomes unfixable.
Three good reasons to refactor:
Your original design (perhaps in a very small area, but design nonetheless) was wrong. This includes where you discover a common operation and want to share code.
You are designing iteratively.
The code is so bad that it needs major refurbishment.
Three good reasons not to refactor:
"This looks a little bit messy".
"I don't entirely agree with the way the last guy did this".
"It might be more efficient". (The problem there is 'might').
"Messy" is controversial - there is a valid argument variously called "fixing broken windows", or "code hygiene", which suggests that if you let small things slide, then you will start to let large things slide too. That's fine, and is a good thing to bear in mind, but remember that it's an analogy. It doesn't excuse shunting stuff around interminably, in search of the cleanest possible solution.
How often you refactor should depend on how often the good reasons occur, and how confident you are that your test process protects you from introducing bugs.
Refactoring is never a goal in itself. But if something doesn't work, it has to be fixed, and that's as true in initial development as it is in maintenance. For non-trivial changes it's almost always better to refactor, and incorporate the new concepts cleanly, than to patch a single place with great lumps of junk in order to avoid any change elsewhere.
For what it's worth, I think nothing of changing an interface provided that I have a handle on what uses it, and that the scope of the resulting change is manageable.
As the book says, You refactor when
you add some code... a new feature
when you fix a bug / defect
when you do a code-review with multiple people
when you find yourself duplicating something for the third time.. 3 strikes rule
I try to go by this motto: leave all the code you touch better than it was.
When I make a fix or add a feature I usually use that opportunity to do limited refactoring on the impacted code. Often this makes it easier to make my intended change, so it actually doesn't cost anything.
Otherwise, you should budget dedicated time for refactoring, if you can't because you are always fighting fires (I wonder why) then you should force yourself to refactor when you find making changes becomes much harder than it should and when "code smells" are just unbearable.
A lot of times when I'm flushing out ideas my code starts out very tightly coupled and messy. As I start polishing the idea more the logical separations start becomming more and more clear and I begin refactoring. It's a constant process and as everyone suggests should be done 'Early and Often'.
I refactor when:
I'm modifying code and I'm confused by it. If it takes me a while to sift it out, it needs refactoring.
I'm creating new code and after I've got it "working". Often times I'll get things working and as I'm coding I realize "Hey, I need to redo what I did 20 lines up, only with a few changes". At that point I refactor and continue.
The only thing that in my opinion should stop you from doing this is time constraints. Like it or not, sometimes you just don't have the time to do it.
It's like the National Parks -- Always leave it a little better than you found it.
To me, that means any time I open code, and have to scratch my head to figure out what's going on, I should refactor something. My primary goal is for readability and understanding. Usually it's just renaming a variable for clarity. Sometimes it's extracting a method -
For example (trivial), If I came across
temp = array[i];
array[i] = array[j];
array[j] = temp;
I would probably replace that with a swap(i,j) method.
The compiler will likely inline it anyways, and a swap() tells everyone semantically what's going on.
That being said, with my own code (starting from scratch), I tend to refactor for design.
I often find it easier to work in a concrete class. When its done and debugged, then I'll pull the old Extract Interface trick.
I'll leave it to a co-worker to refactor for readability, as I'm too close to the code to notice the holes. After all, I know what I meant.
Refactor opportunistically! Do it whenever it's easy.
If refactoring is difficult, then you're doing it at the wrong time (when the code doesn't need it) or on the wrong part of the code (where there are better efficiencies to be gained elswhere). (Or you're not that good at refactoring yet.)
Saving refactoring for "maintenance" is a tautology. Refactoring is maintenance.
I refactor every time I read anything and can make it more readable. Not a major restructuring. But if I think to myself "what does this List contain? Oh, Integers!" then I'll change it to List<Integer>. Also, I often extract methods in the IDE to put a good name of a few lines of code.
The answer is always, but more specifically:
Assuming you branch for each task, then on each new branch, before it goes to QA.
If you develop all in the trunk, then before each commit.
When maintaining old code, use the above for new tasks, and for old code do refactoring on major releases that will obtain extra QA.
Everytime you encounter a need. At least when you're going to change a piece of code that needs refactoring.
I localize refactoring to code related to my current task. I try to do my refactoring up front. I commit these changes separately since from a functional standpoint, it is unrelated to the actual task. This way the code is cleaner to work with and the revision history is also cleaner.
Continuously, within reason. You should always be looking for ways to improve your software, but you have to be careful to avoid situations where you’re refactoring for the sake of refactoring (Refactorbation).
If you can make a case that a refactoring will make a piece of code faster, easier to read, easier to maintain or easier or provide some other value to the business I say go for it!
"Refactor early, refactor often" is a productive guideline. Though that kind of assumes that you really know the code. The older a system gets, the more dangerous refactoring becomes, and the more deliberation is required. In some cases refactoring needs to be managed tasks, with effort level and time estimates, etc.
If you have a refactoring tool that will make the changes safely, then you should refactor whenever the code will compile, if it will make the code clearer.
If you do not have such a tool, you should refactor whenever the tests are green, if it will make the code clearer.
Make small changes -- rename a method to make what it does clearer. Extract a class to make a group of related variables be clearly related. Refactoring is not about making large changes, but about making things cleaner minute by minute. Refactoring is clearing your dishes after each meal, instead of waiting until every surface is covered in dirty plates.
Absolutely as soon as it seems expedient. If you don't the pain builds up.
Since switching to Squeak (which I now seem to mention every post) I've realised that lots of design questions during prototyping fall away because refactoring is really easy in that environment. To be honest, if you don't have an environment where refactoring is basically painless, I recommend that you try squeak just to know what it can be like.
Refactoring often can often save the day, or at least some time. There was a project I was working on and we refactored all of our code after we hit some milestone. It was a great way because if we needed to rip code out that was no longer useful it made it easier to patch in whatever new thing we needed.
We're having a discussion at work on this right now. We more or less agree that "write it so it works, then fix it". But we differ on the time perspective. I am more "fix it right away", my coworker is more "fix it in the next iteration".
Some quotes that back him up:
Douglas Crockford, Senior Javascript Architect Yahoo:
refactor every 7th sprint.
Ken Thompson (unix man):
Code by itself almost rots and it's
gonna be rewritten. Even when nothing
has changed, for some reason it rots.
I would like that once done with a task the code submitted is something you can come back to in 2 months and think "yes, I did well here". I do not believe that it is easy to find time to come back later and fix it. believing this is somewhat naive from my point of view.
Edit: spelling error
I think you should refactor something when you're currently working on a part of it. Means if you have to enhance function A, then you should refactor it before (and afterwards?). If you don't do anything with this function, then leave it as it is, as long as you have something else to do.
Do not refactor a working part of the system, unless you already have to change it.
There are many views on this topic, some linked to a particular methodology or approach to development. When using TDD, refactor early and often is, as you say, a favoured approach.
In other situations you may refactor as and when needed. For example, when you spot repetitious code.
When following more traditional methods with detailed up-front design, the refactoring may be less often. However, I would recommend not leaving refactoring until the end of a project. Not only will you be likely to introduce problems, potentially post-UAT, it is often also the case that refactoring gets progressively more difficult. For this reason, time constraints on the classic project cause refactoring and extra testing to be dropped and when maintenance kicks in you may have already created a spaghetti-code monster.
If it ain't broke, don't refactor it.
I'd say the time to refactor belongs in the initial coding stage, and ou can do it as often and as many times as you like. Once its in the hands of a customer, then it becomes a different matter. You do not want to make work for yourself 'tidying' up code only to find that it gets shipped and breaks something.
The time after initial delivery to refactor is when you say you'll do it. When the code gets a bit too smelly, then have a dedicated release that contains refactorings and probably a few more important fixes. That way, if you do happen to break something, you know where it went wrong, you can much more easily fix it. If you refactor all the time, you will break things, you will not know that its broken until it gets QAd, and then you'll have a hard time trying to figure out whether the bugfix/feature code changes caused the problem, or some refactoring you performed ages ago.
Checking for cbreaking changes is a lot easier when the code looks roughly like it used to. Refactor a lot of code structure and you can make it next to impossible, so only refactor when you seriously mean to. Treat it like you would any other product code change and you should be ok.
I think this Top 100 Wordpress blog post may have some good advice.
http://blog.accurev.com/2008/09/17/dr-strangecode-or-how-i-learned-to-stop-worrying-and-love-old-code/

Resources