Refactor Mercilessly or Build One To Throw Away? - refactoring

Where a new system concept or new technology is used, one has to build a
system to throw away, for even the best planning is not so omniscient as
to get it right the first time. Hence plan to throw one away; you will, anyhow.
-- Fred Brooks, The Mythical Man-Month [Emphasis mine]
Build one to throw away. That's what they told me. Then they told me that we're all agile now, so we should Refactor Mercilessly. What gives?
Is it always better to refactor my way out of trouble? If not, can anyone suggest a rule-of-thumb to help me decide when to stick with it, and when to give up and start over?

If you're doing test-driven development, you can refactor your way out of almost any trouble. I've changed major design decisions without much trouble, and rescued decade-old codebases.
The only exception is when you've discovered that your architecture is completely wrong from beginning to end. For example, if you wrote your app using threads, but you discovered that you wanted a bunch of asynchronous state machines. At that point, go ahead and throw away the first draft.

Throw away early, refactor later
Throwing away is OK for small systems, but if the size of the system is huge, you simply do not have the resources to do so.
You could, however, create a small pilot project that implements only the very essential features of the actual project. After some trial and error and learning and throwing away stuff, you end up with a solid core and a better understanding for the actual project. Then you let the size of the project grow, by adding all the features needed. But once you get there, no way you can throw away the core. Only refactoring.

If you're merciless enough, the end result of refactoring will be pretty close to what you'd have gotten if you rebuilt from scratch, but you won't have been stuck with a non-working system during the process.

One of the central points of The Mythical Man Month was that the hard part of software development is figuring out what to say, not how to say it.
The way I've interpreted this recently is that the most value you get out of the first draft is the requirements you've gathered and preserved in the form of tests. If you're careful not to test things that aren't actually requirements of the system, you can refactor your way out of any mess.
So long as you don't code yourself into a trap where you have to start throwing out tests, you are OK to throw out as much code as you want without losing a significant amount of real work.

My general advice here is to refactor an existing system away from its bad designs to a system with better designs. This maintains the system and allows it to be deployed at all times. If you start from scratch it may be a while before you can deploy, or never.
If you are talking about just writing some brand new code where there is no existing system, then quite often its a good idea to write a little bit of code, however you want, then throw that away since it was never deployed and start again (using TDD).

There comes a point where refactoring is a waste of time. You just have to start again from scratch. If you keep your design fairly flexible, and you recognise that you don't know everything just yet, you won't have to throw anything away. A class might become redundant, of course, but you wont throw away a whole system.
Having a flexible design is necessary to be able to refactor properly. Having no design or a rigid design means you WILL end up throwing something away - either because you cannot refactor, or because the constant refactoring degrades the maintainability of your code base. Few humans are meticulous and disciplined enough to be able to complete a long sequence of minor refactorings to maintain integrity. Unless you have an all-star team, this degradation will happen!
TL;DR: You can refactor your way out of most trouble. Occasionally, though, you won't be able to refactor past some design elements. When that happens, it is time to start again - although hopefully you can re-use some of the components you have in place.

Different situations require different approaches. Personally I perfer refactoring to a better design whenever possible. Refactoring leads to less bugs than a rewrite.
But, even if you plan to throw one away, its still a good idea to write a bunch of acceptance tests to make sure your 2nd version is on the right track. Then you can migrate towards the next version piece by piece while ensuring your functionality isnt changing from the user's perspective. Sounds a bit like refactoring, just a little sloppier I guess.

When talking about Agile, you could do both, but in a general way, you will do spikes (prototypes) only to try specific issues, learn about them and be able to do better estimates. Throw away when you are doing a simple spike and refactoring when you are really coding the application.
Kind Regards

I'll prototype when I'm trying to work out a new problem or functionality. After than, I'll rebuild it based on what I learned. Actually, that sounds a lot like refactoring... what? Maybe it's the same thing? Hmmm...

I think that throwing one away is sometimes the best way to go, but it can hurt. One thing that I've found that works well is to throw one away, but choose your technology well.
For example, I've written a large codebase in Ruby on Rails, and over the past 2-3 years, RoR has advanced a lot. I also made some decisions in architecture that needed to be fixed. So, I'm throwing one away, and building a new one from scratch. However, I'm still able to use 70-80% or so of my old code as I'm still writing in Ruby and Rails.
The major factor that has helped with that is that Rails forces you to write well structured code with separation of business logic and presentation layers. I didn't get it perfect the first time around, but since everything is fairly well separated and DRY, porting the code to Rails v2.1, re-architecting the problem areas, and re-writing some "problem" features has been a fairly pain free experience.
So, by choosing a great technology from the start, I've been able to throw one away, but still take with me 70-80% of the old stuff that still works.

In a later essay in The Mythical Man Month, Brooks warns that he's found that if you do indeed plan to throw 1 away, you'll end up throwing 2 away!
I personally saw this happen in real life; we assigned a version 1 of the project as a quick throw-away to a mediocre programmer, because "we plan to throw it away later -- we will anyhow." We ended up having to rewrite it for version 2, but that one got thrown away too. I never saw version 3 - the company went out of business.
I think when Brooks says "plan to throw one away, you will anyway" it's more like the statement "the number of bugs remaining to be found is 'n+1'." That is, it's a ha-ha-only-serious statement about Murphy's law, rather than practical advice. The lessons to take away from it is that prototypes are valuable, good writing is rewriting, and don't be afraid to abandon something that isn't working.
However, it has to come down to a judgement call because as Joel Spolsky has talked about in several essays, the option to throw away and start over is tempting because code is easier to write than to read, and more fun to write than to maintain, so your natural inclination will always be to start over even when that isn't really the best thing to do.

I think that your version control system plays a large role here. If you run a distributed version control system with easy branching (git, mercurial, these days), then you'll be able to prototype easier, and refactor easier, all while still having a valid working copy. Anything else requires so much more discipline.

As a development manager in this organisation, I'm "not allowed" to write production code.
I (ab)use that rule to knock out quick, dirty proof-of-concept code that addresses one or other sticking point, then I check it in to source control and point a "proper" dev at it and say "Here's how it's done, now do it properly."
That's as close as we get to "one to throw away" here, and it's probably taken me a couple of hours max to knock together. Spending time putting in things like error handling, boundary-checking and all the other bits that make good code would be a waste of time for this sort of work, yet it means that the guys who are getting paid to write production code can spend their time writing production code and don't have excuses like "it's only a prototype" when it comes to code-review time.
Building one to throw away is too often used as an excuse for not doing the job properly. That means that you don't actually encounter enough of the issues in the process to learn enough to make it a good use of anyone's time. And doing it properly, only to throw it away, is even more wasteful.
As several people have previously said, the most important feature in any software is that it ships. With that in mind, I'd build "one to get people to pay me for" any day, and my mercilessness in terms of refactoring is to allow only enough of it to get a product that works and can be reasonably maintained.

It is easy to build one to throw away on any configuration management system that supports the concept of a branch. If you are introducing a radical design change into an existing system that is in the field and is the source of your paycheck; you darn well better branch; prototype; and throw it away if it doesn't work.
Refactoring a large legacy cash cow system often leads to plain old fashioned hacking. Refactoring just sounds a lot better than hacking I guess.

Related

What are some good strategies to fix bugs as code becomes more complex?

I'm "just" a hobbyist programmer, but I find that as my programs get longer and longer the bugs get more annoying--and harder to track. Just when everything seems to be running smoothly, some new problem will appear, seemingly spontaneously. It may take me a long time to figure out what caused the problem. Other times I'll add a line of code, and it'll break something in another unit. This can get kind of frustrating if I thought everything was working well.
Is this common to everyone, or is it more of a newbie kind of thing? I hear about "unit testing," "design frameworks," and various other concepts that sound like they would decrease bugginess, make my apps "robust," and everything easy to understand at a glance :)
So, how big a deal are bugs to people with professional training?
Thanks -- Al C.
The problem of "make a fix, cause a problem elsewhere" is very well known, and is indeed one of the primary motivations behind unit testing.
The idea is that if you write exhaustive tests for each small part of your system independently, and run them on the entire system every time you make a change anywhere, you will see the problem immediately. The main benefit, however, is that in the process of building these tests you'll also be improving your code to have less dependencies.
The typical solution to these sort of problems is to reduce coupling; make different parts less dependent on one another. More experienced developers sometimes have habits or design skills to build systems in this manner. For example, we use interfaces and implementations rather than classes; we use model-view-controller for user interfaces, etc. In addition, we can use tools that help further reduce dependencies, like "Dependency injection" and aspect oriented programming.
All programmers make mistakes. Good and experienced programmers build their programs so that it is easier to find the mistakes and restrict their effects.
And it is a big deal for everyone. Most companies spend more time on maintenance than on writing new code.
Are you automating your tests? If you do not, you're signing up creating bugs without finding them.
Are you adding tests for bugs as you fix them? If you do not, you are signing up for creating the same bugs over and over.
Are you writing unit tests? If not, you are signing up for long debugging sessions when a test fails.
Are you writing your unit tests first? If not, your unit tests will be hard to write when your units are tightly coupled.
Are you refactoring mercilessly? If not, every edit will become more difficult and more likely to introduce bugs. (But make sure you have good tests, first.)
When you fix a bug, are you fixing the entire class? Don't just fix the bug; don't just fix similar bugs throughout your code; change the game so you can never create that kind of bug again.
Bugs are a big deal to everyone. I've always found that the more I program, the more I learn about programming in general. I cringe at the code I wrote a few years back!! I started out as a hobbyist and liked it so much that I went to engineering college to get a Computer Science Engineering major (I am in my final semester). These are the things that I have learned :
I take time to actually design what I am going to write and document the design. It really eliminates a lot of problems down the line. Whether the design is as simple as writing down a few points on what I am going to write or full blown UML modeling (:( ) doesn't matter. Its the clarity of thought and purpose and having material to look back at when I come back to the code after a while that matter the most.
No matter what language I write in, keeping my code simple and readable is important. I think that it is extremely important not to over complicate the code and at the same time not to over simplify it. (Hard learned lesson!!)
Efficiency optimizations and fancy tricks should be applied at the end, only when necessary and only if they are needed. Another thing is that I apply them only If I really know what I am doing and I always test my code!
Learning language dependant details helps me keep my code bug free. For instance I learned that scanf() is evil in C!
Others have already commented on the zen of writing tests. I would like to add that you should always do regression tests. (i.e. Write new code, test all parts of your code to see if it breaks)
Keeping a mental picture of code is hard at times, so I always document my code.
I use methods to make sure that there is a bare minimum dependence between different parts of my code. Interfaces, class hierarchies etc. (Decoupled design)
Thinking before I code and being disciplined in whatever I write is another crucial skill. I know people who don't format their code so its readable (Shudder!).
Reading other peoples source to learn best practices is good. Making my own list is better!. When working in a team, there must be a common set of them.
Don't be paralyzed by analysis. Write tests, then code, then execute and test. Rinse wash repeat!
Learning to read over my own code and combing it for mistakes is important. Improving my arsenal of debugging skills was a great investment. I keep them sharp by helping my classmates fix bugs regularly.
When there is a bug in my code, I assume its my mistake, not the computers and work from there. That is a state of mind that really helps me.
A fresh pair of eyes aids in debugging. Programmers tend to miss even the most obvious errors in their own code when exhausted. Having someone to show your code to is great.
having someone to throw ideas at and not be judged is important. I talk to my mom (who is not a programmer) , throw ideas at her and find solutions. She helps me bounce my ideas back and forth and refine them. If she is unavailable, I talk to my pet cat.
I am not so be discouraged by bugs anymore. I've learned to love removing bugs almost as much as programming.
Using version control has really helped me manage different ideas I get while coding. That helps reduce errors. I recommend using git or any other version control system you might like.
As Jay Bazzuzi said - Refactor code. I just added this point after reading his answer, to keep my list complete. All credit goes to him.
Try to write reusable code. Reuse code, both yours and from libraries. Using libraries which are bug free to do some common tasks really reduces bugs (sometimes).
I think the following quote says it best - "If debugging is the art of removing bugs, programming must be the art of putting them in."
No offense to anyone who disagrees. I hope this answer helps.
Note
As others Peter has pointed out, use Object Oriented Programming if you are writing a large amount of code. There is a limit to code length after which it becomes harder and harder to manage if written procedurally. I like procedural for smaller stuff, like playing with algorithms.
There are two ways to write error-free programs; only the third one works. ~Alan J. Perlis
The only way for errors to occur in a program is by being put there by the author. No other mechanisms are known. Programs can't acquire bugs by sitting around with other buggy programs. ~Harlan Mills
Obviously, bugs are a big deal to any programmer. Just look through the list of questions on Stack Overflow to see this illustrated.
The difference between a hobbyist and an experienced professional is that the pro will be able to use his experience to code in a more "defensive" way, avoiding many types of bugs in the first place.
All the other answers are great. I'll add two things.
Source control is mandatory. I'm assuming you're on windows here. VisualSVN Server is free and maybe 4 clicks to install. TortoiseSVN is also free and it integrates into Windows Explorer, getting around the VS Express limitations of no add-ins. If you create too many bugs, you can revert your code and start over. Without source control, this is next to impossible. Plus you can sync your code if you have a laptop and a desktop.
People are going to recommend many techniques like unit testing, mocking, Inversion of Control, Test Driven Development, etc. These are great practices, but don't try to cram it all into your head too quickly. You have to write code to get better at writing code, so work these techniques slowly into your code writing. You have to crawl before you walk and walk before you can run.
Best of luck in your coding adventures!
This is a common newbie thing. As you get more experience, of course, you'll still have bugs, but they'll be easier to find and fix because you'll learn how to make your code more modular (so that changing one thing doesn't have ripple effects everywhere else), how to test it, and how to structure it to fail fast, close to the source of the problem, rather than in some arbitrary place. One very basic but useful thing that doesn't require complex infrastructure to implement is to check the inputs to all functions that have non-trivial precondtions with asserts. This has saved me several times in cases where I would have otherwise gotten weird segfaults and arbitrary behavior that would have been near impossible to debug.
If bugs weren't a problem then I'd be able to write a 100,000 line program in 10 minutes!
Your question is like, "As an amateur doctor, I worry about my patients' health: sometimes when I'm not careful enough, they sicken. Is patients' health a problem for you professional doctors too?"
Yes: it's the central problem, even the only problem (for any sufficiently all-inclusive definition of 'bug').
Bugs are common to everyone -- professional or not.
The larger and more distributed the project, the more careful one must be. One look at any open source bug database (ex: https://bugzilla.mozilla.org/ ) will confirm this for you.
The software industry has evolved various programming styles and standards, which when used right, make wrong code easier to spot or limited in its impact.
Therefore, training has a very positive on code quality... But at the end of the day, bugs still sneak through.
If you're just a hobbyist programmer, learning full bore TDD and OOP may involve more time than you're willing to put in. So, going on the assumption that you don't want to put in the time on them, a few easily digestible suggestions to cut down on bugs are:
Keep each function doing one thing. Be suspect of a function more than, say, 10 lines long. If you think you can break it into two functions, you probably should. Something that will help you control this is naming your functions according to exactly what they are doing. If you find that your names are long and unwieldy then you function is probably doing too many things.
Turn magic strings into constants. That is, instead of using:
people["mom"]
use instead
var mom = "mom";
people[mom]
Design your functions to either do something (command) or get something (query), but not both.
An extremely short and digestible take on OOP is here http://www.holub.com/publications/notes_and_slides/Everything.You.Know.is.Wrong.pdf. If you get this, you've got the gist of OOP and are quite frankly ahead of a lot of professional programmers.
The prevailing wisdom seems to be that the average programmer creates 12 bugs per 1000 lines of code - depends on who you ask for the exact number, but it's always per lines of code - so, the bigger the program, the more the bugs.
Subpar programmers tend to create way more bugs.
Newbies are often trapped by idiosyncrasies of the language, and lacking experience tends towards more bugs too. As you go on, you will get better, but never will you create bug-free code... well I still have bugs, even after 30 years, but that could be just me.
Nasty bugs happen to everyone from pros to hobbyists. Really good programmers get asked to track down really nasty bugs. It's part of the job. You'll know you've made it as a software developer when you stare at a nasty bug for two days and in frustration you shout, "Who wrote this crap!?!?" ... only to realize it was you. :-)
Part of the skill of a software developer is the ability to keep a large set of interrelated items straight in his/her head. It sounds like you're discovering what happens when your mental model of the system breaks down. With practice you will learn to design software that doesn't feel so brittle. There are tons of books, blogs, etc. out there on the subject of software design. And Stack Overflow of course for specific questions.
All that said, here's a couple of things you can do:
A good debugger is invaluable. Often you have to step through your code line by line to figure out what went wrong.
Use a garbage-collected language such as Python or Java if it makes sense for your project. GC will help you focus on making things work instead of getting bogged down by maddening memory errors.
If you write C++, learn to love RAII.
Write LOTS of code. Software is somewhat of an art form. Lots of practice will make you better at it.
Welcome to Stack Overflow!
What really changed my odds against code complexity and bugs was using a coding standart - how to place brackets an so on. It may seem like just boring and useless thing but it really unifies all the code and makes it much easier to read and maintain. So do you use a coding standart?
If you're not well organized, your codebase will become your very own Zebra Puzzle. Adding more code is like adding more people/animals/houses to your puzzle, and soon you have 150 various animals, people, houses and cigarette brands in your puzzle and you realize that it just took you a week to add 3 lines of code because everything is so inter-related that it takes forever to make sure the code still executes how you want it to.
The most popular organizational paradigm seems to be Object Oriented Programming, if you can break your logic down into small units which can be constructed and used independently of each other, then you will find bugs far less painful when they occur.

How often should you refactor?

I had a discussion a few weeks back with some co-workers on refactoring, and I seem to be in a minority that believes "Refactor early, refactor often" is a good approach that keeps code from getting messy and unmaintainable. A number of other people thought that it just belongs in the maintenance phases of a project.
If you have an opinion, please defend it.
Just like you said: refactor early, refactor often.
Refactoring early means the necessary changes are still fresh on my mind. Refactoring often means the changes tend to be smaller.
Delaying refactoring only ends up making a big mess which further makes it harder to refactor. Cleaning up as soon as I notice the mess prevents it from building up and becoming a problem later.
I refactor code as soon as it's functional (all the tests pass). This way I clean it up while it's still fresh in my mind, and before anyone else sees how ugly the first version was.
After the initial check-in I typically refactor every time I touch a piece of code. Refactoring isn't something you should set aside separate time for. It should be something you just do as you go.
You write code with two hats on. The just-get-the-thing-working hat and the I-need-to-understand-this-tomorrow hat. Obviously the second hat is the refactoring one. So you refactor every time you have made something work but have (inevitably) introduced smells like duplicated code, long methods, fragile error handling, bad variable names etc...
Refactoring whilst trying to get something working (i.e. wearing both hats) isn't practical for non-trivial tasks. But postponing refactoring till the next day/week/iteration is very bad because the context of the problem will be gone from your head. So switch between hats as often as possible but never combine them.
I refactor every chance I get because it lets me hone my code into the best it can be. I do this even when actively developing to prevent creating unmaintainable code in the first place. It also oftentimes lets me straighten out a poor design decision before it becomes unfixable.
Three good reasons to refactor:
Your original design (perhaps in a very small area, but design nonetheless) was wrong. This includes where you discover a common operation and want to share code.
You are designing iteratively.
The code is so bad that it needs major refurbishment.
Three good reasons not to refactor:
"This looks a little bit messy".
"I don't entirely agree with the way the last guy did this".
"It might be more efficient". (The problem there is 'might').
"Messy" is controversial - there is a valid argument variously called "fixing broken windows", or "code hygiene", which suggests that if you let small things slide, then you will start to let large things slide too. That's fine, and is a good thing to bear in mind, but remember that it's an analogy. It doesn't excuse shunting stuff around interminably, in search of the cleanest possible solution.
How often you refactor should depend on how often the good reasons occur, and how confident you are that your test process protects you from introducing bugs.
Refactoring is never a goal in itself. But if something doesn't work, it has to be fixed, and that's as true in initial development as it is in maintenance. For non-trivial changes it's almost always better to refactor, and incorporate the new concepts cleanly, than to patch a single place with great lumps of junk in order to avoid any change elsewhere.
For what it's worth, I think nothing of changing an interface provided that I have a handle on what uses it, and that the scope of the resulting change is manageable.
As the book says, You refactor when
you add some code... a new feature
when you fix a bug / defect
when you do a code-review with multiple people
when you find yourself duplicating something for the third time.. 3 strikes rule
I try to go by this motto: leave all the code you touch better than it was.
When I make a fix or add a feature I usually use that opportunity to do limited refactoring on the impacted code. Often this makes it easier to make my intended change, so it actually doesn't cost anything.
Otherwise, you should budget dedicated time for refactoring, if you can't because you are always fighting fires (I wonder why) then you should force yourself to refactor when you find making changes becomes much harder than it should and when "code smells" are just unbearable.
A lot of times when I'm flushing out ideas my code starts out very tightly coupled and messy. As I start polishing the idea more the logical separations start becomming more and more clear and I begin refactoring. It's a constant process and as everyone suggests should be done 'Early and Often'.
I refactor when:
I'm modifying code and I'm confused by it. If it takes me a while to sift it out, it needs refactoring.
I'm creating new code and after I've got it "working". Often times I'll get things working and as I'm coding I realize "Hey, I need to redo what I did 20 lines up, only with a few changes". At that point I refactor and continue.
The only thing that in my opinion should stop you from doing this is time constraints. Like it or not, sometimes you just don't have the time to do it.
It's like the National Parks -- Always leave it a little better than you found it.
To me, that means any time I open code, and have to scratch my head to figure out what's going on, I should refactor something. My primary goal is for readability and understanding. Usually it's just renaming a variable for clarity. Sometimes it's extracting a method -
For example (trivial), If I came across
temp = array[i];
array[i] = array[j];
array[j] = temp;
I would probably replace that with a swap(i,j) method.
The compiler will likely inline it anyways, and a swap() tells everyone semantically what's going on.
That being said, with my own code (starting from scratch), I tend to refactor for design.
I often find it easier to work in a concrete class. When its done and debugged, then I'll pull the old Extract Interface trick.
I'll leave it to a co-worker to refactor for readability, as I'm too close to the code to notice the holes. After all, I know what I meant.
Refactor opportunistically! Do it whenever it's easy.
If refactoring is difficult, then you're doing it at the wrong time (when the code doesn't need it) or on the wrong part of the code (where there are better efficiencies to be gained elswhere). (Or you're not that good at refactoring yet.)
Saving refactoring for "maintenance" is a tautology. Refactoring is maintenance.
I refactor every time I read anything and can make it more readable. Not a major restructuring. But if I think to myself "what does this List contain? Oh, Integers!" then I'll change it to List<Integer>. Also, I often extract methods in the IDE to put a good name of a few lines of code.
The answer is always, but more specifically:
Assuming you branch for each task, then on each new branch, before it goes to QA.
If you develop all in the trunk, then before each commit.
When maintaining old code, use the above for new tasks, and for old code do refactoring on major releases that will obtain extra QA.
Everytime you encounter a need. At least when you're going to change a piece of code that needs refactoring.
I localize refactoring to code related to my current task. I try to do my refactoring up front. I commit these changes separately since from a functional standpoint, it is unrelated to the actual task. This way the code is cleaner to work with and the revision history is also cleaner.
Continuously, within reason. You should always be looking for ways to improve your software, but you have to be careful to avoid situations where you’re refactoring for the sake of refactoring (Refactorbation).
If you can make a case that a refactoring will make a piece of code faster, easier to read, easier to maintain or easier or provide some other value to the business I say go for it!
"Refactor early, refactor often" is a productive guideline. Though that kind of assumes that you really know the code. The older a system gets, the more dangerous refactoring becomes, and the more deliberation is required. In some cases refactoring needs to be managed tasks, with effort level and time estimates, etc.
If you have a refactoring tool that will make the changes safely, then you should refactor whenever the code will compile, if it will make the code clearer.
If you do not have such a tool, you should refactor whenever the tests are green, if it will make the code clearer.
Make small changes -- rename a method to make what it does clearer. Extract a class to make a group of related variables be clearly related. Refactoring is not about making large changes, but about making things cleaner minute by minute. Refactoring is clearing your dishes after each meal, instead of waiting until every surface is covered in dirty plates.
Absolutely as soon as it seems expedient. If you don't the pain builds up.
Since switching to Squeak (which I now seem to mention every post) I've realised that lots of design questions during prototyping fall away because refactoring is really easy in that environment. To be honest, if you don't have an environment where refactoring is basically painless, I recommend that you try squeak just to know what it can be like.
Refactoring often can often save the day, or at least some time. There was a project I was working on and we refactored all of our code after we hit some milestone. It was a great way because if we needed to rip code out that was no longer useful it made it easier to patch in whatever new thing we needed.
We're having a discussion at work on this right now. We more or less agree that "write it so it works, then fix it". But we differ on the time perspective. I am more "fix it right away", my coworker is more "fix it in the next iteration".
Some quotes that back him up:
Douglas Crockford, Senior Javascript Architect Yahoo:
refactor every 7th sprint.
Ken Thompson (unix man):
Code by itself almost rots and it's
gonna be rewritten. Even when nothing
has changed, for some reason it rots.
I would like that once done with a task the code submitted is something you can come back to in 2 months and think "yes, I did well here". I do not believe that it is easy to find time to come back later and fix it. believing this is somewhat naive from my point of view.
Edit: spelling error
I think you should refactor something when you're currently working on a part of it. Means if you have to enhance function A, then you should refactor it before (and afterwards?). If you don't do anything with this function, then leave it as it is, as long as you have something else to do.
Do not refactor a working part of the system, unless you already have to change it.
There are many views on this topic, some linked to a particular methodology or approach to development. When using TDD, refactor early and often is, as you say, a favoured approach.
In other situations you may refactor as and when needed. For example, when you spot repetitious code.
When following more traditional methods with detailed up-front design, the refactoring may be less often. However, I would recommend not leaving refactoring until the end of a project. Not only will you be likely to introduce problems, potentially post-UAT, it is often also the case that refactoring gets progressively more difficult. For this reason, time constraints on the classic project cause refactoring and extra testing to be dropped and when maintenance kicks in you may have already created a spaghetti-code monster.
If it ain't broke, don't refactor it.
I'd say the time to refactor belongs in the initial coding stage, and ou can do it as often and as many times as you like. Once its in the hands of a customer, then it becomes a different matter. You do not want to make work for yourself 'tidying' up code only to find that it gets shipped and breaks something.
The time after initial delivery to refactor is when you say you'll do it. When the code gets a bit too smelly, then have a dedicated release that contains refactorings and probably a few more important fixes. That way, if you do happen to break something, you know where it went wrong, you can much more easily fix it. If you refactor all the time, you will break things, you will not know that its broken until it gets QAd, and then you'll have a hard time trying to figure out whether the bugfix/feature code changes caused the problem, or some refactoring you performed ages ago.
Checking for cbreaking changes is a lot easier when the code looks roughly like it used to. Refactor a lot of code structure and you can make it next to impossible, so only refactor when you seriously mean to. Treat it like you would any other product code change and you should be ok.
I think this Top 100 Wordpress blog post may have some good advice.
http://blog.accurev.com/2008/09/17/dr-strangecode-or-how-i-learned-to-stop-worrying-and-love-old-code/

When is it good (if ever) to scrap production code and start over? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was asked to do a code review and report on the feasibility of adding a new feature to one of our new products, one that I haven't personally worked on until now. I know it's easy to nitpick someone else's code, but I'd say it's in bad shape (while trying to be as objective as possible). Some highlights from my code review:
Abuse of threads: QueueUserWorkItem and threads in general are used a lot, and Thread-pool delegates have uninformative names such as PoolStart and PoolStart2. There is also a lack of proper synchronization between threads, in particular accessing UI objects on threads other than the UI thread.
Magic numbers and magic strings: Some Const's and Enum's are defined in the code, but much of the code relies on literal values.
Global variables: Many variables are declared global and may or may not be initialized depending on what code paths get followed and what order things occur in. This gets very confusing when the code is also jumping around between threads.
Compiler warnings: The main solution file contains 500+ warnings, and the total number is unknown to me. I got a warning from Visual Studio that it couldn't display any more warnings.
Half-finished classes: The code was worked on and added to here and there, and I think this led to people forgetting what they had done before, so there are a few seemingly half-finished classes and empty stubs.
Not Invented Here: The product duplicates functionality that already exists in common libraries used by other products, such as data access helpers, error logging helpers, and user interface helpers.
Separation of concerns: I think someone was holding the book upside down when they read about the typical "UI -> business layer -> data access layer" 3-tier architecture. In this codebase, the UI layer directly accesses the database, because the business layer is partially implemented but mostly ignored due to not being fleshed out fully enough, and the data access layer controls the UI layer. Most of the low-level database and network methods operate on a global reference to the main form, and directly show, hide, and modify the form. Where the rather thin business layer is actually used, it also tends to control the UI directly. Most of this lower-level code also uses MessageBox.Show to display error messages when an exception occurs, and most swallow the original exception. This of course makes it a bit more complicated to start writing units tests to verify the functionality of the program before attempting to refactor it.
I'm just scratching the surface here, but my question is simple enough: Would it make more sense to take the time to refactor the existing codebase, focusing on one issue at a time, or would you consider rewriting the entire thing from scratch?
EDIT: To clarify a bit, we do have the original requirements for the project, which is why starting over could be an option. Another way to phrase my question is: Can code ever reach a point where the cost of maintaining it would become greater than the cost of dumping it and starting over?
Without any offense intended, the decision to rewrite a codebase from scratch is a common, and serious management mistake newbie software developers make.
There are many disadvantages to be wary of.
Rewrites stop new features from being developed cold for months/years. Few, if any companies can afford to stand-still for this long.
Most development schedules are difficult to nail. This rewrite will be no exception. Amplify the previous point by, now, a delay in development.
Bugs that were fixed in the existing codebase through painful experience will be re-introduced. Joel Spolsky has more examples in this article.
Danger of falling victim to the Second-system effect -- in summary, ``People who have designed something only once before try to do all the things they "didn't get to do last time", loading the project up with all the things they put off while making version one, even if most of them should be put off in version two as well.''
Once this expensive, burdensome rewrite is completed, the very next team to inherit the new codebase is likely to use the same excuses for doing another rewrite. Programmers hate learning someone else's code. No one writes perfect code because perfection is so subjective. Find me any real-world application and I can give you a damning indictment and rationale for doing a from-scratch rewrite.
Whether you ultimately rewrite from scratch or not, beginning a refactoring phase now is a good way to both really sit down and understand the problem so that the rewrite will go more smoothly if truly called for, as well as giving the existing codebase an honest look to really see if a rewrite's needed.
To actually scrap and start over?
When the current code doesn't do what you would like it to do, and would be cost prohibitive to change.
I'm sure someone will now link Joel's article about Netscape throwing their code away and how it's oh-so-terrible and a huge mistake. I don't want to talk about it in detail, but if you do link that article, before you do so, consider this: the IE engine, the engine that allowed MS to release IE 4, 5, 5.5, and 6 in quick succession, the IE engine that totally destroyed Netscape... it was new. Trident was a new engine after they threw away the IE 3 engine because it didn't provide a suitable basis for their future development work. MS did that which Joel says you must never do, and it is because MS did so that they had a browser that allowed them to completely eclipse Netscape. So please... just meditate on that thought for a moment before you link Joel and say "oh you should never do it, it's a terrible idea".
A rule of thumb I've found useful is that if given a code base, if I have to re-write more than 25% of the code to make it work or modify it based upon new requirements, you may as well re-write it from scratch.
The reasoning is that you can only patch a body of code so far; beyond a certain point, it's quicker to do over.
There's an underlying assumption that you have a mechanism (such as thorough unit and/or system tests) that will tell you whether your re-written version is functionally equivalent (where it needs to be) as the original.
If it requires more time to read and understand the code (if that is even possible)
than it would to rewrite the entire application, I say scrap it and start over.
Be very carefull with this:
Are you sure you aren't just being lazy and not bothering to read the code
Are you being arrogant about the great code you will write compared to the rubbish anyone else produced.
Remember tested-working code is worth a lot more than imaginary yet-to-be-written code
In the words of our estemed host and overlord, Joel - things you should never do,
it's not always wrong to abandon working code - but you have to be sure about the reason.
I saw an application re-architected within 2 years of its introduction into production, and others rewritten in different technologies (one was C++ - now Java). Both efforts were were not, to my mind, successful.
I prefer a more evolutionary approach to bad software. If you can "componentize" your old app such that you can introduce your new requirements and interface with the old code, you can ease yourself into the new environment without having to "sell" the zero-value (from a biz perspective) investment in rewriting.
Suggested approach - write unit tests for the functionality with which you wish to interface to 1) ensure the code behaves as you expect and 2) provide a safety net for any refactoring that you may wish to do on the old base.
Bad code is the norm. I think IT gets a bad rap from business for favoring rewrites/rearchitecting/etc. They pay the money and "trust" us (as an industry) to deliver solid, extensible code. Sadly, business pressures frequently result in shortcuts that make the code unmaintainable. Sometimes it's bad programmers... sometimes bad situations.
To answer your rephrased question... can code maintenance costs ever exceed rewriting costs... the answer is clearly yes. I don't see anything in your examples, however, that lead me to believe this is your case. I think those issues can be addressed with tests and refactoring.
In terms of business value, I would think it's extremely rare that a real case can be made for a rewrite due solely to the internal state of the code. If the product's customer-facing and is currently live and bringing in money (i.e. is not a mothballed or unreleased product), then consider that:
You already have customers using it. They're familiar with it, and might have built some of their own assets around it. (Other systems that interface to it; products based on it; processes they'd have to change; staff they'd maybe have to retrain). All of this costs the customer money.
Re-writing it might cost less in the long term than making difficult changes and fixes. But you can't quantify that yet, unless your app is no more complex than Hello World. And a re-write means a re-test and a redeploy, and probably an upgrade path for your customers.
Who says the re-write will be any better? Can you honestly say your firm is writing sparkly code now? Have the practices that turned the original code to spaghetti been corrected? (Even if the main culprit was a single developer, where were his peers and management, ensuring quality through reviews, testing, etc.?)
In terms of technical reasons, I'd suggest it could be time for a major rewrite if the original has some technical dependencies that have become problematic. e.g. a third party dependency that's now out of support, etc.
In general though, I think the most sensible move is to refactor piece by piece (very small pieces if it's really that bad), and improve the internal architecture incrementally rather than in one big drop.
Two threads of thought on this one: Do you have the original requirements? Do you have confidence that the original requirements are accurate? What about test plans or unit tests? If you have those things in place it might be easier.
Putting on my customer hat, does the system work or is it unstable? If you've got something that's unstable you've got an argument to change; otherwise you're best of refactoring it bit by bit.
I think the line in the sand is when basic maintenance is taking 25% - 50% longer than it should. There comes a time when maintaining legacy code becomes too costly. A number of factors contribute to the final decision. Time and cost being the most important factors I think.
If there are clean interfaces and you can cleanly delineate module boundaries, then it might be worth refactoring it module by module or layer by layer in order to allow you to migrate existing customers forward into cleaner more stable codebases, and over time, after you've refactored every module, you will have rewritten everything.
But, based on the codereview, doesn't sound like there would be any clean boundaries.
I wonder if the people who vote for scrapping and starting over have ever successfully refactored a large project, or at least seen a large project in poor condition that they think could use a refactoring?
If anything, I err on the opposite side: I've seen 4 large projects that were a mess, that I advocated refactoring as opposed to rewriting. On a couple, there was barely a single line of original code that remained, and major interfaces changed in significant ways, but the process never involved the entire project failing to function as well as it originally did, for any more than a week. (And top-of-trunk was never broken).
Perhaps a project exists that is so severely broken that to attempt to refactor it would be doomed to failure, or perhaps one of the previous projects I refactored would have been better served by a "clean re-write", but I'm not sure I'd know how to recognize it.
I agree with Martin. You really need to weigh the effort that will be involved in writing the app from scratch against the current state of the app and how many people use it, do they like it, etc. Often we may want to completely start from scratch, but the cost far outweighs the benefit. I come across bits of ugly looking code all the time, but I soon realize that some of these 'ugly' areas are really bug fixes and make the program work correctly.
I would try to consider the architecture of the system and see whether it is possible to scrap and rewrite specific well defined components without starting everything from scratch.
What would usually happen is that you can either do that (and then sell that to the customer/management), or that you find out that the code is such a horrible and tangled mess that you become even more convinced that you need a rewrite and have more convincing arguments for it (including: "if we engineer it right, we would never need to scrap the whole thing and do a third rewrite).
Slow maintenance would eventually cause that architectural drift that would make a rewrite more expensive later.
Scrap old code early and often. When in doubt, throw it out. The hard part is convincing non-technical folks of the cost-to-maintain.
So long as the value derived appears to be greater than the cost to operate and maintain, there's still positive value flowing from the software. The question surrounding a rewrite this: "will we get even more value from a rewrite?" Or alternatively "How much more value will we get from a rewrite?" How many person-hours of maintenance will you save?
Remember, the rewrite investment is once only. The return on the rewrite investment lasts forever. Forever.
Focus the value question down to specific issues. You listed a bunch of them above. Stick with that.
"Will we get more value by reducing cost through
dropping the junk that we don't use
but still have to wade through?"
"Will we get more value from dropping the junk that's unreliable and breaks?"
"Will we get more value if we understand it -- not by documenting, but by replacing with something we built as a team?"
Do you homework. You'll have to confront the following show-stoppers.
These will originate somewhere in your executive foodchain from someone who'll respond as follows:
"Is it broken?" And when you say "It's not crashed as such," They'll say "It's not broke - don't fix it."
"You've done the code analysis, you understand it, you no longer need to fix it."
What's your answer to them?
That's only the first hurdle. Here's the worst possible situation. This doesn't always happen, but it does happen with alarming frequency.
Someone in your executive foodchain will have this thought:
"A rewrite doesn't create enough value. Rather than simply rewrite, let's expand it." The justification is that by creating enough value, users are more likely to buy in to the rewrite.
A project where scope is expanded -- artificially -- to add value is usually doomed.
Instead, do the smallest rewrite you can to replace the darn thing. Then expand to fit real needs and add value.
You can only give a definite yes to rewriting in case if you know completely how your application works (and by completely I mean it, not just having a general idea of how it should work) and you know more or less exactly how to make it better. Any other cases and it's a shot in the dark, it depends on too much things. Perhaps gradual refactoring would be safer if it is possible.
If possible, I typically would prefer to rewrite smaller portions of the code over time when I need to refactor a baseline. There are typically many smaller issues such as magic number, poor commenting, etc. that tend to make the code look worse than it actually is. So, unless the baseline is just awful, keep the code and just make improvements at the same time you are maintaining the code.
If refactoring requires a lot of work, I recommend laying out a small re-design plan/todo list that gives you a list of things to work on in order so that you can bring the baseline to a better state. Starting from scratch is always a risky move and you are not guaranteed that the code will be better when you are finished. Using this technique, you will always have a working system that improves over time.
Code with excessively high cyclomatic complexity (like over 100 in a large number of modules) is a good clue. Also, how many bugs does it have / KLOC? How critical are the bugs? How often are bugs introduced when bug fixes are made. If your answer is a lot (I cant remember norms right now), then a rewrite is warranted.
As early as possible. Whenever you get a premonition that your code is slowly turning into an ugly beast that is very likely to consume your soul and give you headaches, and you know the problem is in the underlying structure of the code (so any fix would be a hack, e.g. introduce a global variable), then it's time to start over.
For some reasons people don't like throwing away precious code, but if you feel your better off starting over, you are probably right. Trust your instinct and remember that it wasn't a waste of time, it taught you one more way of NOT approaching the problem. You could (should) always use a version control system so your baby is never really lost.
I do not have any experience with using metrics for this myself, but the
article
"Software Maintainability Metrics Models in Practice" discusses
more or less the same question asked here for two case studies they did.
It starts with the following editor's note:
In the past, when a maintainer
received new code to maintain, the
rule-of-thumb was "If you have to
change more than 40 percent of someone
else's code, you throw it out and
start over." The Maintainability Index
[MI] addressed here gives a much more
quantifiable method to determine when
to "throw it out and start over." This
work was sponsored by the U.S. Air
Force Information Warfare Center and
the U.S. Department of Energy [DOE],
Idaho Field Office, DOE Contract No.
DE-AC07-94ID13223.)
I think the rule was...
The first version is always a throw away
So, if you learned your lesson(s), or his/her lessons, then you can go ahead and write it fresh now that you understand your problem domain better.
Not that there aren't parts that can/should be kept. Tested code is the most valuable code, so if it isn't deficient in any real way other than style, no reason to toss it all out.
When is it good (if ever) to scrap production code and start over?
Never had to do this, but logic would dictate (to me, anyway) that once you pass the inflection point where you're spending more time reworking and fixing bugs in the existing code base than you are adding new functionality, it's time to trash the old stuff and get a fresh start.
If it requires more time to read and understand the code (if that is even possible) than it would to rewrite the entire application, I say scrap it and start over.
I have never completely thrown out code. Even when going from a foxpro system to a c# system.
If the old system worked then why just throw it out?
I have come across a few really bad system. Threads being used where not needed. Horrible inheritance and abuse of interfaces.
It is best to understand what the old code is doing and why it is doing it. Then change it so that it is not confusing.
Of course if the old code doesn't work. I mean can't even compile. Then you might be justified in just starting over. But how often does that actually happen?
Yes, it totally can happen. I've seen money be saved by doing it.
This is not a tech decision, it's a business decision. Code rewrites are long term gains, while "if it ain't totally broke..." is a short term gain. If you are in a first year startup that is focused on getting a product out the door, the answer is usually to just live with it. If you're in an established company, or the errors with the current systems are causing more workload, therefor more company money.. then they might go for it.
Present the problem as best as you can to your GM, use dollar values where you can. "I don't like dealing with it" means nothing. "It'll take twice the time to do everything until this is fixed" means a lot.
I think there are a number of issues here that depend largely on where you are at.
Is the software working well from a customer perspective? (If yes be very careful about changes). I would think there would be little point re-witting unless you were expanding the feature set if the system was working. And are you planning to expand the features and customer base of the software? If so then you have much more reason to change.
As much as anything just trying to understand some else's code even if well written can be difficult, when badly written I would imagine almost impossible. What you describe sounds like something that would be very difficult to expand.
I would take into consideration if the application does what it is intended to do, is required for you to ever make modifications, and are you confident that the app has been thoroughly tested in all scenarios that it will be used in.
Do not invest the time if the app does not need alterations. However, if it doesn't function as you need and you need to control the hours and time invested to make corrections, scrap it and re-write to the standards that your team can support. There's nothing worse than terrible code that you have to support / decipher but still have to live with. Remember, Murphy's Law says it will 10 at night when you'll have to make things work, and that is never productive.
Production code always has some value. The only case where I would truly throw it all out and start again is if we determine the intellectual property is irrevocably contaminated. For example if someone brought large amounts of code from a previous employer, or a large percentage of the code was ripped from a GPLd codebase.
I'm going to post this book every time I see a discussion on Refactoring. Everyone should read "Working Effectively with Legacy Code" by Michael Feathers. I found it to be an excellent book - if nothing else, it's a fun read, and motivational.
When the code has reached a point that is not maintainable or extensible anymore. Is full of short-term hacky fixes. It has lots of coupling. It has long (100+lines) methods. It has database access in the UI. It generates a lot of random, impossible to debug errors.
Bottom line: When maintaining it is more expensive (i.e. takes longer) than rewriting it.
I used to believe in just re-write from scratch, but it is wrong.
http://www.joelonsoftware.com/articles/fog0000000069.html
Changed my mind.
What I would suggested is figuring out a way to properly refactor the code. Keep all existing functionality and test as you go. We have all seen horrible code bases, but it is important to keep the knowledge over time you application has.

At what point does refactoring become not worth it?

Say you have a program that currently functions the way it is supposed to. The application has very poor code behind it, eats up a lot of memory, is unscalable and would take major rewriting to implement any changes in functionality.
At what point does refactoring become less logical then a total rebuild?
Joel wrote a nice essay about this very topic:
Things You Should Never Do, Part 1
The key lesson I got from this is that although the old code is horrible, hurts your eyes and your aesthetic sense, there's a pretty good chance that a lot of that code is patching undocumented errors and problems. Ie., it has a lot of domain knowledge embedded in it and it will be difficult or impossible for you to replicate it. You'll constantly be hitting against bugs-of-omission.
A book I found immensely useful is Working Effectively With Legacy Code by Michael C. Feathers. It offers strategies and methods for approaching even truly ugly legacy code.
One benefit of refactoring over rebuilding is that IF you can do refactoring step by step, i.e. in increments, you can test the increments in the context of the whole system, making development and debugging faster.
Old and deployed code, even when ugly and slow, has the benefit of having been tested thoroughly, and this benefit is lost if you start from scratch.
An incremental refactoring approach also has helps to ensure that there is always a product available which can be shipped (and it's improving constantly).
There is a nice article on the web about how Netscape 6 was written from scratch and it was business-wise a bad idea.
Robert L. Glass suggests that
Modification of reused code is particularly error-prone. If more than 20 to 25 percent of a component is to be revised, it is more efficient and effective to write it from scratch.
Well, the simplest answer is if it will take longer to refactor than it will to rebuild, then you should just rebuild.
If it's a personal project then you might want to rebuild it anyway as you will probably learn more from building from scratch than you would from refactoring, and that's one big objective of personal projects.
However, in a professional time-limited environment, you should always go with whatever costs the company the least amount of money (for the same payoff) in the long run, which means choosing whichever takes less time.
Of course, it can be a little more complicated than that. If other people can be working on features while the refactoring is being done, then that might be a better choice over having everyone wait for a completely new version to be built. In that case rebuilding might take less time than just the refactoring would have taken, but you need to take the entire project and all contributors of the project in to account.
When you spend more time refactoring than actually writing code.
At the point where the software doesn't do what it's supposed to do. Refactoring (changing the code without changing the functionality) makes sense if and only if the functionality is "as intended".
If you can afford the time to completely rebuild the app, don't need to improve functionality incrementally, and don't wish to retain any of the existing code then rewriting is certainly a viable alternative. You can, on the other hand, use refactoring to do an incremental rewrite by slowly replacing the existing functions with equivalent functions that are better written and more efficient.
If the application is very small, then you can rewrite it from scratch. If the application is big, never do it. Rewrite it progressively, one step at a time validating you didn't break anything.
The application is the specification. If your rewrite it from scratch you will most likely run into a lots of insidious bugs because "no one knew that the call to this function was supposed to return 3 in that very specific case" (undocumented behaviour...).
It's always more fun to rewrite from scratch so your brain might trick you into thinking it's the right choice. Be careful, it's most likely not.
I've worked with such applications in the past. The best approach I've found is a gradual one: When you are working on the code, find things that are done multiple times, group them together in functions. Keep a notebook (you know, a real one, with paper, and a pencil or pen) so that you can mark your progress. Use that in combination with your VCS, not instead of it. The notebook can be used to provide an overview of the new functions you've created as part of the refactoring, and the VCS of course fills in the blanks for the details.
Over time, you will have consolidated a lot of code into more appropriate places. Code duplication during this period of time is going to be next to impossible, so just do it as best as you can until you've reached a point where you can really start the refactoring process, auditing the entire code base and working on it as a whole.
If you've not enough time for that process (which will take a very long time), then rewriting from scratch using a test-first approach is probably better.
One option would be to write unit tests to cover the existing application and then start to refactor it bit by bit, using the unit tests to make sure everything works as before.
In an ideal world you'd already have unit tests for the program, but given your comments about the quality of the app I'm guessing you don't...
No document, no original writer, no test case, and a bunch of remaining bugs.
Uncle Bob weighs in with the following:
When is a redesign the right strategy?
I’m glad you asked that question. Here’s the answer. Never.
Look, you made the mess, now clean it up.
I’ve not had much luck with small incremental changes when the code I inherit is really bad. In theory the small incremental approach sounds good, but in practice all it ends up with is a better, but still poorly designed application that everyone thinks is now YOUR design. When things break, people no longer think it is because of the previous code, it now becomes YOUR fault. So, I would not use the word redesign, refactor or anything else that implies to a manager type that you are changing things to your way unless I was really going to do it my way. Otherwise, even though you may have fixed dozens of problems, any problems that still existed (but weren’t discovered) are now going to be attributed to your rework. And be assured that if the code is bad then your fixes will uncover a lot more bugs that were simply ignored before because the code was so bad to begin with.
If you truly know how to develop software systems then I would do a redesign of the whole system. If you don’t TRULY know how to design GOOD software then I’d say stick with the small incremental changes as you may otherwise end up with a code base that is just as bad as the original.
One mistake that is often made when redesigning is that people ignore the original code base. However, redesign does not have to mean totally ignore the old code. The old code still had to do what your new code has to do, so in many cases the steps you need are already in the old code. Copy and Paste then tweak works wonders when redesigning systems. I have found that in many cases, redesigning and rewriting an application and stealing snippets from the original code is far quicker and much more reliable than small incremental changes.

Rewrite or repair?

I'm sure you have all been there, you take on a project where there is a creaky old code base which is barely fit for purpose and you have to make the decision to either re-write it from scratch or repair what already exists.
Conventional wisdom tends to suggest that you should never attempt a re-write from scratch as the risk of failure is very high. So what did you do when faced with this problem, how did you make the decision and how did it turn out?
It really depends on how bad it is.
If it's a small system, and you fully understand it, then a rewrite is not crazy.
On the other hand, if it's a giant legacy monster with ten million lines of undocumented mystery code, then you're really going to have a hard time with a full rewrite.
Points to consider:
If it looks good to the user, they
won't care what kind of spaghetti
mess it is for you. On the other
hand, if it's bad for them too, then
it's easier to get agreement (and
patience).
If you do rewrite, try to do it one
part at a time. A messy,
disorganized codebase may make this
difficult (i.e, replacing just one
part requires a rewrite of large
icebergs of dependency code), but if
possible, this makes it a lot easier
to gradually do the rewrite and get
feedback from users along the way.
I would really hesitate to take on a giant rewrite project for a large system without being able to release the new edition one part at a time.
Just clean up the code a little bit every time you work with it. If there isn't one already, setup a unit testing framework. All new code should get tests written. Any old code you fix as a result of bugs, try to slide in tests too.
As the cleanups progress, you'll be able to sweep more and more of the nasty code into encapsulated bins. Then you can pick those off one by one in the future.
A tool like javadoc or doxygen, if not already in use, can also help improve code documentation and comprehensibility.
The arguments against a complete rewrite a pretty strong. Those tons of "little bugs" and behaviors that were coded in over the time frame of the original project will sneak right back in again.
See Joel Spolsky's essay Things You Should Never Do. In summary, when you rewrite you lose all the lessons you learned to make your current code work the way it needs to work.
See also: Big Ball of Mud
It is rare for a re-write of anything complex to succeed. It's tempting, but a low percentage strategy.
Get legacy code under unit tests and refactor it, and/or completely replace small portions of it incrementally when opportune.
Refactor unless it is very bad indeed.
Joel has a lot to say on this...
At the very least, rewrite the code with the old code in front of you and don't just start over from scratch. The old code may be terrible, but it is the way it is for a reason and if you ignore it you'll end up seeing the same bugs that were probably fixed years ago in the old code.
One reason for rewriting at one of my previous jobs was an inability to find developers with enough experience to work on the original code base.
The decision was made to first clean up the underlying database structure, then rewrite in something that would make it easier to find full-time employees and/or contractors.
I haven't heard yet how it worked out :)
I think people have a tendency to go for rewrites because it seems more fun on the surface.
We get to rebuild from scratch!
We'll do it right this time!
etc.
There is a new book coming out, Brownfield Application Development in .NET by Baley and Belcham. The first chapter is free, and talks about these issues from a mostly platform agnostic perspective.
Repair, or more importantly, refactor. Both because Joel said so and also because, if it's your code, you've probably learned a ton more stuff since you touched this code last. If you wrote it in .NET 1.1, you can upgrade it to 3.5 SP1. You get to go in and purge all the old commented out code. You're 100x better as a developer now than when you first wrote this code.
The one exception I think is when the code uses really antiquated technologies - in which case you might be better served by writing a new version. If you're looking at some VB6 app with 10,000 lines of code with an Access database backend obviously set up by someone who didn't know much about how databases work (which could very well be you eight years ago) then you can probably pull off a quicker, C#/SQL-based solution in a fraction of the time and code.
It's not so black and white... it really depends on a lot of factors (the more important being "what does the person paying you want you to do")
Where I work we re-wrote a development framework, and on the other hand, we keep modifying some old systems that cannot be migrated (because of the client's technology and time restrictions). In this case, we try to mantain the coding style and sometimes you have to implement a lot of workarounds because of the way it was built
Depending on your situation, you might have another option: in-license third-party code.
I've consulted at a couple of companies where that would be the sensible choice, although seemingly "throwing away IP" can be a big barrier for management. At my current company, we seriously considered the viable option of using third-party code to replace our core framework, but that idea was ultimately rejected more for business reasons than technical reasons.
To directly answer your question, we finally chose to rewrite the legacy framework - a decision we didn't take lightly! 14 months on, we don't regret this choice at all. Just considering the time spent fixing bugs, our new framework has nearly paid for itself. On the negative side, it is not quite feature-complete yet so we are in the unenviable position of maintaining two separate frameworks in parallel until we can port the last of our "front-end" applications.
I highly recommend reading "Working Effectively with Legacy Code" by Michael Feathers. It's coaching advice on how to refactor your code so that it is unit testable.

Resources