Code reuse and refactoring [closed]

What's best practice for reuse of code versus copy/paste?
The problem with reuse can be that changing the reused code will affect many other pieces of functionality.
This is both good and bad: good if the change is a bugfix or useful enhancement; bad if other code that reuses it unexpectedly breaks because it relied on the old version (or the new version has a bug).
In some cases it would seem that copy/paste is better - each user of the pasted code has a private copy which it can customize without consequences.
Is there a best practice for this problem; does reuse require watertight unit tests?

Every line of code has a cost.
Studies show that the cost is not linear with the number of lines of code, it's exponential.
Copy/paste programming is the most expensive way to reuse software.
"does reuse require watertight unit tests?"
No.
All code requires adequate unit tests. All code is a candidate for reuse.

It seems to me that a piece of code used in multiple places, which might need to change for one place but not for another, isn't following proper rules of scope. If the "same" method/class is needed by two different things to do two different functions, then that method/class should be split up.
Don't copy/paste. If it does turn out that you need to modify the code for one place, then you can extend it, possibly through inheritance, overloading, or if you must, copying and pasting. But don't start out by copy-pasting similar segments.
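To make that concrete, here is a minimal Java sketch (all names made up) of extending shared code through inheritance rather than pasting a modified copy - the variation lives in an overridable step instead of a second copy of the whole routine:

    abstract class ReportExporter {
        // Shared skeleton: every exporter runs the same steps.
        public final String export(String data) {
            return header() + data.trim() + footer();
        }
        protected String header() { return "== report ==\n"; }
        protected String footer() { return "\n== end =="; }
    }

    // The variation is expressed by overriding one step, not by pasting
    // a second copy of export() and editing it.
    class CsvReportExporter extends ReportExporter {
        @Override protected String header() { return "col1,col2\n"; }
        @Override protected String footer() { return ""; }
    }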

Using copy and paste is almost always a bad idea. As you said, you can have tests to check in case you break something.
The point is, when you call a method, you shouldn't really care about how it works, but about what it does. If you change the method, changing what it does, then it should be a new method, or you should check wherever this method is called.
On the other hand, if the change doesn't modify WHAT the method does (only how), then you shouldn't have a problem elsewhere. If you do, you've done something wrong...

One very appropriate use of copy and paste is Triangulation. Write code for one case, see a second application that has some variation, copy & paste into the new context - but you're not done. It's if you stop at that point that you get into trouble. Having this code duplicated, perhaps with minor variation, exposes some common functionality that your code needs. Once it's in both places, tested, and working in both places, you should extract that commonality into a single place, call it from the two original places, and (of course) re-test.
If you have concerns that code which is called from multiple places is introducing risk of fragility, your functions are probably not fine-grained enough. Excessively coarse-grained functions, functions that do too much, are hard to reuse, hard to name, hard to debug. Find the atomic bits of functionality, name them, and reuse them.
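As a rough illustration of those triangulation steps, here is a hypothetical Java sketch - first the pasted second case, then the extracted commonality with the variation turned into a parameter:

    class PricingExample {
        // Step 1: the second case arrives and is pasted with its one
        // variation (the discount factor). Both are tested and working.
        static double retailTotal(double price, int qty)    { return price * qty * 0.95; }
        static double wholesaleTotal(double price, int qty) { return price * qty * 0.80; }

        // Step 2: extract the commonality, parameterize the variation,
        // call it from the two original places, and re-test.
        static double total(double price, int qty, double discountFactor) {
            return price * qty * discountFactor;
        }
        static double retailTotalRefactored(double price, int qty)    { return total(price, qty, 0.95); }
        static double wholesaleTotalRefactored(double price, int qty) { return total(price, qty, 0.80); }
    }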

So the consumer (reuser) code is dependent on the reused code, that's right.
You have to manage this dependency.
It is true for binary reuse (eg. a dll) and code reuse (eg. a script library) as well.
Consumer should depend on a certain (known) version of the reused code/binary.
Consumer should keep a copy of the reused code/binary, but never directly modify it, only update to a newer version when it is safe.
Think carefully when you modify the reused codebase. Branch for breaking changes.
If a consumer wants to update the reused code/binary, then it first has to test whether the update is safe. If tests fail, the consumer can always fall back to the last known (and kept) good version.
So you can benefit from reuse (eg. you have to fix a bug in one place), and still you're in control of changes. But nothing saves you from testing whenever you update the reused code/binary.

Is there a best practice for this problem; does reuse require watertight unit tests?
Yes, and sort of yes. Rewriting code you already got right once is never a good idea; if you never reuse code and just rewrite it, you are doubling your bug surface. As with many best-practice questions, Code Complete changed the way I do my work. Yes, unit test to the best of your ability; yes, reuse code; and get a copy of Code Complete and you will be all set.

Copy and pasting is never good practice. Sometimes it might seem better as a short-term fix in a pretty poor codebase, but in a well designed codebase you will have the following, which afford easy re-use:
encapsulation
well defined interfaces
loose-coupling between objects (few dependencies)
If your codebase exhibits these properties, copy and pasting will never look like the better option. And as S Lott says, there is a huge cost to unnecessarily increasing the size of your codebase.

Copy/Paste leads to divergent functionality. The code may start out the same but over time, changes in one copy don't get reflected in all the other copies where it should.
Also, copy/paste may seem "OK" in very simple cases but it also starts putting programmers into a mindset where copy/paste is fine. That's the "slippery slope". Programmers start using copy/paste when refactoring should be the right approach. You always have to be careful about setting precedent and what signals that sends to future developers.
There's even a quote about this from someone with more experience than I,
"If you use copy and paste while you're coding, you're probably committing a design error."-- David Parnas

You should be writing unit tests, and while yes, having cloned code can in some sense give you the sense of security that your change isn't affecting a large number of other routines, it is probably a false sense of security. Basically, your sense of security comes from ignorance of how the code is used. (Ignorance here isn't a pejorative; it just comes from not being able to know everything about the codebase.) Get used to using your IDE to learn where the code is being used, and get used to reading code to know how it is being used.

Where you write:
The problem with reuse can be that changing the reused code will affect many other pieces of functionality. ... In some cases it would seem that copy/paste is better - each user of the pasted code has a private copy which it can customize without consequences.
I think you've reversed the concerns related to copy-paste. If you copy code to 10 places and then need to make a slight modification to behavior, will you remember to change it in all 10 places?
I've worked on an unfortunately large number of big, sloppy codebases, and generally what you'll see is the result of this - 20 versions of the same 4 lines of code. Some (usually small) subset of them have one minor change, some other small (and only partially intersecting) subset have some other minor change, not because the variations are correct but because the code was copied and pasted 20 times and changes were applied almost, but not quite, consistently.
When it gets to that point it's nearly impossible to tell which of those variations are there for a reason and which are there because of a mistake (and since it's more often a mistake of omission - forgetting to apply a patch rather than altering something - there's not likely to be any evidence or comments).
If you need different functionality call a different function. If you need the same functionality, please avoid copy paste for the sanity of those who will follow you.

There are metrics that can be used to measure your code, and it's up to you (or your development team) to decide on an adequate threshold. Ruby on Rails has the "Metric-Fu" Gem, which incorporates many tools that can help you refactor your code and keep it in tip-top shape.
I'm not sure what tools are available for other languages, but I believe there is one for .NET.

In general, copy and paste is a bad idea. However, like any rule, this has exceptions. Since the exceptions are less well-known than the rule I'll highlight what IMHO are some important ones:
You have a very simple design for something that you do not want to make more complicated with design patterns and OO stuff. You have two or three cases that vary in about a zillion subtle ways, i.e. a line here, a line there. You know from the nature of the problem that you won't likely ever have more than 2 or 3 cases. Sometimes it can be the lesser of two evils to just cut and paste than to engineer the hell out of the thing to solve a relatively simple problem like this. Code volume has its costs, but so does conceptual complexity.
You have some code that's very similar for now, but the project is rapidly evolving and you anticipate that the two instances will diverge significantly over time, to the point where trying to even identify reasonably large, factorable chunks of functionality that will stay common, let alone refactor these into reusable components, would be more trouble than it's worth. This applies when you believe that the probability of a divergent change to one instance is much greater than that of a change to common functionality.


How much duplicated code do you tolerate? [closed]

In a recent code review I spotted a few lines of duplicated logic in a class (less than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree that extracting the duplicated logic would hurt readability a little.
I know DRY is a guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?
Refactoring: Improving the Design of Existing Code
The Rule of Three
The first time you do something, you just do it. The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway. The third time you do something similar, you refactor. Three strikes and you refactor.
Coders at Work
Seibel: So for each of these XII calls you're writing an implementation. Did you ever find that you were accumulating lots of bits of very similar code?
Zawinski: Oh, yeah, definitely. Usually by the second or third time you've cut and pasted that piece of code it's like, alright, time to stop cutting and pasting and put it in a subroutine.
I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.
Saying that it'll hurt readability only suggests that you are bad at picking names :-)
Personally, I prefer keeping code understandable, first and foremost.
DRY is about easing the maintenance of code. Making your code less understandable in order to remove repeated code hurts maintainability more, in many cases, than having some repeated lines of code.
That being said, I do agree that DRY is a good goal to follow, when practical.
If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.
Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.
Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persuasive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).
A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the targeted language structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number of languages including C, C++, Java, C#, COBOL and PHP.) CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).
I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.
As soon as you repeat anything, you're creating multiple places to make edits if you find that you've made a mistake, need to extend it, edit it, delete it, or hit any of the dozens of other reasons you might come up against that force a change.
In most languages, extracting a block to a suitably named method can rarely hurt your readability.
It is your code, with your standards, but my basic answer to your "how much?" is none ...
You didn't say which language, but in most IDEs it is a simple Refactor -> Extract Method. How much easier is that? And a single method with some arguments is much more maintainable than two blocks of duplicate code.
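For instance, here is roughly what an IDE's Extract Method produces, sketched in Java with made-up names - a repeated block becomes one named method with arguments:

    class OrderPrinter {
        // Before: the same anonymous banner block appears in multiple methods.
        void confirmBefore(String name, double amount) {
            System.out.println("*****************");
            System.out.println("* " + name + ": " + amount);
            System.out.println("*****************");
        }

        // After Extract Method: the block has a name, and every caller shares it.
        void confirmAfter(String name, double amount) {
            printBanner(name + ": " + amount);
        }
        private void printBanner(String line) {
            System.out.println("*****************");
            System.out.println("* " + line);
            System.out.println("*****************");
        }
    }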
Very difficult to say in abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.
Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.
Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example 2 GUI components that both display values, but don't implement a common interface for accessing that value. Refactoring this kind of system either requires methods taking more generic objects than needed, followed by typechecking and casting, or else the class hierarchy needs to be rethought & restructured.
This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.
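To illustrate the interface route, a hedged Java sketch (the component names are invented): once both components implement a small common interface, shared code can accept that interface instead of taking generic objects and casting:

    // The two components only need to share this much.
    interface ValueHolder {
        String getDisplayedValue();
    }

    class TemperatureGauge implements ValueHolder {
        private double celsius = 21.5;
        public String getDisplayedValue() { return celsius + " C"; }
    }

    class StockTicker implements ValueHolder {
        private double price = 102.40;
        public String getDisplayedValue() { return "$" + price; }
    }

    class ValueLogger {
        // One method serves both components; no instanceof, no casting.
        static void log(ValueHolder component) {
            System.out.println("current value: " + component.getDisplayedValue());
        }
    }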
The point of DRY is maintainability. If code is harder to understand it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For less than 15 lines of code, I'd be inclined to agree with your classmate.
In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention revealing common method that reads like a book, IMO.
If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.
Unless your dev is actually Ayende though I would hold tight to DRY and get the readability through intention revealing methods.
I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.
The best line of code is a line of code not written.
It really depends on many factors, how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way then maybe it is fine. But if you need to use the same code in a third place I would seriously consider refactoring it into a common function.
Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.
That being said, there are issues here.
If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)
You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.

How do you decide which parts of the code shall be consolidated/refactored next?

Do you use any metrics to make a decision which parts of the code (classes, modules, libraries) shall be consolidated or refactored next?
I don't use any metrics which can be calculated automatically.
I use code smells and similar heuristics to detect bad code, and then I'll fix it as soon as I have noticed it. I don't have any checklist for spotting problems - mostly it's a gut feeling that "this code looks messy", then reasoning about why it is messy and figuring out a solution. Simple refactorings like giving a more descriptive name to a variable or extracting a method take only a few seconds. More intensive refactorings, such as extracting a class, might take up to an hour or two (in which case I might leave a TODO comment and refactor it later).
One important heuristic that I use is Single Responsibility Principle. It makes the classes nicely cohesive. In some cases I use the size of the class in lines of code as a heuristic for looking more carefully, whether a class has multiple responsibilities. In my current project I've noticed that when writing Java, most of the classes will be less than 100 lines long, and often when the size approaches 200 lines, the class does many unrelated things and it is possible to split it up, so as to get more focused cohesive classes.
Each time I need to add new functionality I search for already existing code that does something similar. Once I find such code I think of refactoring it to solve both the original task and the new one. Surely I don't decide to refactor each time - most often I reuse the code as it is.
I generally only refactor "on-demand", i.e. if I see a concrete, immediate problem with the code.
Often when I need to implement a new feature or fix a bug, I find that the current structure of the code makes this difficult, such as:
too many places to change because of copy&paste
unsuitable data structures
things hardcoded that need to change
methods/classes too big to understand
Then I will refactor.
I sometimes see code that seems problematic and which I'd like to change, but I resist the urge if the area is not currently being worked on.
I see refactoring as a balance between future-proofing the code, and doing things which do not really generate any immediate value. Therefore I would not normally refactor unless I see a concrete need.
I'd like to hear about experiences from people who refactor as a matter of routine. How do you stop yourself from polishing so much you lose time for important features?
We use cyclomatic complexity to identify the code that needs to be refactored next.
I use Source Monitor and routinely refactor methods when the complexity metric goes above around 8.0.
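For the curious: cyclomatic complexity is roughly one plus the number of decision points in a method. This hypothetical Java sketch crudely estimates it by counting branch keywords; a real tool like Source Monitor parses the language properly, so treat this only as an approximation of the idea:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class CrudeComplexity {
        // One path through the method, plus one per decision point.
        private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

        static int estimate(String methodBody) {
            Matcher m = DECISION.matcher(methodBody);
            int count = 1;
            while (m.find()) count++;
            return count;
        }

        public static void main(String[] args) throws Exception {
            String body = Files.readString(Path.of(args[0]));
            System.out.println("estimated complexity: " + estimate(body));
            // e.g. flag the method for refactoring when this goes above ~8
        }
    }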

Dealing with god objects

I work in a medium sized team and I run into these painfully large class files on a regular basis. My first tendency is to go at them with a knife, but that usually just makes matters worse and puts me into a bad state of mind.
For example, imagine you were just given a windows service to work on. Now there is a bug in this service and you need to figure out what the service does before you can have any hope of fixing it. You open the service up and see that someone decided to just use one file for everything. Start method is in there, Stop method, Timers, all the handling and functionality. I am talking thousands of lines of code. Methods under a hundred lines of code are rare.
Now assuming you cannot rewrite the entire class and these god classes are just going to keep popping up, what is the best way to deal with them? Where do you start? What do you try to accomplish first? How do you deal with this kind of thing and not just want to get all stabby.
If you have some strategy just to keep your temper in check, that is welcome as well.
Tips Thus Far:
Establish test coverage
Code folding
Reorganize existing methods
Document behavior as discovered
Aim for incremental improvement
Edit:
Charles Conway recommended a podcast which turned out to be very helpful. link
Michael Feathers (the guy in the podcast) begins with the premise that we are too afraid to simply take a project out of source control, play with it directly, and then throw away the changes. I can say that I am guilty of this.
He essentially said to take the item you want to learn more about and just start pulling it apart. Discover its dependencies and then break them. Follow it through everywhere it goes.
Great Tip
Take the large class that is used elsewhere and have it implement an empty interface. Then take the code using the class and have it instantiate the interface instead. This will give you a complete list of all the dependencies on that large class in your code.
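A hedged Java sketch of that tip (every name here is hypothetical) - the interface is deliberately empty, so retyping the consumers makes the compiler enumerate every dependency:

    // Step 1: a deliberately empty interface, implemented by the god class.
    interface IGodService { }

    class GodService implements IGodService {
        void start() { /* ...thousands of lines... */ }
        void stop()  { /* ... */ }
    }

    // Step 2: declare consumers against the interface instead of the class.
    class SomeConsumer {
        IGodService service = new GodService();
        // service.start() no longer compiles here: each compile error is
        // one concrete dependency on the god class, giving you the full list.
    }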
Ouch! Sounds like the place I used to work.
Take a look at Working Effectively with Legacy Code. It has some gems on how to deal with atrocious code.
DotNetRocks recently did a show on working with legacy code. There is no magic pill that is going to make it work.
The best advice I've heard is start incrementally wrapping the code in tests.
That reminds me of my current job and when I first joined. They didn't let me re-write anything because I had the same argument, "These classes are so big and poorly written! no one could possibly understand them let alone add new functionality to them."
So the first thing I would do is make sure there is comprehensive testing behind the areas that you're looking to change. At least then you will have a chance of changing the code and not having (too many) arguments (hopefully). And by tests, I mean testing the components functionally with integration or acceptance tests and making sure they are 100% covered. If the tests are good, then you should be able to confidently change the code by splitting up the big class into smaller ones, getting rid of duplication, etc.
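A test of this sort might look like the following Java/JUnit sketch, where LegacyBilling and its expected value are stand-ins - a characterization test pins down whatever the code currently returns, not what a spec says it should:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    // Stand-in for the tangled class under test.
    class LegacyBilling {
        double totalFor(String customer, int items) {
            return items * 35.0 + 2.53;   // imagine thousands of lines here
        }
    }

    class LegacyBillingCharacterizationTest {
        @Test
        void totalMatchesCurrentBehaviour() {
            // The expected value is whatever the code returns *today*,
            // captured by running it once - not what a spec says.
            assertEquals(107.53, new LegacyBilling().totalFor("ACME", 3), 0.001);
        }
    }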
Even if you cannot refactor the file, try to reorganize it. Move methods/functions so that they are at least organized within the file logically. Then put in lots of comments explaining each section. No, you haven't rewritten the program, but at least now you can read it properly, and the next time you have to work on the file, you'll have lots of comments, written by you (which hopefully means that you will be able to understand them) which will help you deal with the program.
Code Folding can help.
If you can move stuff around within the giant class and organize it in a somewhat logical way, then you can put folds around various blocks.
Hide everything, and you're back to a C paradigm, except with folds rather than separate files.
I've come across this situation as well.
Personally I print out (yeah, it can be a lot of pages) the code first. Then I draw a box around sections of code that are not part of any "main loop" or are just helper functions, and make sure I understand these things first. The reason is they are probably referred to many times within the main body of the class, and it's good to know what they do.
Second, I identify the main algorithm(s) and decompose them into their parts using a numbering system that alternates between numbers and letters (it's ugly but works well for me). For example you could be looking at part of an algorithm 4 "levels" deep and the numbering would be 1.b.3.e or some other god awful thing. Note that when I say levels, I am not referring directly to control blocks or scope necessarily, but where I have identified steps and sub-steps of an algorithm.
Then it's a matter of just reading and re-reading the algorithm. When you start out it sounds like a lot of time, but I find that doing this develops a natural ability to comprehend a great deal of logic all at once. Also, if you discover an error attributed to this code, having visually broken it down on paper ahead of time helps you "navigate" the code later, since you have a sort of map of it in your head already.
If your bosses don't think you understand something until you have some form of UML describing it, a UML sequence diagram could help here if you pretend the sub-step levels are different "classes" represented horizontally, and start-to-finish is represented vertically from top-to-bottom.
I feel your pain. I tackled something like this once for a hobby project involving processing digital TV data on my computer. A fellow on a hardware forum had written an amazing tool for recording shows, seeing everything that was on, and more. Plus, he had done the incredibly vital work of working around bugs in real broadcast signals that were in violation of the standard. He'd done amazing work with thread scheduling to be sure that no matter what, you wouldn't lose those real-time packets: on an old Pentium, he could record four streams simultaneously while also playing Doom and never lose a packet. In short, this code incorporated a ton of great knowledge. I was hoping to take some pieces and incorporate them into my own project.
I got the source code. One file, 22,000 lines of C, no abstraction. I spent hours reading it; there was all this great work, but it was all done badly. I was not able to reuse a single line or even a single idea.
I'm not sure what the moral of the story is, but if I had been forced to use this stuff at work, I would have begged permission to chip pieces off it one at a time, build unit tests for each piece, and eventually grow a new, sensible thing out of the pieces. This approach is a bit different than trying to refactor and maintain a large brick in place, but I would rather have left the legacy code untouched and tried to bring up a new system in parallel.
The first thing I would do is write some unit tests to box the current behavior, assuming that there are none already. Then I'd start in the area where I need to make the change and try to get that method cleaned up -- i.e. refactor working code before introducing changes. Use common refactoring techniques to extract and reuse methods from existing long methods to make them more understandable. When you extract a method, look for other places in the code where similar code exists, box that area, and reuse the method you've just extracted.
Look for groups of methods that "hang together" that can be broken out into their own classes. Write some tests for how those classes should work, build the classes using the existing code as a template if need be, then substitute the new classes into the existing code, removing the methods that they replace. Again, using your tests to make sure that you're not breaking anything.
Make enough improvement to the existing code so that you feel you can implement your new feature/fix in a clean way. Then write the tests for the new feature/fix and implement to pass the tests. Don't feel that you have to fix everything the first time. Aim for gradual improvement, but always leave the code better than you found it.
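As a small illustration of the "hang together" step, a hypothetical Java sketch: a cluster of address-formatting methods moves out of a god class into its own class, with the old class delegating so existing callers keep working:

    // The methods that "hang together" get their own class.
    class AddressFormatter {
        private final String street, city, zip;
        AddressFormatter(String street, String city, String zip) {
            this.street = street; this.city = city; this.zip = zip;
        }
        String oneLine() { return street + ", " + city + " " + zip; }
        String label()   { return street + "\n" + city + " " + zip; }
    }

    class CustomerGodClass {
        private final AddressFormatter address =
            new AddressFormatter("1 Main St", "Springfield", "12345");

        // Existing callers keep working through delegation while they
        // are migrated to the new class one at a time.
        String mailingLabel() { return address.label(); }
    }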

When does a code base become large and unwieldy? [closed]

When do you start to consider a code base to be getting too large and unwieldy?
- when a significant amount of your coding time is devoted to "where do I put this code?"
- when reasoning about side-effects starts to become really hard.
- when there's a significant amount of code that's just "in there", and nobody knows what it does or if it's still running, but it's too scary to remove
- when lots of team members spend significant chunks of their time chasing down intermittent bugs caused by some empty string somewhere in the data where it wasn't expected, or something that you think would usually be caught in a well-written application, in some edge case
- when, in considering how to implement a new feature, "complete rewrite" starts to seem like a good answer
- when you dread looking at the mess of code you need to maintain and wish you could find work building something clean and logical instead of dumpster diving through the detritus of someone else's poorly organized thinking
When it's over 100 lines. Joke. This is probably the hardest question to answer, because it's very individual.
But if you structure the application well and use different layers for e.g. interfaces, data, services and front-end, you will automatically get a nice base structure. Then you can divide each layer into different classes, and inside the classes point out the appropriate methods for the class.
However, there's no "x amount of lines per method is bad" rule; think of it more like this: if there is a possibility of replication, split it from the current piece and make it re-usable.
Re-using code is the basis of all good structure.
And splitting up into different layers will help the base become more and more flexible and modular.
There exist some calculable metrics if that's what you're searching for. Static code analysis tools can help with that:
Here's one list: http://checkstyle.sourceforge.net/config_metrics.html
Other factors can be the time it takes to change/add something.
Other non-calculable factors can be:
the risk associated with changes
the level of intermingling of features
whether the documentation can keep up with the features / code
whether the documentation represents the application
the level of training needed
the quantity of repetition instead of reuse
Ah, the god-program anti-pattern.
When you can't remember at least the outline of sections of it.
When you have to think about how changes will affect itself or its dependencies.
When you can't remember all the things it's dependent on or that depend on it.
When it takes more than a few minutes(?) to download the source or compile.
When you have to worry about how to deploy new versions.
When you encounter classes which are functionally identical to other classes elsewhere in the app.
So many possible signs.
I think there are many ways of telling when a code base is too large.
It is hard to remain in a constant naming convention. If classes/methods/attributes can't be named consistently or can't be found consistently, then it's time to reorganize.
When your programmers are surfing the web and going to lunch in order to compile. Keeping compiling/linking time to a minimum is important for management. The last thing you want is a programmer to get distracted by twiddling their thumbs for too long.
When small changes start to affect many MANY other places in the code. There is a benefit to consolidation of code, but there is also a cost. If a small change to fix one bug causes a dozen more, and this commonly happens, then your code base needs to be spread out (versioned libraries) or possibly unconsolidated (yes, duplicate code).
If the learning curve of new programmers to the project is obviously longer than acceptable (usually 90 days), then your code base/training isn't set up right.
...There are many, many more, I'm sure. If you think about it from these three perspectives:
Is it hard to support?
Is it hard to change?
Is it hard to learn?
...then you will have an idea of whether your code fits the "large and unwieldy" category.
For me, code becomes unwieldy when there's been a lot of changes made to the codebase that weren't planned for when the program was initially written or last refactored significantly. At this point, stuff starts to get fitted into the existing codebase in odd places for expediency and you start to get a lot of design artifacts that only make sense if you know the history of the implementation.
Short answer: it depends on the project.
Long answer:
A codebase doesn't have to be large to be unwieldy - spaghetti code can be written from line 1. So, there's not really a magic tripping point from good to bad - it's more of a spectrum of great <---> awful, and it takes daily effort to keep your codebase from heading in the wrong direction. What you generally need is a lead developer that has the ability to review others' code objectively, and keep an eye on the architecture and design of the code as a whole - no one line developer can do that.
When I can't remember what a class does or what other classes it uses off the top of my head. It's really more a function of my cognitive capacity coupled with the code complexity.
I was trying to think of a way of deciding based on how your colleagues perceive it.
During my first week at a gig a few years ago, I said during a stand-up that I had been tracking a white rabbit around the ContainerManagerBean, the ContainerManagementBean and the ContextManagerBean (it makes me shudder just recalling these words!). At least two of the developers looked at their shoes, and I could see them holding in a snigger.
Right then and there, I knew that this was not a problem with my lack of familiarity with the codebase - all the developers perceived a problem with it.
If, over years of development, different people code change requests and bug fixes, you will sooner or later get parts of code with duplicated functionality, very similar classes, some spaghetti, etc.
This is mostly due to the fact that a fix is needed fast and the "new guy" doesn't know the code base. So he happily codes away something which is already there.
But if you have automatic checks in place checking the style, unit test code coverage and similar you can avoid some of it.
A lot of the things that people have identified as indicating problems don't really have to do with the raw size of the codebase, but rather its comprehensibility. How does size relate to comprehensibility? If at all...
I've seen very short programs that are just a mess -- easier to throw away and redo from scratch. I've also seen very large programs whose structure is transparent enough that it is comprehensible even at progressively more detailed views of it. And everything in between...
I think looking at this question from the standpoint of an entire codebase is a good idea, but it probably pays to work up from the bottom and look first at the comprehensibility of individual classes, then multi-class components, then subsystems, and finally the entire system. I would expect the answers at each level of detail to build on each other.
For my money, the simplest benchmark is this: Can you explain the essence of what X does in one sentence? Where X is some granularity of component, and you can assume an understanding of the levels immediately above and below the component.
When you come to need a utility method or class, and have no idea whether someone else has already implemented it or have any idea where to look for one.
Related: when several slightly different implementations of the same functionality exist, because each author was unaware of other authors' work.

Legacy Code Nightmare [closed]

I've inherited a project where the class diagrams closely resemble a spider web on a plate of spaghetti. I've written about 300 unit tests in the past two months to give myself a safety net covering the main executable.
I have my library of agile development books within reach at any given moment:
Working Effectively with Legacy Code
Refactoring
Code Complete
Agile Principles Patterns and Practices in C#
etc.
The problem is everything I touch seems to break something else.
The UI classes have business logic and database code mixed in. There are mutual dependencies between a number of classes. There's a couple of god classes that break every time I change any of the other classes. There's also a mutant singleton/utility class with about half instance methods and half static methods (though ironically the static methods rely on the instance and the instance methods don't).
My predecessors even thought it would be clever to use all the datasets backwards. Every database update is sent directly to the db server as parameters in a stored procedure, then the datasets are manually refreshed so the UI will display the most recent changes.
I'm sometimes tempted to think they used some form of weak obfuscation for either job security or as a last farewell before handing the code over.
Are there any good resources for detangling this mess? The books I have are helpful but only seem to cover half the scenarios I'm running into.
It sounds like you're tackling it in the right way.
Test
Refactor
Test again
Unfortunately, this can be a slow and tedious process. There's really no substitute for digging in and understanding what the code is trying to accomplish.
One book that I can recommend (if you don't already have it filed under "etc.") is Refactoring to Patterns. It's geared towards people who are in your exact situation.
I'm working in a similar situation.
If it is not a small utility but a big enterprise project then it is:
a) too late to fix it
b) beyond the capabilities of a single person to attempt a)
c) only fixable by a complete rewrite of the stuff, which is out of the question
Refactoring can in many cases only be attempted in your private time, at your personal risk. If you don't get an explicit mandate to do it as part of your daily job, then you're likely not to get any credit for it. You may even be criticized for "pointlessly wasting time on something that has worked perfectly well for a long time already".
Just continue hacking it the way it has been hacked before, receive your paycheck and so on. When you get completely frustrated or the system reaches the point of being non-hackable any further, find another job.
EDIT: Whenever I attempt to address the question of the true architecture and doing things the right way, I usually get LOL'd at directly by the responsible managers, who say something like "I don't give a damn about good architecture" (attempted translation from German). I have personally brought one very bad component to the point of non-hackability, while of course having given warnings months in advance. They then had to cancel some features promised to customers because it was no longer doable. No one touches it anymore...
I've worked this job before. I spent just over two years on a legacy beast that is very similar. It took two of us over a year just to stabilize everything (it's still broke, but it's better).
First thing -- get exception logging into the app if it doesn't exist already. We used FogBugz, and it took us about a month to get reporting integrated into our app; it wasn't perfect right away, but it was reporting errors automatically. It's usually pretty safe to implement try-catch blocks in all your events, and that will cover most of your errors.
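One possible shape for that, as a hedged Java sketch (the logging call is a stand-in for whatever reporting integration you use, e.g. FogBugz): a process-wide handler for anything uncaught, plus the local try-catch wrapping of an event handler:

    class CrashReporting {
        // Process-wide net: anything nobody caught gets logged.
        static void install() {
            Thread.setDefaultUncaughtExceptionHandler(
                (thread, error) -> writeLog("uncaught on " + thread.getName(), error));
        }

        // The same idea applied locally: wrap an event handler so a failure
        // is reported instead of taking down the UI.
        static void onButtonClick(Runnable handler) {
            try {
                handler.run();
            } catch (RuntimeException e) {
                writeLog("button click failed", e);
            }
        }

        private static void writeLog(String message, Throwable t) {
            System.err.println(message);   // stand-in for the real reporter
            t.printStackTrace();
        }
    }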
From there fix the bugs that come in first. Then fight the small battles, especially those based on the bugs. If you fix a bug that unexpectedly affects something else, refactor that block so that it is decoupled from the rest of the code.
It will take some extreme measures to rewrite a big, critical-to-company-success application, no matter how bad it is. Even if you get permission to do so, you'll be spending too much time supporting the legacy application to make any progress on the rewrite anyway. If you do many small refactorings, eventually either the big ones won't be that big or you'll have really good foundation classes for your rewrite.
One thing to take away from this is that it is a great experience. It will be frustrating, but you will learn a lot.
I have (once) come across code that was so insanely tangled that I couldn't replace it with a functional duplicate in a reasonable amount of time. That was sort of a special case though, as it was a parser and I had no idea how many clients might be "using" some of the bugs it had. Rendering hundreds of "working" source files erroneous was not a good option.
Most of the time it is eminently doable, just daunting. Read through that refactoring book.
I generally start fixing bad code by moving things around a bit (without actually changing implementation code more than required) so that modules and classes are at least somewhat coherent.
When that is done, you can take your more coherent class and rewrite its guts to perform the exact same way, but this time with sensible code. This is the tricky part with management, as they generally don't like to hear that you are going to take weeks to code and debug something that will behave exactly the same (if all goes well).
During this process I guarantee you will discover tons of bugs, and outright design stupidities. It's OK to fix trivial bugs while recoding, but otherwise leave such things for later.
Once this is done with a couple of classes, you will start to see where things can be modularized better, designed better, etc. Plus it will be easier to make such changes without impacting unrelated things because the code is now more modular, and you probably know it thoroughly.
Mostly, that sounds pretty bad. But I don't understand this part:
My predecessors even thought it would be clever to use all the datasets backwards. Every database update is sent directly to the db server as parameters in a stored procedure, then the datasets are manually refreshed so the UI will display the most recent changes.
That sounds pretty close to a way I frequently write things. What's wrong with this? What's the correct way?
If your refactorings are breaking code, particularly code that seems to be unrelated, then you're trying to do too much at a time.
I recommend a first-pass refactoring where all you do is ExtractMethod: the goal is simply to name each step in the code, without any attempts at consolidation whatsoever.
After that, think about breaking dependencies, replacing singletons, consolidation.
If your refactorings are breaking things, then it means you don't have adequate unit test coverage - as the unit tests should have broken first. I recommend you get better unit test coverage second, after getting exception logging into place.
I then recommend you do small refactorings first - Extract Method to break large methods into understandable pieces; Introduce Variable to remove some duplication within a method; maybe Introduce Parameter if you find duplication between the variables used by your callers and the callee.
And run the unit test suite after each refactoring or set of refactorings. I'd say run them all until you gain confidence about which tests will need to be rerun every time.
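A quick, hypothetical Java sketch of the two smaller refactorings named above, Introduce Variable and Introduce Parameter:

    class SmallRefactorings {
        // Introduce Variable: the duplicated expression gets a name.
        double discountedBefore(double price) {
            return price - price * 0.1 + (price - price * 0.1) * 0.2;
        }
        double discountedAfter(double price) {
            double discounted = price - price * 0.1;   // named once, used twice
            return discounted + discounted * 0.2;
        }

        // Introduce Parameter: callers stop duplicating a value the callee needs.
        double taxBefore(double amount) { return amount * 0.21; } // rate baked in everywhere
        double taxAfter(double amount, double rate) { return amount * rate; }
    }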
No book will be able to cover all possible scenarios. It also depends on what you'll be expected to do with the project and whether there is any kind of external specification.
If you'll only have to do occasional small changes, just do those and don't bother starting to refactor.
If there is a specification (or you can get someone to write it), consider a complete rewrite if it can be justified by the foreseeable amount of changes to the project
If "the implementation is the specification" and there are a lot of changes planned, then you're pretty much hosed. Write LOTS of unit tests and start refactoring in small steps.
Actually, unit tests are going to be invaluable no matter what you do (if you can write them to an interface that's not going to change much with refactorings or a rewrite, that is).
See blog post Anatomy of an Anti-Corruption Layer, Part 1 and Anatomy of an Anti-Corruption Layer, Part 2.
It cites Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software:
Access the crap behind a facade
You could extract and then refactor some part of it, to break the dependencies and isolate layers into different modules, libraries, assemblies, directories. Then you re-inject the cleaned parts into the application with a strangler application strategy. Lather, rinse, repeat.
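A minimal Java sketch of that facade idea (all names invented): new code talks only to the narrow facade, and each facade method gets re-pointed at rewritten code as the legacy piece behind it is strangled out:

    class LegacyOrderSystem {
        // The tangle you don't want new code to touch directly.
        String doEverything(String command) { return "legacy:" + command; }
    }

    class OrderFacade {
        private final LegacyOrderSystem legacy = new LegacyOrderSystem();

        // New code calls this narrow, clean method; the facade translates
        // into whatever the legacy tangle expects.
        String placeOrder(String item) {
            return legacy.doEverything("PLACE " + item);
        }
        // As pieces are rewritten, each facade method is re-pointed at the
        // new implementation without callers noticing - the strangler step.
    }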
Good luck, that is the tough part of being a developer.
I think your approach is good, but you need to focus on delivering business value (number of unit tests is not a measure of business value, but it may give you an indication if you are on or off track). It's important to have identified the behaviors that need to be changed, prioritize, and focus on the top ones.
The other piece of advice is to remain humble. Realize that if you had written something so large under real deadlines and someone else saw your code, they would probably have problems understanding it as well. There is a skill in writing clean code, and there is a more important skill in dealing with other people's code.
The last piece of advice is to try to leverage the rest of your team. Past members may have knowledge about the system you can learn from. Also, they may be able to help test behaviors. I know the ideal is to have automated tests, but if someone can help by verifying things for you manually, consider getting their help.
I particularly like the diagram in Code Complete, in which you start with just legacy code, a rectangle of fuzzy grey texture. Then when you replace some of it, you have fuzzy grey at the bottom, solid white at the top, and a jagged line representing the interface between the two.
That is, everything is either 'nasty old stuff' or 'nice new stuff'. One side of the line or the other.
The line is jagged, because you're migrating different parts of the system at different rates.
As you work, the jagged line gradually descends, until you have more white than grey, and eventually just grey.
Of course, that doesn't make the specifics any easier for you. But it does give you a model you can use to monitor your progress. At any one time you should have a clear understanding of where the line is: which bits are new, which are old, and how the two sides communicate.
You might find the following post useful:
http://refactoringin.net/?p=36
As it is said in the post, don't dismiss a complete rewrite that easily. Also, if at all possible, try to replace whole layers or tiers with a third-party solution (for example, an ORM for persistence) or with new code. But most important of all, try to understand the logic (problem domain) behind the code.

Resources