Related
I need some advice on how to work with legacy code.
A while ago, I was given the task to add a few reports to a reporting app. written in Struts 1, back in 2005. No big deal, but the code is quite messy. No usage of Action forms, and basically the code is one huge action, and a lot of if-else statements inside. Also, no one here has functional knowledge on this. We just happened to have it in our contract.
I'm quite unhappy about this, and not sure how to proceed. This application is invisible: Few people (but all very important) use it, so they don't care whether my eyes bleed while reading the code, standards, etc.
However, I feel that a technical debt is to be paid. How should I proceed on this? Continue down the if-else road, or try to do this requirement the right way, ignoring the rest of the project? Starting a huge refactor, risking my deadline?
Legacy code is a big issue, and I'm sure people will not agree!
I would say that starting a big re-factor could be a mistake.
A big re-factor means doing a lot of work to make it function exactly the way that it does now. If you choose to take this on on your own, there won't be a lot of visibility of what you are doing. If it works, no one will know the hours of work you put it. If it does NOT work, and you end up with tidy code, but add some bugs (and who has ever written code without adding some bugs) then you will get 'why did this change' type questions.
I have currently nearly completed a project working on a 10 year old code base. We have done quite a few bits of re-factoring along the way. But for each re-factor we have made we can justify 'this specific change will make the actual task we are doing now easier'. Rather than 'this is now cleaner for future work'. We have found that as we worked on the code, fixing the issues that we actually come up against one at a time, we have cleaned up a lot of it, without breaking it (much).
And I would say before you can re-factor much, you will need automated tests, so you can be fairly happy that you have put it back together right!
Most re-factoring is done to 'make maintenance and future development easier'. Your project sounds like there is not a lot of future development coming. That limits the advantage a re-factor will give the company.
Rule #1: If it ain't broke, don't fix it.
Rule #2: When in doubt, reread rule #1.
Unfortunately, legacy code can very rarely be described as "it ain't broke." Therefor we must tweak the existing code to correct a newly found bug, tweak the existing code to modify behavior that was previously acceptable, or tweak the existing code to add new functionality.
My experience has taught me that any refactoring must be done in 'infinitesimally' small increments. If you must break rule #2, I suggest that you start your search with the inner-most nested loop or IF structure and expand outward until you find a clean, logical separation point and create a new function/method/subroutine that contains only the guts of that loop or structure. This won't make anything more efficient but it should give you a clearer view of the underlying logic and structure. Once you have several new, smaller functions/methods/subroutines you can refactor and consolidate those into something more manageable.
Rule #3: Ignore my previous paragraph and reread the first two rules.
I agree with other comments. If you don't have to, then don't do it. It usually cost far more then it's worth if the code base is more or less dead any way.
On the other hand, if you feel that you cannot get your head around the code then a refactor is probably unavoidable. If this is the case then, since it's a web application, can you create a solid suite of functional tests using selenium? If so this is the fastest and most rewarding test approach for such code and will catch most bugs for the buck.
Second, start with the extract method refactoring to create compose methods of the big difficult methods. Every time you think to your self "This should have a comment to explain what it does" you should extract it to a method with a name that replaces the comment.
Once this has been done, if you still can't add the functionality required, you can go for more advanced refactorings, and perhaps even adding some unit tests. But I usually find that I can add what is required/fix the bug in legacy code by just creating self documenting code.
In a few words: before make any modifications to legacy code its good idea to start from automated unit tests.
This will give developer understanding about key things: dependencies this piece of code has, input data, output results, boundary conditions so on.
When it’s done most likely you will better understand what this code does and how it works.
After that its make sense (but not must) clean code a bit giving more accurate names to local variables, moving some functionality (repetitive code, if any) in to functions with clear human friendly names.
A simple clean up could make code more readable and at the same time save developer from regression issues with unit tests help.
Refactoring – make small changes, step by step, when you have a time and understanding of the requirements and functionality, regularly unit testing the code.
But do not start from refactoring
I've been learning what TDD is, and one question that comes to mind is what exactly is the "test". For example, do you call the webservice and then build the code to make it work? or is it more unit testing oriented?
In general the test may be...
unit test which tests an individual subcomponent of your software without any external dependencies to other classes
integration test which are tests that test the connection between two separate systems, ie. their integration capability
acceptance test for validating the functionality of the system
...and some others I've most likely temporarily forgotten for now.
In TDD, however, you're mostly focusing on the unit tests when creating your software.
It's entirely Unit Test driven.
The basic idea is to write the unit tests first, and then do the absolute minimum amount of work necessary to pass the tests.
Then write more tests to cover more of the requirements, and implement a bit more to make it pass.
It's an iterative process, with cycles of test writing, and then code writing.
Here are a couple of good articles by Unclebob
Three rules of TDD
TDD with Acceptance and Unit tests
I suggest you not to emphasize on Test because TDD is actually is a software development methodology, not a testing methodology.
I would say it is about unit testing and code coverage. It is about shipping bugless code and being able to make changes easily in the future.
See Uncle Bob's words of wisdom.
How I use it, it's unit testing oriented. Suppose I want a method that square ints I write this method :
int square(int x) { return null; }
and then write some tests like :
[Test]
TestSquare()
{
Assert.AreEqual(square(0),0);
Assert.AreEqual(square(1),1);
Assert.AreEqual(square(10),100);
Assert.AreEqual(square(-1),1);
Assert.AreEqual(square(-10),100);
....
}
Ok, maybe square is a bad example :-)
In each case I test expected behaviour and all borderline vals like maxint and zero and null-value (remember you can test on errors too) and see to it the test fails (which isn't hard :-)) then I keep working on the function until it works.
So : first a unit test that fails an covers what you want it to cover, then the method.
Generally, unit tests in "TDD" shouldn't involve any IO at all.
In fact, you'll have a ton more effectiveness if you write objects that do not create side effects (I/O is almost always, if not always, a side effect!), and define your the behavior of your class either in terms of return values of methods, or calls made to interfaces that have been passed into the object.
I just want to give my view on the topic which may help to understand TDD a bit more in a different way.
TDD is a design method that relies in testing first. because you asked about how the test is, ill go like this:
If you want to build an application, you know the purpose of what you want to build and you know generally that when you are done, or along the way you need to test it e.g check the values of variables you create by code inspection, of quickly drop a button that you can click on and will execute a part of code and pop up a dialog to show the result of the operation etc.
on the other hand TDD changes your mindset and i'll point out one of such ways. commonly , you just rely on the development environment like visual studio to detect errors as you code and compile and somewhere in your head, you know the requirement and just coding and testing via button and pop ups or code inspection. I call this style SDDD (Syntax debugging driven development ).
but when you are doing TDD, is a "semantic debugging driven development " because you write down your thoughts/ goals of your application first by using tests (which and a more dynamic and repeatable version of a white board) which tests the logic (or "semantic") of your application and fails whenever you have a semantic error even if you application passes syntax error (upon compilation).
by the way even though i said "you know the purpose of what you want to build ..", in practice you may not know or have all the information required to build the application , since TDD kind of forces you to write tests first, you are compelled to ask more questions about the functioning of the application at a very early stage of development rather than building a lot only to find out that a lot of what you have written is not required (or at lets not at the moment). you can really avoid wasting your precious time with TDD (even though it may not feel like that initially)
In doing test driven development I have been in the habit of writing the first unit test for a new piece of functionality first, then writing the code for that functionality. If I have additional tests to write to cover all scenarios, I usually write them after the code is written. Is this considered bad form? Should I try and write every conceivable test for a piece of functionality first, before ever writing that code?
In order to do TDD properly, you always write the test first, and then the functionality second.
To add to that, I would take one scenario at a time, don't write 20 tests and then write the code for those 20 tests. Write one test, red/green flag it, then move on to your next test. This makes sure you're also doing one of the core tenets of TDD, which is to do the simplest implementation possible that meets all of your requirements/scenarios.
actually no, I often discover functionality "on-the-go". Let me explain the "no" a bit further:
I usually start out writing a test case for a high level feature, defining its Interface. After that, I usually set this test to ignore and continue writing tests for each of the Interfaces functionality. My cycle goes like:
Integration Test for Story A (high level API)
Write Unit Test for method xyz called in Integration Test
Implement method (red/green/refactor)
Repeat 2+3 till Integration Test passes.
While doing so, I often realize I have forgotten some small functionality in my main test. I then usually take time to look back at my customers requirements. If its a fit, I go back and add a test for it, set to ignored as I first want to finish what I started.
Sometimes I see the chance to do a refactoring. I usually finish an implementation till I reach a commit point and do refactoring then, however sometimes I stash my changes, go back and do the refactoring (including new tests if nescessary) first. This workflow is powererd by Mercurial MQ.
For most people, TDD and incremental/agile development go together. This looks something like:
Write a test for some feature
Write just enough code to make the test pass, refactoring as necessary
Repeat.
If you happen to have a detailed specification ahead of time, you could write all of the tests first, but you'd have to live with having sone tests not passing for a while.
The sooner you write the tests, the better. I usually find writing tests being harder tasks than actually implementing the functionality because you have to be aware of all the possible outcomes. So I tend to write more tests when I'm "in the zone". And when during coding I realize I might have missed a test case I just note that down on the to-do lists.
So in my opinion it's up to your leisure but I would implement tests in multiple batches.
The way I see it, test driven development isn't necessarily tests first development. Your tests drive your development and you are really writing your tests as you develop your application. You start by writing a simple test that fails because you haven't written the functionality yet. Then you write your code to implement that so that the tests pass.
Then you go back to your test, make modifications that will force you to add more functionality or refactor your code to follow better practices or reduce duplicate code, go fix your code to make the test pass...repeat, repeat, repeat.
A couple of videos that demonstrates this is below, although you can probably find a lot more by googling "TDD Video"
http://agilesoftwaredevelopment.com/videos/test-driven-development-basic-tutorial
(oops, only one video, new users can't insert more than one link)
I try to write a test at some level before each bit of functionality. Sometimes, I have to write a little more code to get through the compiler, but I try to minimise that. Writing the test first means that I've thought about what the code is supposed to achieve before writing it.
One technique I find useful is to keep an index card or notepad handy, and make a note of all the cases that I think of along the way. That allows me to focus on the current task without losing track of all the other things I'm supposed to think about. Afterwards, I can work through the list and either complete the extra cases or drop them as not necessary.
You could do that, but you wouldn't be doing TDD. The problem (well, one of them, anyway) with writing all of your tests up front is that in any case where the requirements are non-trivial, your tests will be building in a lot of assumptions about the structure of the code you're test-driving. Big steps lead to missteps.
One of the keys of successful TDD involves taking small steps. Small steps mean fewer changes to back out when something goes wrong. Small steps mean you can more often get your head around the effects of the changes you're making. And because small steps are easier to take with confidence, they have the paradoxical effect of increasing your velocity.
The TDD cycle starts with requirements. Start by choosing a requirement you know how to define through tests immediately, in a few short steps. If you look at a requirement and you're not sure how to test it, or you think, "Yeah, but to do that, I'd need to [insert ill-defined steps] first", then you should either skip to another requirement that you know how to do, or you should break this requirement into smaller requirements that you know how to do.
Once you have that, you work in a short red-green-refactor cycle: Write a test that quantifies some part of the requirement ("red", because it fails, because it has no implementation to test yet), write any code that will pass the test ("green"), then rework the code to remove duplication, magic numbers, and other code smells ("refactor"). During the refactoring phase, you should continue working in small steps, frequently re-running the test to make sure you haven't broken anything. Continue this cycle until you can look your boss/client in the eye and call the requirement met.
Now that you have one simple piece of your system defined, you've opened up the list of requirements available to implement - requirements that are adjacent to or dependent on the one you just implemented can now be tested and implemented in smaller steps building on what you've already done.
So the upshot of all that is: Don't try to do all your tests at once. One (small) thing at a time.
The point of TDD is that you have to observe that test fails when feature is not yet implemented. So you have to write test before code.
When you get into the TDD rhythm you write one test at a time and make it work. Very short red-green-refactor cycles really feel the rhythm. That being said, there is nothing wrong with other approaches (and they may even make more sense for some types of problems) but typically the only thing you need to do about other tests you are thinking of is write them down (or have your pair if you are pair programming write them down) so you don't forget them. You have to do that anyway because you could forget about a test in the middle of writing a different test.
Do just enough tests to test 1 unit of code at a time.. then write the actual code until it passes the test.. rinse, wash, repeat until done.
If you find yourself needing to write many tests for one unit of code ( a method, a function etc) it might be a sign that you are trying to do too much in that unit... which in turn makes the unit dificult to test & to refactor at a later time.
As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?
Remove unused code
Remove duplicated code
Add unit tests to improve test coverage where coverage is low
Create consistent formatting across files
Update 3rd party software
Reduce warnings generated by static analysis tools (i.e.Findbugs)
The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.
Read Michael Feather's book "Working effectively with Legacy Code"
This is a GREAT book.
If you don't like that answer, then the best advice I can give would be:
First, stop making new legacy code[1]
[1]: Legacy code = code without unit tests and therefore an unknown
Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what affect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.
NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.
Jimmy Bogard and Ray Houston did an interesting screen cast on a subject very similar to this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx
I work with a legacy 1M LOC application written and modified by about 50 programmers.
* Remove unused code
Almost useless... just ignore it. You wont get a big Return On Investment (ROI) from that one.
* Remove duplicated code
Actually, when I fix something I always search for duplicate. If I found some I put a generic function or comment all code occurrence for duplication (sometime, the effort for putting a generic function doesn't worth it). The main idea, is that I hate doing the same action more than once. Another reason is because there's always someone (could be me) that forget to check for other occurrence...
* Add unit tests to improve test coverage where coverage is low
Automated unit tests is wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issue. Go with the part you are working on and hope that in a few year you have decent coverage.
* Create consistent formatting across files
IMO the difference in formatting is part of the legacy. It give you an hint about who or when the code was written. This can gave you some clue about how to behave in that part of the code. Doing the job of reformatting, isn't fun and it doesn't give any value for your customer.
* Update 3rd party software
Do it only if there's new really nice feature's or the version you have is not supported by the new operating system.
* Reduce warnings generated by static analysis tools
It can worth it. Sometime warning can hide a potential bug.
I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places - this, in theory, makes bugs easier to fix because you only have to fix one piece of code, as opposed to many pieces of code, to fix a bug in it.
Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.
There is a good book on this written by the author of CPPUnit, Working Effectively with Legacy Code.
Adding tests to legacy code is certianly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as
"a place where you can alter behavior in your program without editing in that place."
Sometimes its worth refactoring to create seams that will make future testing easier (or possible in the first place.) The google testing blog has several interesting posts on the subject, mostly revolving around the process of Dependency Injection.
I can relate to this question as I currently have in my lap one of 'those' old school codebase. Its not really legacy but its certainly not followed the trend of the years.
I'll tell you the things I would love to fix in it as they bug me every day:
Document the input and output variables
Refactor the variable names so they actually mean something other and some hungarian notation prefix followed by an acronym of three letters with some obscure meaning. CammelCase is the way to go.
I'm scared to death of changing any code as it will affect hundreds of clients that use the software and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing since there are zero now.
The rest is really peanuts. These are the main problems with a legacy codebase, they really eat up tons of time.
I'd say it largely depends on what you want to do with the legacy code...
If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."
If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.
If you plan on version 2.0, add unit tests and clean up the code you will bring forward
Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?
Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.
The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.
This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I already learned about a particular section of code. Once I started placing notes as I go, including "To Do" notes, I stopped re-figuring out what I already figured out. Then I could focus on how those code segments flow and interact.
I would say just leave it alone for the most part. If it's not broken then don't fix it. If it is broken then go ahead and fix and improve the portion of the code that is broken and its immediately surrounding code. You can use the pain of the bug or sorely missing feature to justify the effort and expense of improving that part.
I would not recommend any wholesale kind of rewrite, refactor, reformat, or putting in of unit tests that is not guided by actual business or end-user need.
If you do get the opportunity to fix something, then do it right (the chance of doing it right the first time might have already passed, but since you are touching that part again might as well do it right time around) and this includes all the items you mentioned.
So in summary, there's no single or just a few things that you should do. You should do it all but in small portions and in an opportunistic manner.
Late to the party, but the following may be worth doing where a function/method is used or referenced often:
Local variables often tend to be poorly named in legacy code (often owing to their scope expanding when a method is modified, and not being updated to reflect this). Renaming these in line with their actual purpose can help clarify legacy code.
Even just laying out the method slightly differently can work wonders - for instance, putting all the clauses of an if on one line.
There might be stale/confusing code comments there already. Remove them if they're not needed, or amend them if you absolutely have to. (Of course, I'm not advocating removal of useful comments, just those that are a hindrance.)
These might not have the massive headline impact you're looking for, but they are low risk, particularly if the code can't be unit tested.
When doing TDD, how to tell "that's enough tests for this class / feature"?
I.e. when could you tell that you completed testing all edge cases?
With Test Driven Development, you’ll write a test before you write the code it tests. Once you’re written the code and the test passes, then it’s time to write another test. If you follow TDD correctly, you’ve written enough tests once you’re code does all that is required.
As for edge cases, let's take an example such as validating a parameter in a method. Before you add the parameter to you code, you create tests which verify the code will handle each case correctly. Then you can add the parameter and associated logic, and ensure the tests pass. If you think up more edge cases, then more tests can be added.
By taking it one step at a time, you won't have to worry about edge cases when you've finished writing your code, because you'll have already written the tests for them all. Of course, there's always human error, and you may miss something... When that situation occurs, it's time to add another test and then fix the code.
Kent Beck's advice is to write tests until fear turns into boredom. That is, until you're no longer afraid that anything will break, assuming you start with an appropriate level of fear.
On some level, it's a gut feeling of
"Am I confident that the tests will catch all the problems I can think of
now?"
On another level, you've already got a set of user or system requirements that must be met, so you could stop there.
While I do use code coverage to tell me if I didn't follow my TDD process and to find code that can be removed, I would not count code coverage as a useful way to know when to stop. Your code coverage could be 100%, but if you forgot to include a requirement, well, then you're not really done, are you.
Perhaps a misconception about TDD is that you have to know everything up front to test. This is misguided because the tests that result from the TDD process are like a breadcrumb trail. You know what has been tested in the past, and can guide you to an extent, but it won't tell you what to do next.
I think TDD could be thought of as an evolutionary process. That is, you start with your initial design and it's set of tests. As your code gets battered in production, you add more tests, and code that makes those tests pass. Each time you add a test here, and a test there, you're also doing TDD, and it doesn't cost all that much. You didn't know those cases existed when you wrote your first set of tests, but you gained the knowledge now, and can check for those problems at the touch of a button. This is the great power of TDD, and one reason why I advocate for it so much.
Well, when you can't think of any more failure cases that doesn't work as intended.
Part of TDD is to keep a list of things you want to implement, and problems with your current implementation... so when that list runs out, you are essentially done....
And remember, you can always go back and add tests when you discover bugs or new issues with the implementation.
that common sense, there no perfect answer. TDD goal is to remove fear, if you feel confident you tested it well enough go on...
Just don't forget that if you find a bug later on, write a test first to reproduce the bug, then correct it, so you will prevent future change to break it again!
Some people complain when they don't have X percent of coverage.... some test are useless, and 100% coverage does not mean you test everything that can make your code break, only the fact it wont break for the way you used it!
A test is a way of precisely describing something you want. Adding a test expands the scope of what you want, or adds details of what you want.
If you can't think of anything more that you want, or any refinements to what you want, then move on to something else. You can always come back later.
Tests in TDD are about covering the specification, in fact they can be a substitute for a specification. In TDD, tests are not about covering the code. They ensure the code covers the specification, because the code will fail a test if it doesn't cover the specification. Any extra code you have doesn't matter.
So you have enough tests when the tests look like they describe all the expectations that you or the stakeholders have.
maybe i missed something somewhere in the Agile/XP world, but my understanding of the process was that the developer and the customer specify the tests as part of the Feature. This allows the test cases to substitute for more formal requirements documentation, helps identify the use-cases for the feature, etc. So you're done testing and coding when all of these tests pass...plus any more edge cases that you think of along the way
Alberto Savoia says that "if all your tests pass, chances are that your test are not good enough". I think that it is a good way to think about tests: ask if you are doing edge cases, pass some unexpected parameter and so on. A good way to improve the quality of your tests is work with a pair - specially a tester - and get help about more test cases. Pair with testers is good because they have a different point of view.
Of course, you could use some tool to do mutation tests and get more confidence from your tests. I have used Jester and it improve both my tests and the way that I wrote them. Consider to use something like it.
Kind Regards
Theoretically you should cover all possible input combinations and test that the output is correct but sometimes it's just not worth it.
Many of the other comments have hit the nail on the head. Do you feel confident about the code you have written given your test coverage? As your code evolves do your tests still adequately cover it? Do your tests capture the intended behaviour and functionality for the component under test?
There must be a happy medium. As you add more and more test cases your tests may become brittle as what is considered an edge case continuously changes. Following many of the earlier suggestions it can be very helpful to get everything you can think of up front and then adding new tests as the software grows. This kind of organic grow can help your tests grow without all the effort up front.
I am not going to lie but I often get lazy when going back to write additional tests. I might miss that property that contains 0 code or the default constructor that I do not care about. Sometimes not being completely anal about the process can save you time n areas that are less then critical (the 100% code coverage myth).
You have to remember that the end goal is to get a top notch product out the door and not kill yourself testing. If you have that gut feeling like you are missing something then chances are you are have and that you need to add more tests.
Good luck and happy coding.
You could always use a test coverage tool like EMMA (http://emma.sourceforge.net/) or its Eclipse plugin EclEmma (http://www.eclemma.org/) or the like. Some developers believe that 100% test coverage is a worthy goal; others disagree.
Just try to come up with every way within reason that you could cause something to fail. Null values, values out of range, etc. Once you can't easily come up with anything, just continue on to something else.
If down the road you ever find a new bug or come up with a way, add the test.
It is not about code coverage. That is a dangerous metric, because code is "covered" long before it is "tested well".