Should you code to protect your application from bad coders? [closed] - validation

It's no question that we should code our applications to protect themselves against malicious, curious, and/or careless users, but what about from current and/or future colleagues?
For example, I'm writing a web-based API that accepts parameters from the user. Some of these parameters may map to values in a configuration file. If the user messes with the URL and provides an invalid value for a parameter, my application would error out when trying to read from a section of the configuration file that doesn't exist. So of course, I scrub params before trying to read from the config file.
Now, what if, down the road, another developer works on this application and adds another valid value for this parameter, one that would pass the scrubbing process, but doesn't add the corresponding section to the configuration file? Remember, I'm only protecting the app from bad users, not bad coders. My application would fail.
On one hand, I know that all changes should be tested before moving to production and something like this would undoubtedly come up in a decent testing session, but on the other hand, I try to build my applications to resist failure as best as possible. I just don't know if it's "right" to include modification of my code by colleagues in the list of potential points of failure.
For this project, I opted not to check if the relevant section of the config file existed. As the current developer, I wouldn't allow the user to specify a parameter value that would cause failure, so I would expect a future developer to not introduce behavior into a production environment that could cause failure... or at least eliminate such a case during testing.
What do you think?
Lazy... or philosophically sound?
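For concreteness, here is a minimal Python sketch of the kind of setup being described (the parameter, whitelist, and file names are all made up):
import configparser

VALID_REPORT_TYPES = {"daily", "weekly"}   # a future dev must remember to update this

def load_section(report_type: str) -> dict:
    if report_type not in VALID_REPORT_TYPES:   # protects against bad users
        raise ValueError(f"unknown report type: {report_type}")
    config = configparser.ConfigParser()
    config.read("reports.ini")
    # nothing here protects against a colleague adding "monthly" to the
    # whitelist without adding a [monthly] section to reports.ini
    return dict(config[report_type])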

"I just don't know if it's "right" to include modification of my code by colleagues in the list of potential points of failure."
It isn't possible to prevent your colleagues from breaking things.
You don't know what new purposes your software will be put to. You don't know how it will be modified in the future.
Instead, do this.
Write simple correct software, put it into production, and stop worrying about somebody "breaking" something.
If your software is actually simple, other people can maintain it without breaking it.
If you make it too complex, they will (a) break it anyway, in spite of everything you do and (b) hate you for making it complex.
So, make it as simple as possible to do the job.

It sounds like you're taking reasonable steps to protect against incompetence by doing some scrubbing of the input. I don't believe that you're responsible for protecting against every possible misuse of your code or every bad input. I'd go further than that and say that as long as your code explicitly documents what is and isn't an acceptable input, then you've done enough, particularly if the added "idiot error checking" code would bloat or (especially) slow things down.
A procedure that documents exactly what inputs are acceptable is reasonable for an inner api. That being said, I often code (over) defensively, but that's mostly due to the environment I'm in and my level of trust in the rest of the code.

Lazy... or philosophically sound?
Lazy... and arrogant. By coding in a way that makes mistakes show up quickly, you protect the app against your own mistakes just as much as against the mistakes of others. Everyone makes more mistakes than they think.
Of course, rather than adding more code to detect whether the config file and the parameter checking match, it would be much better if the parameter checking were based on the config file so that there's only one place where new values are added and an inconsistency is not possible.
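A rough Python sketch of that idea (file and parameter names hypothetical): the config file itself is the single source of truth, so a parameter value is valid exactly when a matching section exists and an inconsistency cannot arise.
import configparser

def load_section(report_type: str) -> dict:
    config = configparser.ConfigParser()
    config.read("reports.ini")
    # valid values are whatever sections the config defines; there is no
    # separate whitelist for a future developer to forget to update
    if not config.has_section(report_type):
        raise ValueError(f"unsupported report type: {report_type}")
    return dict(config[report_type])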

I think it is the future developer's responsibility to ensure that she does not introduce bugs or failure points into your code. When you sign off from the project (if ever?!), part of the sign-off process should at least be that the code has been presented as bug-free as possible; this would then limit the liability you hold for future problems.
If your code is kept in a version control system, it would be trivial to create a tag marking the point at which you handed over your code. Should a bug arise that someone tries to blame on you (if this is the angle you're coming from!), you could then compare the current codebase to your original and prove that it is the changes made to your original implementation that caused the bug. (Assuming, of course, that those changes do cause unexpected behaviour and don't fix your "bug-free" code, grin.)
One method I have used in the past to ensure data integrity (and it isn't foolproof) is to check an input at specific offsets for specific values, ensuring that the input wasn't tainted.
Hope that helps.

Both ways are fine, philosophically speaking. I would make the judgment based on how likely you think it is for this to happen. If you can be almost positive that somebody will break your code in that way, it might be polite for you to provide a check that will catch that when it happens.
If it doesn't seem particularly likely, then it's just part of their job to make sure they don't break the code.
In either event though, your technical notes (or other appropriate documentation) should clearly indicate that when the one change is made, the other change is also required.

I would use an inline comment for future developers, or for a developer like me who tends to forget what was going on in every part of the application after not having worked on it for months.
Don't worry about actually coding to foil future coders; that's impossible. Just add all the information someone needs to support or extend what you're doing within the source code context.
This is documentation, as Steve B. mentioned. I'd just make sure it's not external, as that has a tendency to get lost.

Related

How to document undefined behaviour in the Scrum/agile/TDD process [closed]

We're using a semi-agile process at the moment where we still write a design/specification document and update it during the development process.
When we're refining our requirements we often hit edge cases where we decide it's not important to handle it, so we don't write any code for that use case and we don't test it. In the design spec we explicitly state that this scenario is out of scope because the system isn't designed to be used in that way.
In a more fully-fledged agile process, the tests are supposed to act as a specification for the expected behaviour of the system, but how would you record the fact that a certain scenario is explicitly out-of-scope rather than just getting accidentally missed out?
As a bit of clarification, here's the situation I'm trying to avoid: We have discussed a scenario and decided we won't handle it because it doesn't make sense. Then later on, when someone is trying to write the user guide, or give a training session, or a customer calls the help desk, exactly the same scenario comes up, so they ask me how the system handles it, and I think "I remember talking about this a year ago, but there are no tests for it. Maybe it got missed off the plan, or maybe we decided it wasn't a sensible use-case, or maybe there's a subtle reason why you can't actually ever get into that situation", so I have to try and search old Skype chats or emails to find out the answer. What I want to achieve is to make sure we have a record of why we decided not to support that scenario so that I can refer back to it in the future. At the moment I put this in the spec where everyone can see it.
I would document deliberately unsupported use cases/stories/requirements/features in your test files, which are much more likely to be regularly consulted, updated, etc. than specifications would be. I'd document each unsupported feature in the highest-level test file in which it was appropriate to discuss that feature. If it was an entire use case, I'd document it in an acceptance test (e.g. a Cucumber feature file or RSpec feature spec); if it was a detail I might document it in a unit test.
By "document" I mean that I'd write a test if I could. If not, I'd just comment. Which one would depend on the feature:
For features that a user might expect to be there, but for which there is no way for the user to access (e.g. a link or menu item that simply isn't present), I'd write a comment in the appropriate acceptance test file, next to the tests of the related features that do exist.
Side note: Some testing tools (e.g. Cucumber and RSpec) also allow you to have scenarios or examples in feature or spec files which aren't actually run, so you can use them like comments. I'd only do that if those disabled scenarios/examples didn't result in messages when you ran the tests that might make someone think that something was broken or unfinished. For example, RSpec's pending/skip loudly announces that there is work left to be done, so it would probably be annoying to use it for cases that were never meant to be implemented.
For situations that you decided not to handle, but which an inquisitive user might get themselves into anyway (e.g. entering an invalid value into a field or editing a URL to access a page for which they don't have permission), don't just ignore them, handle them in a clean if minimal way: quietly clear the invalid value, redirect the user to the home page, etc. Document this behavior in tests, perhaps with a comment explaining why you aren't doing anything even more helpful. It's not a lot of extra work, and it's a lot better than showing the user an error page or other alarming behavior.
For situations like the previous, but that you for some reason decided not to or couldn't find a way to handle at all, you can still write a test that documents the situation, for example that entering some invalid value into a form results in an HTTP 500.
If you would like to write a test, but for some reason you just can't, there are always comments -- again, in the appropriate test file near tests of related things that are implemented.
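Here is roughly what that looks like in a Python/pytest flavour rather than RSpec (all names and the scenario are hypothetical): the test pins down the minimal agreed behaviour, and its docstring records that the scenario is deliberately out of scope.
import pytest

def lookup_report(report_id: int, current_user: str) -> str:
    # Hypothetical stand-in for the real lookup, shown only so the test runs.
    reports = {1: ("alice", "Q1 numbers")}
    owner, body = reports.get(report_id, (None, None))
    if owner != current_user:
        # Out of scope by decision: no sharing/permissions workflow, just a rejection.
        raise PermissionError("not your report")
    return body

def test_viewing_another_users_report_is_rejected():
    """Deliberately out of scope: we decided not to support viewing other
    users' reports, so the minimal agreed behaviour is a plain rejection."""
    with pytest.raises(PermissionError):
        lookup_report(1, current_user="bob")

# Scenarios with no reachable behaviour at all can be recorded as comments
# next to the related tests, e.g.:
# NOTE: bulk export of closed reports is intentionally unsupported; there is
# no UI path to it.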
You should never test undefined behavior, by ...definition. The moment you test a behavior, you are defining it.
In practice, either it's valuable to handle an edge case or it isn't. If it is, then there should be a user story for it, which acts as documentation for that edge case. What you don't want to have is an old user story documenting a future behavior, so it's probably not advisable to document undefined behavior in stories that don't handle it.
More in general, agile development always works iteratively. Edge case discovery is part of iterative work: with work comes increased knowledge, with increased knowledge comes more work. It is important to capture these discoveries in new stories, instead of trying to handle everything in one go.
For example, suppose we're developing Stack Overflow and we're doing this story:
As a user I want to search questions so that I can find them
The team develops a simple question search and discovers that we need to handle closed questions... we hadn't thought of that! So we simply don't handle them (whatever the simplest to implement behavior is). Notice that the story doesn't document anything about closed questions in the results. We then add a new story
As a user I want to specifically search closed questions so that I can find more results
We develop this story, and find more edge cases, which are then more stories, etc.
In the design spec we explicitly state that this scenario is out of scope because the system isn't designed to be used in that way
Having undocumented functionality in your product really is a bad practice.
If your development team followed BDD/TDD techniques they should (note emphasis) reduce the likelihood of this happening. If you found this edge-case then what makes you think your customer won't? Having an untested and unexpected feature in your product could compromise the stability of your product.
I'd suggest that if an undocumented feature is found:
Find out how it was introduced (common reason: a developer thought it might be a good feature to have as it might be useful in the future and they didn't want to throw away work they produced!)
Discuss the feature with your Business Analysts and Product Owner. Find out if they want such a feature in your product. If they do, great, document and test it. If they don't, remove it as it could be a liability.
You also had a question regarding the tracking of the outcome of these edge-case scenarios:
What I want to achieve is to make sure we have a record of why we decided not to support that scenario so that I can refer back to it in the future.
As you are writing a design/specification document, one approach you could take is to version that document. Then, when a feature/scenario is taken out you can note within a version change section in your document why the change was made. You can then refer to this change history at a later date.
However I'd recommend using a planning board to keep track of your user stories. Using such a board you could write a note on the card (virtual/physical) explaining why the feature was dropped which also could be referred to at a later date.

When should I add comments to my code? [closed]

When I'm writing it?
After I got a part done (Single class/function/if-elses)?
After I got the whole thing working?
The short answer
The short answer is: anytime something is non-obvious relative to who's going to be reading it. If it's code that is still in flux and you are the only consumer, just comments for you (hours and days). Ready to check in for others to try out: comments for you and your team (days and weeks, possibly months). Ready for wide release: comments for the immediate and future public (months and years). You have to think of comments as tools, not documentation.
The long answer:
When I'm writing it? - Yes
After I got a part done (Single class/function/if-elses)? - Yes
After I got the whole thing working? - Yes
When I'm writing it? - Yes
Drop comments anytime you hit a place where the code isn't immediately clear. For example, describe the class when the class name isn't clear or could be interpreted too widely. Another example: if I'm about to write a non-obvious code block, I'll first add a comment reminding me of what I want/need. Or if I just added some code and immediately realized there was a gotcha in there, I drop a comment to remind myself. These comments are implementor comments, not so much to help future maintainers as to help yourself in the coding process.
Drop FIXME and TODO reminders, with explanations, as you go.
The code is still in flux, so I'm not yet documenting each and every method and parameter.
After I got a part done (Single class/function/if-elses)? - Yes
When I'm reasonably done with a method or class, now is the time to review it. Along with checking scopes of methods, ordering methods, and other code cleanup to improve understandability, now's the time to begin to standardize it against your team standards. Consider what comments are needed based on the audience it will be released to (future you is part of the audience too!). Does the class have a header block? Are there non-obvious conditions under which this method should not be called? Does this parameter have any conditions on it, e.g. should it not be null?
Check the FIXME and TODO items - still valid? Any you should address now before moving on?
These are still notes for you and your team, but the beginnings of standardized notes for future maintainers.
After I got the whole thing working? - Yes
Now is the time to review everything and finalize comments against your standards.
All FIXME and TODO items addressed (fixed or captured as known issue)?
These notes now are for future maintainers.
Now the dirty little secret
More is not always better. Like unit tests, you have to balance use of your tools, weighing costs vs benefits. The fact is that a coder can only type so many physical lines per hour; what percent should be comments? A low percentage means I've got a lot of code, but it's confusing and difficult to understand and use correctly. A high percentage means that when someone changes a method signature or redefines an interface, all the time spent fully commenting every parameter of those methods just got trashed.
Find the right percentage based on the stability of the code, how long it will live, and how widely it will be released. Not stable yet - minimal comments to help you and your team. Stable and ready for project - fully commented. Public release? - fully commented (check again!) with copyrights (if applicable). As you gain experience, adjust the percentage.
You should never "add" comments - they are not additions. Comments are part of the code - you use them when you need them. Asking when you should add them is like asking when you should add functions or classes. Though thinking about it, I remember doing a program advice slot at university I worked for where one of the students came in with about 1000 lines of Pascal, with no functions. When I queried why he hadn't used functions, his response was "I'll add them later, once I've got it working."
This is subjective, but sometimes it's better to add them before the actual code, eg. when you implement an algorithm that has clearly defined steps. By that way it's harder to miss steps.
This is a matter of style. Personally, I like writing comments during the coding, not after. Because if I leave it until after, I usually get lazy and don't write them at all. That said, sometimes it's useful to go over a completed piece of code, figure out what isn't obvious from the code itself and document it. In particular, the parts where assumptions are made.
I would suggest writing comments whenever you edit any code, while you are editing it. According to Robert C. Martin in Clean Code, a disadvantage of comments is that the code can change without the comments being updated, making the comments not only useless, but dangerous. To reduce this problem, if you must use comments (because you are unable to express yourself in the code itself), make sure you update them every time you update the code.
You should try writing comments BEFORE you write any code, e.g.:
public string getCurrentUserName() {
    // init user database repository
    // retrieve the logged-in user
    // return the name if a user is logged in, otherwise return null
}
Writing comments before you code helps you learn how to structure your code without actually coding it and then realising that you should have done it another way. It's also a good way to quickly visualise a clean solution to a complex problem without getting bogged down in implementation. And it's good because if you get interrupted, when you come back to your work you can go straight back to it, as opposed to having to figure out again what you have done and what you need to do next.
Not suited to all situations, but often a good option!
A disadvantage of adding comments later is that a lot of the time it will simply not be done, due to laziness, other tasks, etc.
If you find you can always go back and add the appropriate comments without any problem, then by all means do so, but otherwise making a conscious effort to add them as you're coding or before you code a section may be a way to ensure that you don't leave the code uncommented.
Put a comment ANYWHERE the programmer reading your code may have a WTF moment.
If you find yourself commenting every line, perhaps you need to take a look at trying to improve your code with simpler, more elegant statements.
Comments should reflect why you are doing things the way you do, not what the code does. Most of the time the person reading your code can see what it does.
You should explain the things one cannot deduce from the code.
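For example (hypothetical code), the difference between a "what" comment and a "why" comment:
def invoice_total(subtotal: float) -> float:
    # What (redundant): multiply the subtotal by 1.1.
    # Why (useful): the 10% regional levy must appear on every invoice, even
    # for exempt customers; exemptions are refunded later by the billing job.
    return subtotal * 1.10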
I tend to put basic comments as I'm going, just to remind myself what I was thinking at the time when I wrote it (i.e. why I wrote it that way). I do this especially if it's code that looks like it might be wrong but is actually right, or code that has an inherent race condition that I don't care about, or code that might not be optimal but is a quick way to get something working, so that even ten minutes later when I go back and look at it I can see that I've thought about the problem already and don't have to waste any brain cycles on it.
When the code is more complete, I'll often go back and review the comments I've written and then have a think about whether I still think the decisions made are reasonable, and whether things could be done better. I'll also often expand the basic comment into a longer comment that's more useful for other people when they come to maintain the code; I usually save comment expansion to the end because a lot of the time basic comments just get deleted during refactoring, so writing a long comment is a waste of time until you know you're going to keep it.
In a nutshell, write basic comments as you go along, and then improve them as your code becomes more stable.
Oh, and also, any time you review a bit of existing code and you're struck with a WTF?! moment but then realise the code is actually decent, put a comment in to save yourself and the next person time when they look at it in the future.
The question should be, when do I add code to my comments?
My practice is to write out the functionality of a module/object/function as a series of comments. Not comments like "add one to counter", but higher-level comments like "sort list by account number". Detailed comments are pretty much redundant with the code, so I avoid those unless I'm writing a very tricky algorithm.
Once I have the functionality "designed" in comments, I act like a human compiler and add in the code after each line of comments.
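A small Python illustration of that workflow (a made-up example): the high-level comments come first, then the code gets filled in under each one.
def top_accounts(accounts, n):
    # filter out closed accounts
    open_accounts = [a for a in accounts if not a.get("closed")]
    # sort by balance, largest first
    open_accounts.sort(key=lambda a: a["balance"], reverse=True)
    # return the top n
    return open_accounts[:n]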
Give it a try and let us know how it works!
Personally, I tend to write comments to summarise code where necessary, often before I write the code, as well as to save WTFs. I treat them exactly as notes: of things to do, things that I have done this way or will do this way, and as such they are put in when and where I feel the need for them.
Before you forget what specification and design the code is required to implement.
Before you forget that some unfortunate coder will have to read it later on.
Before you forget that the unfortunate coder could well be you.
When you do something non-trivial, as you're writing it.
You gave a lot of cases in your question. I think it depends on what you're doing at the time.
If you're writing a function or a class, comments are a way to declare what's supposed to happen with the function. Things like input variables, output type, special behavior, exceptions, etc. IMHO that kind of comment should be written before the actual code is started, in your "code design" phase. Most languages have packages which process those kinds of comments into documentation (javadoc, epydoc, POD, etc.), so that stuff will be read by users.
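In Python, for example, that design-phase comment becomes a docstring that documentation tools can process (the function and its rules here are hypothetical):
def transfer(source: str, target: str, amount: int) -> None:
    """Move amount cents from source to target.

    :param source: account id to debit; must exist and be open.
    :param target: account id to credit; must exist and be open.
    :param amount: positive number of cents.
    :raises ValueError: if amount is not positive.
    :raises KeyError: if either account id is unknown.
    """
    ...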
If you're making a bit of code work, I think it's OK to wait until you've got it working to put in a comment triumphantly describing your working solution. That kind of comment is only going to get read by a code reviewer.
Then, as others have said, you want to avoid WTF moments, by yourself or others. I once got an attaboy for a comment I made in an open-source project. The comment was "Yes, I really do want = and not == on that line."
A. When you make an arbitrary decision that would be difficult to re-understand.
B. Anything that you feel you should remember while writing the code.
C. At the beginning of a program, explain the logic and usage.
Advice: instead of commenting a lot, use long names for functions and variables that really explain what the function does or what the variable stands for.
Mostly at the time you write the code. You can go back after the function/block/whatever is done and organize your comments with a fresh mind. Most of the stuff we write while coding is not meaningful later.
Early on in my career I added comments to nearly every line of code, as you might do in an ASM program. As time went by I ran into many of the problems mentioned here. It was a bear to maintain, which resulted in comments not being updated, and then they became stale at best, usually moldy.
I feel that the # of comments should reflect how complex or non-obvious the code itself is. In a more challenging environment, such as ASM, you will probably need more comments to understand what is going on. In more modern languages like C# you shouldn't need a whole lot of comments in most cases.
Generally I use tools that evaluate the complexity of my methods in C#. Those that are high on the complexity scale first get refactored. Then when I'm satisfied with the complexity remaining and I still have some code that is not obvious, or even more important, seems obvious but does something different, then I tack a comment on it.
I add comments while writing any code that is not easily understandable. I find that if I don't do it immediately then it gets forgotten. I (or more likely someone else) then spends more time figuring what I did than it would have taken to write the comment.
To be more precise, commenting immediately after the code is written is the best avenue to ensure comments actually get written.

When do you refactor code? [closed]

Do you do it when you’re in the code doing something else?
When your manager approves it? (Seems this never happens)
I guess some of this depends on the impact of the changes. If I change the code and it affects nothing outside of the class, to me that is low impact.
When does it become a design change? When it affects X objects or X projects?
I’m just curious how others teams tackle this...
As part of original development (red/green/refactor)
When suggested by a code reviewer
When we've noticed a design pain-point
When making another change, if the refactoring is low impact, i.e. typically not affecting any other files.
If it affects the public API, I generally like to make the refactoring a single source code commit which doesn't change behaviour (and then build new behaviour into another commit). If it affects other projects too, there needs to be consensus over it and I would want to get permission to change their code to go in the same refactoring commit.
I find I refactor when revisiting code (presumably to add/extend functionality) more than 3 months after it was written.
If it takes me more than 2 minutes to discern what a chunk of code is doing, I'll break it apart to make it more immediately understandable (or just add some more comments.)
As soon as all of the tests pass.
I work in a large system, so I only change things I have to. It is easy to have bad side effects to changes.
I will refactor sections of code that are performing poorly, not working properly, or needs new functionality.
I never just decide to fix things; I would never be done. If it works, and no one is asking for changes or complaining about problems, move on. Life is too short to fix everything.
I often refactor my code when there is a user requirement change or bug fixes. Then there will be a chance for people to review your changes.
Otherwise, I normally don't touch workable code, even if it smells.
We found small refactorings are best done while we were working on a bit of code - do what's required, preferably paired.
For bigger things, we had a Technical Debt section on the wall - if you spotted something and didn't have the time to address it, or it was going to take some discussion to solve, you'd add it to the wall and they would be scheduled for future iterations (or when free time cropped up).
Refactoring while you're already in the code is sometimes easiest, especially if your manager does not support the initiative, but if you only change a small part it will break consistency with surrounding parts. In these cases it's better to be selective and, as you suggested, do things that are low-impact. It may also be helpful to refactor long select/switch statements into functions and delay on refactoring the inner code until sometime later.
At a previous job, I was the manager, so I refactored whenever I wanted. At my current job, I'm an analyst so most of the code is not directly my responsibility. When I do write code, I avoid impacting anything that I'm not writing. I have one project which is entirely under my own control and I refactor any time I learn a better way to do something.
We refactor as often as we can. Having unit tests to ensure that everything works pre- and post- refactoring really helps.
Code review processes often help with this. If I touch some code, it gets reviewed, reviewer asks, "why did you do it this way?", I say, "I had to because of (insert ugliness here)". This is a sign that the code should be refactored right after the review is done.
Looking at our company: we decided that our upcoming application release would be mostly dedicated to performance optimizations rather than new functionality. This was something we felt was needed, and it was also requested by some clients.
Therefore we have spent a lot of time identifying performance bottlenecks in our app and reviewing code and refactoring it to make things run faster.
So in our case we did it because management approved us doing it for this new release, because we showed to them how much performance improvement could be gained.
Refactor when needed:
when you need a better understanding of the code you are working on (pairing often helps here), examples are: renaming, method extraction etc.
when the current design doesn't allow for a 'clean' change: at this point you can actually argue with your manager on a value basis (e.g. what is this new feature worth to the project)
I am always making small refactorings in my code. I know as long as I have my unit tests to verify that everything is still functioning properly afterward, I see no harm in doing it as I go. That way you don't get that vague "needs refactoring" feeling every time you work on it.
Now if it requires a large refactoring, it's best to plan for that and set aside some time.
It seems most other posters are resistant to refactoring mercilessly. Of course this isn't possible if the system you're working on doesn't support it through extensive unit tests. But in general, if I can see an opportunity to make the code tighter without spending more than a few minutes, or hours at most, I go for it. If I'm not sure what I should be working on, I look for something to refactor.
I refactor when I'm fixing a bug or adding a feature and the process of refactoring makes the code easier to read and easier to maintain.
Following DRY principles vehemently will often be a trigger for me to refactor.
Insufficiently often, thus building up technical debt.
Sad, but so.
Do as I say, not as the team I work on does.

How "defensive" should my code be?

I was having a discussion with one of my colleagues about how defensive your code should be. I am all pro defensive programming but you have to know where to stop. We are working on a project that will be maintained by others, but this doesn't mean we have to check for ALL the crazy things a developer could do. Of course, you could do that but this will add a very big overhead to your code.
How do you know where to draw the line?
Anything a user enters directly or indirectly, you should always sanity-check. Beyond that, a few asserts here and there won't hurt, but you can't really do much about crazy programmers editing and breaking your code, anyway!-)
I tend to change the amount of defense I put in my code based on the language. Today I'm primarily working in C++ so my thoughts are drifting in that direction.
When working in C++ there cannot be enough defensive programming. I treat my code as if I'm guarding nuclear secrets and every other programmer is out to get them. Asserts, throws, compile-time error template hacks, argument validation, eliminating pointers, in-depth code reviews and general paranoia are all fair game. C++ is an evil wonderful language that I both love and severely mistrust.
I'm not a fan of the term "defensive programming". To me it suggests code like this:
void MakePayment( Account * a, const Payment * p ) {
    if ( a == 0 || p == 0 ) {
        return;
    }
    // payment logic here
}
This is wrong, wrong, wrong, but I must have seen it hundreds of times. The function should never have been called with null pointers in the first place, and it is utterly wrong to quietly accept them.
The correct approach here is debatable, but a minimal solution is to fail noisily, either by using an assert or by throwing an exception.
Edit: I disagree with some other answers and comments here - I do not think that all functions should check their parameters (for many functions this is simply impossible). Instead, I believe that all functions should document the values that are acceptable and state that other values will result in undefined behaviour. This is the approach taken by the most successful and widely used libraries ever written - the C and C++ standard libraries.
And now let the downvotes begin...
I don't know that there's really any way to answer this. It's just something that you learn from experience. You just need to ask yourself how common a potential problem is likely to be and make a judgement call. Also consider that you don't necessarily have to always code defensively. Sometimes it's acceptable just to note any potential problems in your code's documentation.
Ultimately though, I think this is just something that a person has to follow their intuition on. There's no right or wrong way to do it.
If you're working on the public APIs of a component then it's worth doing a good amount of parameter validation. This led me to have a habit of doing validation everywhere. That's a mistake. All that validation code never gets tested and potentially makes the system more complicated than it needs to be.
Now I prefer to validate by unit testing. Validation definitely happens for data coming from external sources, but not for calls from non-external developers.
I always Debug.Assert my assumptions.
My personal ideology: the defensiveness of a program should be proportional to the maximum naivety/ignorance of the potential user base.
Being defensive against developers consuming your API code is not that different from being defensive against regular users.
Check the parameters to make sure they are within appropriate bounds and of expected types
Verify that the number of API calls that can be made is within your Terms of Service. Generally called throttling, this usually only applies to web services and password-checking functions.
Beyond that there's not much else to do except make sure your app recovers well in the event of a problem and that you always give ample information to the developer so that they understand what's going on.
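A sketch of the first point in Python (the API function and its limits are hypothetical): check bounds and types at the public entry point, and report enough detail for the calling developer to understand what went wrong.
def get_page(page: int, page_size: int = 50) -> list:
    # Public entry point: validate arguments and fail with a clear message.
    if not isinstance(page, int) or page < 1:
        raise ValueError(f"page must be a positive integer, got {page!r}")
    if not 1 <= page_size <= 200:
        raise ValueError(f"page_size must be between 1 and 200, got {page_size!r}")
    # ...fetch and return the requested page...
    return []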
Defensive programming is only one way of honouring a contract in a design-by-contract manner of coding.
The other two are
total programming and
nominal programming.
Of course you shouldn't defend yourself against every crazy thing a developer could do, but you should state, using preconditions, in which context your code will do what is expected.
// precondition: par is so and so and so
function doSth(par)
{
    debug.assert( par is so and so and so )
    // do stuff with par
    return result
}
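The same idea in Python with a concrete (made-up) precondition; callers outside the contract get undefined behaviour, and the assert simply makes violations fail noisily during development:
def days_in_month(month: int) -> int:
    # precondition: month is an integer in the range 1..12
    assert 1 <= month <= 12, f"month out of range: {month}"
    return [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31][month - 1]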
I think you have to bring in the question of whether you're creating tests as well. You should be defensive in your coding, but as pointed out by JaredPar, I also believe it depends on the language you're using. If it's unmanaged code, then you should be extremely defensive. If it's managed, I believe you have a little bit of wiggle room.
If you have tests, and some other developer tries to decimate your code, the tests will fail. But then again, it depends on test coverage on your code (if there is any).
I try to write code that is more than defensive, but downright hostile. If something goes wrong and I can fix it, I will. If not, throw or pass on the exception and make it someone else's problem. Anything that interacts with a physical device (file system, database connection, network connection) should be considered unreliable and prone to failure. Anticipating these failures and trapping them is critical.
Once you have this mindset, the key is to be consistent in your approach. Do you hand back status codes to communicate problems up the call chain, or do you prefer exceptions? Mixed models will kill you, or at least drive you to drink. Heavily. If you are using someone else's API, then isolate it behind mechanisms that trap and report failures in the terms you use; use these wrapping interfaces.
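A rough sketch of such a wrapping interface in Python (names are hypothetical): the wrapper traps the failure modes of the unreliable pieces and reports them in the one error model the rest of the call chain uses.
import json

class StorageError(Exception):
    # The single error type the rest of the application deals with.
    pass

def read_customer_record(customer_id: str) -> dict:
    # File system and parsing are treated as unreliable; their failures are
    # translated into our own exception instead of leaking through.
    try:
        with open(f"/var/data/customers/{customer_id}.json") as fh:
            return json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        raise StorageError(f"could not load customer {customer_id}") from exc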
If the discussion here is how to code defensively against future (possibly malevolent or incompetent) maintainers, there is a limit to what you can do. Enforcing contracts through test coverage and liberal use of asserting your assumptions is probably the best you can do, and it should be done in a way that ideally doesn't clutter the code and make the job harder for the future non-evil maintainers of the code. Asserts are easy to read and understand and make it clear what the assumptions of a given piece of code are, so they're usually a great idea.
Coding defensively against user actions is another issue entirely, and the approach that I use is to think that the user is out to get me. Every input is examined as carefully as I can manage, and I make every effort to have my code fail safe - try not to persist any state that isn't rigorously vetted, correct where you can, exit gracefully if you cannot, etc. If you just think about all the bozo things that could be perpetrated on your code by outside agents, it gets you in the right mindset.
Coding defensively against other code, such as your platform or other modules, is exactly the same as users: they're out to get you. The OS is always going to swap out your thread at an inopportune time, networks are always going to go away at the wrong time, and in general, evil abounds around every corner. You don't need to code against every potential problem out there - the cost in maintenance might not be worth the increase in safety - but it sure doesn't hurt to think about it. And it usually doesn't hurt to explicitly comment in the code if there's a scenario you thought of but regard as unimportant for some reason.
Systems should have well-designed boundaries where defensive checking happens. There should be a decision about where user input is validated (at what boundary) and where other potential defensive issues require checking (for example, third-party integration points, publicly available APIs, rules engine interaction, or different units coded by different teams of programmers). More defensive checking than that violates DRY in many cases, and just adds maintenance cost for very little benefit.
That being said, there are certain points where you cannot be too paranoid. Potential for buffer overflows, data corruption and similar issues should be very rigorously defended against.
I recently had a scenario in which user input data was propagated through a remote facade interface, then a local facade interface, then some other class, to finally reach the method where it was actually used. I was asking myself the question: when should the value be validated? I added validation code only to the final class, where the value was actually used. Adding other validation snippets to classes lying on the propagation path would be too defensive for me. One exception could be the remote facade, but I skipped it too.
Good question. I've flip-flopped between doing sanity checks and not doing them. It's a 50/50 situation; I'd probably take a middle ground where I would only "bullet proof" routines that:
(a) are called from more than one place in the project
(b) have logic that is LIKELY to change
(c) cannot fall back on default values
(d) cannot 'fail' gracefully
Darknight

What can you do to a legacy codebase that will have the greatest impact on improving the quality?

As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?
Remove unused code
Remove duplicated code
Add unit tests to improve test coverage where coverage is low
Create consistent formatting across files
Update 3rd party software
Reduce warnings generated by static analysis tools (e.g. FindBugs)
The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.
Read Michael Feathers' book "Working Effectively with Legacy Code"
This is a GREAT book.
If you don't like that answer, then the best advice I can give would be:
First, stop making new legacy code[1]
[1]: Legacy code = code without unit tests and therefore an unknown
Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what effect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.
NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.
Jimmy Bogard and Ray Houston did an interesting screen cast on a subject very similar to this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx
I work with a legacy 1M LOC application written and modified by about 50 programmers.
* Remove unused code
Almost useless... just ignore it. You won't get a big Return On Investment (ROI) from that one.
* Remove duplicated code
Actually, when I fix something I always search for duplicates. If I find some, I extract a generic function, or at least comment every occurrence of the duplication (sometimes the effort of extracting a generic function isn't worth it). The main idea is that I hate doing the same action more than once. Another reason is that there's always someone (could be me) who forgets to check for other occurrences...
* Add unit tests to improve test coverage where coverage is low
Automated unit tests are wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issues. Test the parts you are working on and hope that in a few years you have decent coverage.
* Create consistent formatting across files
IMO the differences in formatting are part of the legacy. They give you a hint about who wrote the code and when. This can give you some clues about how to behave in that part of the code. Doing the job of reformatting isn't fun, and it doesn't add any value for your customer.
* Update 3rd party software
Do it only if there are really nice new features, or the version you have is not supported by the new operating system.
* Reduce warnings generated by static analysis tools
It can be worth it. Sometimes a warning can hide a potential bug.
I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places - this, in theory, makes bugs easier to fix because you only have to fix one piece of code, as opposed to many pieces of code, to fix a bug in it.
Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.
There is a good book on this written by the author of CPPUnit, Working Effectively with Legacy Code.
Adding tests to legacy code is certainly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as
"a place where you can alter behavior in your program without editing in that place."
Sometimes it's worth refactoring to create seams that will make future testing easier (or possible in the first place). The Google testing blog has several interesting posts on the subject, mostly revolving around the process of Dependency Injection.
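A small Python illustration of a seam created through constructor injection (the classes are hypothetical): the collaborator can be replaced from outside, so a test can alter behaviour without editing the method that uses it.
import datetime

class ReportService:
    def __init__(self, clock=None):
        # The injected 'clock' is the seam: production passes nothing and gets
        # real time; a test passes a stub and controls "now".
        self._now = clock or datetime.datetime.utcnow

    def generate(self) -> str:
        return f"report generated at {self._now().isoformat()}"

# In a test:
#   service = ReportService(clock=lambda: datetime.datetime(2020, 1, 1))
#   assert service.generate() == "report generated at 2020-01-01T00:00:00"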
I can relate to this question as I currently have in my lap one of 'those' old-school codebases. It's not really legacy, but it certainly hasn't followed the trends of the years.
I'll tell you the things I would love to fix in it as they bug me every day:
Document the input and output variables
Refactor the variable names so they actually mean something, other than some Hungarian notation prefix followed by a three-letter acronym with some obscure meaning. CamelCase is the way to go.
I'm scared to death of changing any code as it will affect hundreds of clients that use the software and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing since there are zero now.
The rest is really peanuts. These are the main problems with a legacy codebase, they really eat up tons of time.
I'd say it largely depends on what you want to do with the legacy code...
If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."
If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.
If you plan on version 2.0, add unit tests and clean up the code you will bring forward
Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?
Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.
The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.
This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I already learned about a particular section of code. Once I started placing notes as I go, including "To Do" notes, I stopped re-figuring out what I already figured out. Then I could focus on how those code segments flow and interact.
I would say just leave it alone for the most part. If it's not broken then don't fix it. If it is broken then go ahead and fix and improve the portion of the code that is broken and its immediately surrounding code. You can use the pain of the bug or sorely missing feature to justify the effort and expense of improving that part.
I would not recommend any wholesale kind of rewrite, refactor, reformat, or putting in of unit tests that is not guided by actual business or end-user need.
If you do get the opportunity to fix something, then do it right (the chance of doing it right the first time might have already passed, but since you are touching that part again you might as well do it right this time around), and this includes all the items you mentioned.
So in summary, there's no single or just a few things that you should do. You should do it all but in small portions and in an opportunistic manner.
Late to the party, but the following may be worth doing where a function/method is used or referenced often:
Local variables often tend to be poorly named in legacy code (often owing to their scope expanding when a method is modified, and not being updated to reflect this). Renaming these in line with their actual purpose can help clarify legacy code.
Even just laying out the method slightly differently can work wonders - for instance, putting all the clauses of an if on one line.
There might be stale/confusing code comments there already. Remove them if they're not needed, or amend them if you absolutely have to. (Of course, I'm not advocating removal of useful comments, just those that are a hindrance.)
These might not have the massive headline impact you're looking for, but they are low risk, particularly if the code can't be unit tested.

Resources