Related
A common practice of TDD is that you make tiny steps. But one thing which is bugging me is something I've seen a few people do, where by they just hardcode values/options, and then refactor later to make it work properly. For example…
describe Calculator
it should multiply
assert Calculator.multiply(4, 2) == 8
Then you do the least possible to make it pass:
class Calculator
def self.multiply(a, b)
return 8
And it does!
Why do people do this? Is it to ensure they're actually implementing the method in the right class or something? Cause it just seems like a sure-fire way to introduce bugs and give false-confidence if you forget something. Is it a good practice?
This practice is known as "Fake it 'til you make it." In other words, put fake implementations in until such time as it becomes simpler to put in a real implementation. You ask why we do this.
I do this for a number of reasons. One is simply to ensure that my test is being run. It's possible to be configured wrong so that when I hit my magic "run tests" key I'm actually not running the tests I think I'm running. If I press the button and it's red, then put in the fake implementation and it's green, I know I'm really running my tests.
Another reason for this practice is to keep a quick red/green/refactor rhythm going. That is the heartbeat that drives TDD, and it's important that it have a quick cycle. Important so you feel the progress, important so you know where you're at. Some problems (not this one, obviously) can't be solved in a quick heartbeat, but we must advance on them in a heartbeat. Fake it 'til you make it is a way to ensure that timely progress. See also flow.
There is a school of thought, which can be useful in training programmers to use TDD, that says you should not have any lines of source code that were not originally part of a unit test. By first coding the algorithm that passes the test into the test, you verify that your core logic works. Then, you refactor it out into something your production code can use, and write integration tests to define the interaction and thus the object structure containing this logic.
Also, religious TDD adherence would tell you that there should be no logic coded that a requirement, verified by an assertion in a unit test, does not specifically state. Case in point; at this time, the only test for multiplication in the system is asserting that the answer must be 8. So, at this time, the answer is ALWAYS 8, because the requirements tell you nothing different.
This seems very strict, and in the context of a simple case like this, nonsensical; to verify correct functionality in the general case, you would need an infinite number of unit tests, when you as an intelligent human being "know" how multiplication is supposed to work and could easily set up a test that generated and tested a multiplication table up to some limit that would make you confident it would work in all necessary cases. However, in more complex scenarios with more involved algorithms, this becomes a useful study in the benefits of YAGNI. If the requirement states that you need to be able to save record A to the DB, and the ability to save record B is omitted, then you must conclude "you ain't gonna need" the ability to save record B, until a requirement comes in that states this. If you implement the ability to save record B before you know you need to, then if it turns out you never need to then you have wasted time and effort building that into the system; you have code with no business purpose, that regardless can still "break" your system and thus requires maintenance.
Even in the simpler cases, you may end up coding more than you need if you code beyond requirements that you "know" are too light or specific. Let's say you were implementing some sort of parser for string codes. The requirements state that the string code "AA" = 1, and "AB" = 2, and that's the limit of the requirements. But, you know the full library of codes in this system include 20 others, so you include logic and tests that parse the full library. You go back the the client, expecting your payment for time and materials, and the client says "we didn't ask for that; we only ever use the two codes we specified in the tests, so we're not paying you for the extra work". And they would be exactly right; you've technically tried to bilk them by charging for code they didn't ask for and don't need.
I've always followed the logic: if assert fails, then there is a bug. Root cause could either be:
Assert itself is invalid (bug)
There is a programming error (bug)
(no other options)
I.E. Are there any other conclusions one could come to? Are there cases where an assert would fail and there is no bug?
If assert fails there is a bug in either the caller or callee. Why else would there be an assertion?
Yes, there is a bug in the code.
Code Complete
Assertions check for conditions that
should never occur. [...]
If an
assertion is fired for an anomalous
condition, the corrective action is
not merely to handle an error
gracefully- the corrective action is
to change the program's source code,
recompile, and release a new version
of the software.
A good way to
think of assertions is as executable
documentation - you can't rely on them
to make the code work, but they can
document assumptions more actively
than program-language comments can.
That's a good question.
My feeling is, if the assert fails due to your code, then it is a bug. The assertion is an expected behaviour/result of your code, so an assertion failure will be a failure of your code.
Only if the assert was meant to show a warning condition - in which case a special class of assert should have been used.
So, any assert should show a bug as you suggest.
If you are using assertions you're following Bertrand Meyer's Design by Contract philosophy. It's a programming error - the contract (assertion) you have specified is not being followed by the client (caller).
If you are trying to be logically inclusive about all the possibilities, remember that electronic circuitry is known to be affected by radiation from space. If the right photon/particle hits in just the right place at just the right time, it can cause an otherwise logically impossible state transition.
The probability is vanishingly small but still non-zero.
I can think of one case that wouldn't really class as a bug:
An assert placed to check for something external that normally should be there. You're hunting something nutty that occurs on one machine and you want to know if a certain factor is responsible.
A real world example (although from before the era of asserts): If a certain directory was hidden on a certain machine the program would barf. I never found any piece of code that should have cared if the directory was hidden. I had only very limited access to the offending machine (it had a bunch of accounting stuff on it) so I couldn't hunt it properly on the machine and I couldn't reproduce it elsewhere. Something that was done with that machine (the culprit was never identified) occasionally turned that directory hidden.
I finally resorted to putting a test in the startup to see if the directory was hidden and stopping with an error if it was.
No. An assertion failure means something happened that the original programmer did not intend or expect to occur.
This can indicate:
A bug in your code (you are simply calling the method incorrectly)
A bug in the Assertion (the original programmer has been too zealous and is complaining about you doing something that is quite reasonable and the method will actually handle perfectly well.
A bug in the called code (a design flaw). That is, the called code provides a contract that does not allow you to do what you need to do. The assertion warns you that you can't do things that way, but the solution is to extend the called method to handle your input.
A known but unimplemented feature. Imagine I implement a method that could process positive and negative integers, but I only need it (for now) to handle positive ones. I know that the "perfect" implementation would handle both, but until I actually need it to handle negatives, it is a waste of effort to implement support (and it would add code bloat and possibly slow down my application). So I have considered the case but I decide not to implement it until the need is proven. I therefore add an assert to mark this unimplemented code. When I later trigger the assert by passing a negative value in, I know that the additional functionality is now needed, so I must augment the implementation. Deferring writing the code until it is actually required thus saves me a lot of time (in most cases I never imeplement the additiona feature), but the assert makes sure that I don't get any bugs when I try to use the unimplemented feature.
I was having a discussion with one of my colleagues about how defensive your code should be. I am all pro defensive programming but you have to know where to stop. We are working on a project that will be maintained by others, but this doesn't mean we have to check for ALL the crazy things a developer could do. Of course, you could do that but this will add a very big overhead to your code.
How do you know where to draw the line?
Anything a user enters directly or indirectly, you should always sanity-check. Beyond that, a few asserts here and there won't hurt, but you can't really do much about crazy programmers editing and breaking your code, anyway!-)
I tend to change the amount of defense I put in my code based on the language. Today I'm primarily working in C++ so my thoughts are drifting in that direction.
When working in C++ there cannot be enough defensive programming. I treat my code as if I'm guarding nuclear secrets and every other programmer is out to get them. Asserts, throws, compiler time error template hacks, argument validation, eliminating pointers, in depth code reviews and general paranoia are all fair game. C++ is an evil wonderful language that I both love and severely mistrust.
I'm not a fan of the term "defensive programming". To me it suggests code like this:
void MakePayment( Account * a, const Payment * p ) {
if ( a == 0 || p == 0 ) {
return;
}
// payment logic here
}
This is wrong, wrong, wrong, but I must have seen it hundreds of times. The function should never have been called with null pointers in the first place, and it is utterly wrong to quietly accept them.
The correct approach here is debatable, but a minimal solution is to fail noisily, either by using an assert or by throwing an exception.
Edit: I disagree with some other answers and comments here - I do not think that all functions should check their parameters (for many functions this is simply impossible). Instead, I believe that all functions should document the values that are acceptable and state that other values will result in undefined behaviour. This is the approach taken by the most succesful and widely used libraries ever written - the C and C++ standard libraries.
And now let the downvotes begin...
I don't know that there's really any way to answer this. It's just something that you learn from experience. You just need to ask yourself how common a potential problem is likely to be and make a judgement call. Also consider that you don't necessarily have to always code defensively. Sometimes it's acceptable just to note any potential problems in your code's documentation.
Ultimately though, I think this is just something that a person has to follow their intuition on. There's no right or wrong way to do it.
If you're working on public APIs of a component then its worth doing a good amount of parameter validation. This led me to have a habit of doing validation everywhere. Thats a mistake. All that validation code never gets tested and potentially makes the system more complicated than it needs to be.
Now I prefer to validate by unit testing. Validation definitely happens for data coming from external sources, but not for calls from non-external developers.
I always Debug.Assert my assumptions.
My personal ideology: the defensiveness of a program should be proportional to the maximum naivety/ignorance of the potential user base.
Being defensive against developers consuming your API code is not that different from being defensive against regular users.
Check the parameters to make sure they are within appropriate bounds and of expected types
Verify that the number of API calls which could be made are within your Terms of Service. Generally called throttling it usually only applies to web services and password checking functions.
Beyond that there's not much else to do except make sure your app recovers well in the event of a problem and that you always give ample information to the developer so that they understand what's going on.
Defensive programming is only one way of hounouring a contract in a design-by-contract manner of coding.
The other two are
total programming and
nominal programming.
Of course you shouldnt defend yourself against every crazy thing a developer could do, but then you should state in wich context it will do what is expected to using preconditions.
//precondition : par is so and so and so
function doSth(par)
{
debug.assert(par is so and so and so )
//dostuf with par
return result
}
I think you have to bring in the question of whether you're creating tests as well. You should be defensive in your coding, but as pointed out by JaredPar -- I also believe it depends on the language you're using. If it's unmanaged code, then you should be extremely defensive. If it's managed, I believe you have a little bit of wiggleroom.
If you have tests, and some other developer tries to decimate your code, the tests will fail. But then again, it depends on test coverage on your code (if there is any).
I try to write code that is more than defensive, but down right hostile. If something goes wrong and I can fix it, I will. if not, throw or pass on the exception and make it someone elses problem. Anything that interacts with a physical device - file system, database connection, network connection should be considered unereliable and prone to failure. anticipating these failures and trapping them is critical
Once you have this mindset, the key is to be consistent in your approach. do you expect to hand back status codes to comminicate problems in the call chain or do you like exceptions. mixed models will kill you or at least drive you to drink. heavily. if you are using someone elses api, then isolate these things into mechanisms that trap/report in terms you use. use these wrapping interfaces.
If the discussion here is how to code defensively against future (possibly malevolent or incompetent) maintainers, there is a limit to what you can do. Enforcing contracts through test coverage and liberal use of asserting your assumptions is probably the best you can do, and it should be done in a way that ideally doesn't clutter the code and make the job harder for the future non-evil maintainers of the code. Asserts are easy to read and understand and make it clear what the assumptions of a given piece of code is, so they're usually a great idea.
Coding defensively against user actions is another issue entirely, and the approach that I use is to think that the user is out to get me. Every input is examined as carefully as I can manage, and I make every effort to have my code fail safe - try not to persist any state that isn't rigorously vetted, correct where you can, exit gracefully if you cannot, etc. If you just think about all the bozo things that could be perpetrated on your code by outside agents, it gets you in the right mindset.
Coding defensively against other code, such as your platform or other modules, is exactly the same as users: they're out to get you. The OS is always going to swap out your thread at an inopportune time, networks are always going to go away at the wrong time, and in general, evil abounds around every corner. You don't need to code against every potential problem out there - the cost in maintenance might not be worth the increase in safety - but it sure doesn't hurt to think about it. And it usually doesn't hurt to explicitly comment in the code if there's a scenario you thought of but regard as unimportant for some reason.
Systems should have well designed boundaries where defensive checking happens. There should be a decision about where user input is validated (at what boundary) and where other potential defensive issues require checking (for example, third party integration points, publicly available APIs, rules engine interaction, or different units coded by different teams of programmers). More defensive checking than that violates DRY in many cases, and just adds maintenance cost for very little benifit.
That being said, there are certain points where you cannot be too paranoid. Potential for buffer overflows, data corruption and similar issues should be very rigorously defended against.
I recently had scenario, in which user input data was propagated through remote facade interface, then local facade interface, then some other class, to finally get to the method where it was actually used. I was asking my self a question: When should be the value validated? I added validation code only to the final class, where the value was actually used. Adding other validation code snippets in classes laying on the propagation path would be too defensive programming for me. One exception could be the remote facade, but I skipped it too.
Good question, I've flip flopped between doing sanity checks and not doing them. Its a 50/50
situation, I'd probably take a middle ground where I would only "Bullet Proof" any routines that are:
(a) Called from more than one place in the project
(b) has logic that is LIKELY to change
(c) You can not use default values
(d) the routine can not be 'failed' gracefully
Darknight
I want to know if there are method to quickly find bugs in the program.
It seems that the more you master the architecture of your software, the more quickly
you can locate the bugs.
How the programmers improve their ability to find a bug?
Logging, and unit tests. The more information you have about what happened, the easier it is to reproduce it. The more modular you can make your code, the easier it is to check that it really is misbehaving where you think it is, and then check that your fix solves the problem.
Divide and conquer. Whenever you are debugging, you should be thinking about cutting down the possible locations of the problem. Every time you run the app, you should be trying to eliminate a possible source and zero in on the actual location. This can be done with logging, with a debugger, assertions, etc.
Here's a prophylactic method after you have found a bug: I find it really helpful to take a minute and think about the bug.
What was the bug exactly in essence.
Why did it occur.
Could you have found it earlier, easier.
Anything else you learned from the bug.
I find taking a minute to think about these things will make it far less likely that you will produce the same bug in the future.
I will assume you mean logic bugs. The best way I have found to capture logic bugs is to implement some sort of testing scheme. Check out jUnit as the standard. Pretty much you define a set of accepted outputs of your methods. Every time you compile your system it checks all of your test cases. If you have introduced new logic that breaks your tests, you will know about it instantly and know exactly what you have to fix.
Test driven design is a pretty big movement in programming right now. You will be hard pressed to find a language that doesn't support some kind of testing. Even JavaScript has a multitude of test suites.
Experience makes you a better debugger. Pay close attention to the bugs that you AND others commonly make. Try to figure out if/how these bugs apply to ALL code that affects you, not the single instance of where the bug was seen.
Raymond Chen is famous for his powers of psychic debugging.
Most of what looks like psychic
debugging is really just knowing what
people tend to get wrong.
That means that you don't necessarily have to be intimately familiar with the architecture / system. You just need enough knowledge to understand the types of bugs that apply and are easy to make.
I personally take the approach of thinking about where the bug may be in the code before actually opening up the code and taking a look. When you first start with this approach, it may not actually work very well, especially if you are pretty unfamiliar with the code base. However, over time someone will be able to tell you the behavior they are experiencing and you'll have a good idea where the problem is located or you may even know what to fix in the code to remedy the problem before even looking at the code.
I was on a project for several years that maintained by a vendor. They were not very good debuggers and most of the time it was up to us to point them to an area of the code that had the problem. What made our problem worse was that we didn't have a nice way to view the source code, so a lot of our "debugging" was just feeling.
Error checking and reporting. The #1 newbie coder debugging mistake is to turn off error reporting, avoid checking for whether what's going on makes sense, etc etc. In general, people feel like if they can't see anything going wrong then nothing is going wrong. Which of course could not be further from the case.
Instead, your code should be chock full of error conditions that will make lots of noise, with detailed reporting, someplace you will see it. (This doesn't mean inside a production web page.) Then, instead of having to trace an error all over the place because it got passed through sixteen layers of execution before it finally got someplace that broke, your errors start happening proximately to the actual issue.
It seems that the more you master the
architecture of your software ,the
more quickly you can locate the bugs.
After understanding the architecture, one's ability to find bugs in the application increases with their ability to identify and write extensive tests.
Know your tools.
Make sure that you know how to use conditional breakpoints and watches in your debugger.
Use static analysis tools as well - they can point out the more obvious issues.
Sleep and rest.
Use programming methods that produce fewer bugs in the first place.
If to implement a single stand-alone functional requirement it takes N separate point-edits to source code, the number of bugs put into the code is roughly proportional to N, so find programming methods that minimize N. Ways to do this: DRY (don't repeat yourself), code generation, and DSL (domain-specific-language).
Where bugs are likely, have unit tests.
Obviously.IMHO, the best unit tests are monte-carlo.
Make intermediate results visible.
For example, compilers have intermediate representations, in the form of 4-tuples. If there is a bug, the intermediate code can be examined. That tells if the bug is in the first or second half of the compiler.
P.S. Most programmers are not aware that they have a choice of how much data structure to use. The less data structure you use, the less are the chances for bugs (and performance issues) caused by it.
I find tracepoints to be an invaluable debugging tool. They are a bit like logging, except you create them during a debugging session to solve a particular issue, like breakpoints.
Printing the stacktrace in a tracepoint can be especially useful. For example, you can print the hash code and stacktrace in the constructor of an object, and then later on when the object is used again you can search for its hashcode to see which client code created it. Same for seeing who disposed it or called a certain method etc.
They are also great for debugging issues related to window focus changes etc, where the debugger would interfere if you drop in break mode.
Static code tools like FindBugs
Assertions, assertions, and assertions.
Some areas of our code has 4 or 5 assertions for each line of real code. When we get a bug report the first thing that happens is that the customer data is processed in our debug build 99 times out a hundred an assert will fire near the cause of the bug.
Additionally our debug build perform redundant calculations to ensure that an optimized algorithm is returning the correct result, and also debug functions are used to examine the sanity of data structures.
The hardest thing new developers have to contend with is getting their code to survive the assertions of the code gthey are calling.
Additionally we do not allow any code to be putback to toplevel that causes any integration or unit test to fail.
Stepping through the code, examining flow/state where unexpected behavior is occurring. (Then develop a test for it, of course).
Writing Debug.Write(message) in your code and using DebugView is another option. And then run your application find out what is going on.
"Architecture" in software means something like:
Several components
The components interact across clearly-defined interfaces
Each component has a well-defined responsibility
The responsibility of one component is unlike the responsibilities of other components
So, as you said, the better the architecture the easier it is to find bugs.
First: knowing the bug, you can decide which functionality is broken, and therefore know which component implements that functionality. For example, if the bug is that something isn't being logged properly, therefore this bug should be in one of 3 places:
In the component that's responsible for logging (your logging library)
Or, above that in the application code which is using this library
Or, below that in the system code which this library is using
Second: examine the data transfered across the interfaces between components. To continue the previous example above:
Set a debugger breakpoint on the application code which invokes the logger API, to verify whether the logger API is being used correctly (e.g. whether it's being invoked at all, whether parameters are as-expected, etc.).
Doing this tells you whether the bug is in the component above this interface, or in the component that's below this interface.
Repeat (perhaps using binary search if the call stack is very deep) until you've found which component is at fault.
When you come to the point that you think there must be a bug in the OS, check your assertions -- and put them into the code with "assert" statements.
Conversely, as you are writing the code, think of the range of valid inputs for your algorithms and put in assertions to make sure you have what you think you have. Same goes for output: Check that you produced what you think you produced.
E.g. if you expect a non-empty list:
l = getList(input)
assert l, "List was empty for input: %s" % str(input)
I'm part of the QA team # work, and knowing anything about the product and how it is developed, helps a lot in finding bugs, also when I make new QA tools I pass it to our dev team to test it, finding bugs in your own code is just plain hard!
Some people say programmers are tainted, so we cannot see bugs in their own product; we are not talking about code here, we are beyond that, usability and functionality itself.
Meanwhile unit testing seams to be a nice solution to find bugs in your own code, its totally pointless if you're wrong even before writing the unit test, how are you going to find the bugs then? you don't!, let your co-worker find them, hire a QA guy.
Scientific debugging is what I always used, and it greatly helps.
Basically, if you can replicate a bug, you can track its origin. You should then experiment some tests, observe the results, and infer hypotheses on why the bug happens.
Writing about all your hypotheses, attempts, expected results and observed results can help you track down the bugs, particularly if they're nasty.
There are automated tools that can help you with that process, particularly git-bisect (and similar bisection tools on other revision systems) to quickly find which change introduced the bug, unit testing to reproduce a bug and prevent regressions in your code (can be used in combination with bisect), and delta debugging to find the culprit in your code (similar to git-bisect but whereas git-bisect works on the code history, delta debugging works on the code directly).
But whatever the tools you are using, the most important benefit is in the scientific methodology, as this is the formalization of what most experienced debuggers do.
The code in this question made me think
assert(value>0); //Precondition
if (value>0)
{
//Doit
}
I never write the if-statement. Asserting is enough/all you can do.
"Crash early, crash often"
CodeComplete states:
The assert-statement makes the application Correct
The if-test makes the application Robust
I don't think you've made an application more robust by correcting invalid input values, or skipping code:
assert(value >= 0 ); //Precondition
assert(value <= 90); //Precondition
if(value < 0) //Just in case
value = 0;
if (value > 90) //Just in case
value = 90;
//Doit
These corrections are based on assumptions you made about the outside world.
Only the caller knows what "a valid input value" is for your function, and he must check its validity before he calls your function.
To paraphrase CodeComplete:
"Real-world programs become too messy when we don't rely solely on assertions."
Question: Am I wrong, stuborn, stupid, too non-defensive...
The problem with trusting just Asserts, is that they may be turned off in a production environment. To quote the wikipedia article:
Most languages allow assertions to be
enabled or disabled globally, and
sometimes independently. Assertions
are often enabled during development
and disabled during final testing and
on release to the customer. Not
checking assertions avoiding the cost
of evaluating the assertions while,
assuming the assertions are free of
side effects, still producing the same
result under normal conditions. Under
abnormal conditions, disabling
assertion checking can mean that a
program that would have aborted will
continue to run. This is sometimes
preferable.
Wikipedia
So if the correctness of your code relies on the Asserts to be there you may run into serious problems. Sure, if the code worked during testing it should work during production... Now enter the second guy that works on the code and is just going to fix a small problem...
Use assertions for validating input you control: private methods and such.
Use if statements for validating input you don't control: public interfaces designed for consumption by the user, user input testing etc.
Test you application with assertions built in. Then deploy without the assertions.
I some cases, asserts are disabled when building for release. You may not have control over this (otherwise, you could build with asserts on), so it might be a good idea to do it like this.
The problem with "correcting" the input values is that the caller will not get what they expect, and this can lead to problems or even crashes in wholly different parts of the program, making debugging a nightmare.
I usually throw an exception in the if-statement to take over the role of the assert in case they are disabled
assert(value>0);
if(value<=0) throw new ArgumentOutOfRangeException("value");
//do stuff
I would disagree with this statement:
Only the caller knows what "a valid
input value" is for your function, and
he must check its validity before he
calls your function.
Caller might think that he know that input value is correct. Only method author knows how it suppose to work. Programmer's best goal is to make client to fall into "pit of success". You should decide what behavior is more appropriate in given case. In some cases incorrect input values can be forgivable, in other you should throw exception\return error.
As for Asserts, I'd repeat other commenters, assert is a debug time check for code author, not code clients.
Don't forget that most languages allow you to turn off assertions... Personally, if I was prepared to write if tests to protect against all ranges of invalid input, I wouldn't bother with the assertion in the first place.
If, on the other hand you don't write logic to handle all cases (possibly because it's not sensible to try and continue with invalid input) then I would be using the assertion statement and going for the "fail early" approach.
If I remember correctly from CS-class
Preconditions define on what conditions the output of your function is defined. If you make your function handle errorconditions your function is defined for those condition and you don't need the assert statement.
So I agree. Usually you don't need both.
As Rik commented this can cause problems if you remove asserts in released code. Usually I don't do that except in performance-critical places.
I should have stated I was aware of the fact that asserts (here) dissappear in production code.
If the if-statement actually corrects invalid input data in production code, this means the assert never went off during testing on debug code, this means you wrote code that you never executed.
For me it's an OR situation:
(quote Andrew) "protect against all ranges of invalid input, I wouldn't bother with the assertion in the first place." -> write an if-test.
(quote aku) "incorrect input values can be forgivable" -> write an assert.
I can't stand both...
For internal functions, ones that only you will use, use asserts only. The asserts will help catch bugs during your testing, but won't hamper performance in production.
Check inputs that originate externally with if-conditions. By externally, that's anywhere outside the code that you/your team control and test.
Optionally, you can have both. This would be for external facing functions where integration testing is going to be done before production.
A problem with assertions is that they can (and usually will) be compiled out of the code, so you need to add both walls in case one gets thrown away by the compiler.