Avoiding, in general, "undefined method 'some_method' for nil:NilClass" in Ruby - ruby

Ruby's duck-typing is great, but this is the one way that it bites me in the ass. I'll have some long running text-processing script or something running, and after several hours, some unexpected set of circumstances ends up causing the script to exit with at NoMethodError due to a variable becoming nil.
Now, once it happens, it's usually an easy fix, but it would be nicer if I could predict these better, or at least handle these types of errors more gracefully. Sorry for the vagueness of the question, but this type of error just happens too often to me and I wonder if there's a good way to avoid it.
Is there some best practice related to these kinds of "type errors" for Ruby?

Look up Design by Contract. It's useful in many programming paradigms, but it's particularly useful when you don't have a compiler to help you catch these sort of errors, of forbidding particular sorts of values for a parameter.
In essence, DbC allows you to make an assumption about a parameter. It allows you (in all but one place) to skip the mundane checks that guarantee this assumption to hold.

What about Object.nil?

Related

Ruby nested assignment and bracketless method calls

I'm surprised by a particular bit of syntax-sensitivity in Ruby. These all work:
var = method arg
var2 = (var1 = method arg)
method2(method1 arg)
But this does not:
method2(var = method1 arg)
Instead, I have to do either this, with extra parentheses:
method2(var = method1(arg))
..or this, which I find much more ambiguous than the version that fails:
method2 var = method1(arg)
I assume that this is either a specific design decision or the side effect of another one, and would appreciate any insight into those decisions.
Please note that I'm not looking for any opinions about style; I'm not asking what looks better, or what you think should or should not work. I will even stipulate that this particular construct would be clearer if split into two separate statements entirely. I'm just curious about the actual reasons why Ruby works this way, from anyone who might have that background information.
I assume that this is either a specific design decision or the side effect of another one, and would appreciate any insight into those decisions.
Ruby's syntax is ridiculously complex. And since most Ruby implementations use a parser generator like Bison, which however isn't actually powerful enough to parse such a ridiculously complex language, the parsers tend to be even more ridiculously complex. It's much more likely that it's two weird parsing corner cases interacting in an even weirder way than any sort of conscious design decision.

Alternative to using -1 for invalid value

I was reading the coding standards for insomniac games here: http://www.insomniacgames.com/core-coding-standard/#id.8a4ef3275bfd and it mentions to not use "magic values". I'm not talking about magic numbers replaced with a named constant though.
In other words, don't use -1 to mean "invalid" or "not set". But it doesn't tell you what a better practice is. What do you guys do? How can you trust that the valid is actually valid without checking it? The only thing I could think of, would to have a bool that indicates it's valid or invalid, but this seems sloppy.
Actually, if you read the next paragraph, they do tell you what they regard as better practice:
“For things with discrete enumerated values, like ids, having a sentinel for "invalid" is okay, but keep make it an explicit constant. For example, the are encoding of DYN_JOINT_ID_INVALID (0xffffffff) in the old dynamic joint code is an appropriate use of sentinel values.”
Depending on the circumstances, it's arguable whether returning a unique error code, as they prefer, is preferable to returning a negative value, which is a well-established idiom. Exceptions, which they don't even mention, might be a better alternative. But what might be the best choice in theory is pretty well irrelevant: they've already made the decision, presumably for good reason, and put it in their coding standard, which you just have to live with.
You can use an enum to return a better value that makes more sense. enums really just give numbers names.
enum Errors {
NO_DATA_RECIEVED
COMPUTER_ON_FIRE
WIFI_OFF
}
You would use it like:
throw COMPUTER_ON_FIRE;
And in the error handler:
if(error == COMPUTER_ON_FIRE)
displayDialog("Grab a fire extinguisher!");
This is a lot more informative to the programmer than return -1- now you know what the problem actually is.

Ruby features that are risky to use

As a Ruby programmer did you feel any time about any feature which is a bit risky to use, may be because of it's strange behaviour? It might be well documented but hard to find while debugging or counterintuitive to remember?
I generally try to stay away from String#gsub!. The doc says "Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed." So if there's nothing to substitute then it'll return nil. Practically I didn't see any use case where this comes handy.
SO, with your experience, is there anything else you'd like to add?
Using return in a lambda, Proc or block. The semantics are well defined, but you will get it wrong, and you will get a LocalJumpError.
The meta programming features of Ruby can be used in very dangerous ways. I've seen code that attempts to runtime-redefine common methods on code classes like String, or Array, and while I can see this being MAYBE acceptable for a tiny temporary script, I don't think it is remotely a good idea in a complex application with many dependancies, or many maintainers.
Well, the most widely abused dangerous feature of Ruby is certainly evaling strings. In vast majority of cases (if not all) it can be avoided using other methods, usually define_method, const_get, const_set etc.
throw/catch (not the same as begin/rescue!) is basically GOTO, and that could be considered to be a risky feature to use in any language.

Is checking Perl function arguments worth it?

There's a lot of buzz about MooseX::Method::Signatures and even before that, modules such as Params::Validate that are designed to type check every argument to methods or functions. I'm considering using the former for all my future Perl code, both personal and at my place of work. But I'm not sure if it's worth the effort.
I'm thinking of all the Perl code I've seen (and written) before that performs no such checking. I very rarely see a module do this:
my ($a, $b) = #_;
defined $a or croak '$a must be defined!';
!ref $a or croak '$a must be a scalar!";
...
#_ == 2 or croak "Too many arguments!";
Perhaps because it's simply too much work without some kind of helper module, but perhaps because in practice we don't send excess arguments to functions, and we don't send arrayrefs to methods that expect scalars - or if we do, we have use warnings; and we quickly hear about it - a duck typing approach.
So is Perl type checking worth the performance hit, or are its strengths predominantly shown in compiled, strongly typed languages such as C or Java?
I'm interested in answers from anyone who has experience writing Perl that uses these modules and has seen benefits (or not) from their use; if your company/project has any policies relating to type checking; and any problems with type checking and performance.
UPDATE: I read an interesting article on the subject recently, called Strong Testing vs. Strong Typing. Ignoring the slight Python bias, it essentially states that type checking can be suffocating in some instances, and even if your program passes the type checks, it's no guarantee of correctness - proper tests are the only way to be sure.
If it's important for you to check that an argument is exactly what you need, it's worth it. Performance only matters when you already have correct functioning. It doesn't matter how fast you can get a wrong answer or a core dump. :)
Now, that sounds like a stupid thing to say, but consider some cases where it isn't. Do I really care what's in #_ here?
sub looks_like_a_number { $_[0] !~ /\D/ }
sub is_a_dog { eval { $_[0]->DOES( 'Dog' ) } }
In those two examples, if the argument isn't what you expect, you are still going to get the right answer because the invalid arguments won't pass the tests. Some people see that as ugly, and I can see their point, but I also think the alternative is ugly. Who wins?
However, there are going to be times that you need guard conditions because your situation isn't so simple. The next thing you have to pass your data to might expect them to be within certain ranges or of certain types and don't fail elegantly.
When I think about guard conditions, I think through what could happen if the inputs are bad and how much I care about the failure. I have to judge that by the demands of each situation. I know that sucks as an answer, but I tend to like it better than a bondage-and-discipline approach where you have to go through all the mess even when it doesn't matter.
I dread Params::Validate because its code is often longer than my subroutine. The Moose stuff is very attractive, but you have to realize that it's a way for you to declare what you want and you still get what you could build by hand (you just don't have to see it or do it). The biggest thing I hate about Perl is the lack of optional method signatures, and that's one of the most attractive features in Perl 6 as well as Moose.
I basically concur with brian. How much you need to worry about your method's inputs depends heavily on how much you are concerned that a) someone will input bad data, and b) bad data will corrupt the purpose of the method. I would also add that there is a difference between external and internal methods. You need to be more diligent about public methods because you're making a promise to consumers of your class; conversely you can be less diligent about internal methods as you have greater (theoretical) control over the code that accesses it, and have only yourself to blame if things go wrong.
MooseX::Method::Signatures is an elegant solution to adding a simple declarative way to explain the parameters of a method. Method::Signatures::Simple and Params::Validate are nice but lack one of the features I find most appealing about Moose: the Type system. I have used MooseX::Declare and by extension MooseX::Method::Signatures for several projects and I find that the bar to writing the extra checks is so minimal it's almost seductive.
Yes its worth it - defensive programming is one of those things that are always worth it.
The counterargument I've seen presented to this is that checking parameters on every single function call is redundant and a waste of CPU time. This argument's supporters favor a model in which all incoming data is rigorously checked when it first enters the system, but internal methods have no parameter checks because they should only be called by code which will pass them data which has already passed the checks at the system's border, so it is assumed to still be valid.
In theory, I really like the sound of that, but I can also see how easily it can fall like a house of cards if someone uses the system (or the system needs to grow to allow use) in a way that was unforeseen when the initial validation border is established. All it takes is one external call to an internal function and all bets are off.
In practice, I'm using Moose at the moment and Moose doesn't really give you the option to bypass validation at the attribute level, plus MooseX::Declare handles and validates method parameters with less fuss than unrolling #_ by hand, so it's pretty much a moot point.
I want to mention two points here.
The first are the tests, the second the performance question.
1) Tests
You mentioned that tests can do a lot and that tests are the only way
to be sure that your code is correct. In general i would say this is
absolutly correct. But tests itself only solves one problem.
If you write a module you have two problems or lets say two different
people that uses your module.
You as a developer and a user that uses your module. Tests helps with the
first that your module is correct and do the right thing, but it didn't
help the user that just uses your module.
For the later, i have one example. i had written a module using Moose
and some other stuff, my code ended always in a Segmentation fault.
Then i began to debug my code and search for the problem. I spend around
4 hours of time to find the error. In the end the problem was that i have
used Moose with the Array Trait. I used the "map" function and i didn't
provide a subroutine function, just a string or something else.
Sure this was an absolutly stupid error of mine, but i spend a long time to
debug it. In the end just a checking of the input that the argument is
a subref would cost the developer 10 seconds of time, and would cost me
and propably other a lot of more time.
I also know of other examples. I had written a REST Client to an interface
completly OOP with Moose. In the end you always got back Objects, you
can change the attributes but sure it didn't call the REST API for
every change you did. Instead you change your values and in the end you
call a update() method that transfers the data, and change the values.
Now i had a user that then wrote:
$obj->update({ foo => 'bar' })
Sure i got an error back, that update() does not work. But sure it didn't
work, because the update() method didn't accept a hashref. It only does
a synchronisation of the actual state of the object with the online
service. The correct code would be.
$obj->foo('bar');
$obj->update();
The first thing works because i never did a checking of the arguments. And i don't throw an error if someone gives more arguments then i expect. The method just starts normal like.
sub update {
my ( $self ) = #_;
...
}
Sure all my tests absolutely works 100% fine. But handling these errors that
are not errors cost me time too. And it costs the user propably a lot
of more time.
So in the end. Yes, tests are the only correct way to ensure that your code
works correct. But that doesn't mean that type checking is meaningless.
Type checking is there to help all your non-developers (on your module)
to use your module correctly. And saves you and others time finding
dump errors.
2) Performance
The short: You don't care for performance until you care.
That means until your module works to slow, Performance is always fast
enough and you don't need to care for this. If your module really works
to slow you need further investigations. But for these investigions
you should use a profiler like Devel::NYTProf to look what is slow.
And i would say. In 99% slowliness is not because you do type
checking, it is more your algorithm. You do a lot of computation, calling
functions to often etc. Often it helps if you do completly other solutions
use another better algorithm, do caching or something else, and the
performance hit is not your type checking. But even if the checking is the
performance hit. Then just remove it where it matters.
There is no reason to leave the type checking where performance don't
matters. Do you think type checking does matter in a case like above?
Where i have written a REST Client? 99% of performance issues here are
the amount of request that goes to the webservice or the time for such an
request. Don't using type checking or MooseX::Declare etc. would propably
speed up absolutly nothing.
And even if you see performance disadvantages. Sometimes it is acceptable.
Because the speed doesn't matter or sometimes something gives you a greater
value. DBIx::Class is slower then pure SQL with DBI, but DBIx::Class
gives you a lot for these.
Params::Validate works great,but of course checking args slows things down. Tests are mandatory(at least in the code I write).
Yes it's absolutely worth it, because it will help during development, maintenance, debugging, etc.
If a developer accidentally sends the wrong parameters to a method, a useful error message will be generated, instead of the error being propagated down to somewhere else.
I'm using Moose extensively for a fairly large OO project I'm working on. Moose's strict type checking has saved my bacon on a few occassions. Most importantly it has helped avoid situations where "undef" values are incorrectly being passed to the method. In just those instances alone it saved me hours of debugging time..
The performance hit is definitely there, but it can be managed. 2 hours of using NYTProf helped me find a few Moose Attributes that I was grinding too hard and I just refactored my code and got 4x performance improvement.
Use type checking. Defensive coding is worth it.
Patrick.
Sometimes. I generally do it whenever I'm passing options via hash or hashref. In these cases it's very easy to misremember or misspell an option name, and checking with Params::Check can save a lot of troubleshooting time.
For example:
sub revise {
my ($file, $options) = #_;
my $tmpl = {
test_mode => { allow => [0,1], 'default' => 0 },
verbosity => { allow => qw/^\d+$/, 'default' => 1 },
force_update => { allow => [0,1], 'default' => 0 },
required_fields => { 'default' => [] },
create_backup => { allow => [0,1], 'default' => 1 },
};
my $args = check($tmpl, $options, 1)
or croak "Could not parse arguments: " . Params::Check::last_error();
...
}
Prior to adding these checks, I'd forget whether the names used underscores or hyphens, pass require_backup instead of create_backup, etc. And this is for code I wrote myself--if other people are going to use it, you should definitely do some sort of idiot-proofing. Params::Check makes it fairly easy to do type checking, allowed value checking, default values, required options, storing option values to other variables and more.

What is your personal approach/take on commenting?

Duplicate
What are your hard rules about commenting?
A Developer I work with had some things to say about commenting that were interesting to me (see below). What is your personal approach/take on commenting?
"I don't add comments to code unless
its a simple heading or there's a
platform-bug or a necessary
work-around that isn't obvious. Code
can change and comments may become
misleading. Code should be
self-documenting in its use of
descriptive names and its logical
organization - and its solutions
should be the cleanest/simplest way
to perform a given task. If a
programmer can't tell what a program
does by only reading the code, then
he's not ready to alter it.
Commenting tends to be a crutch for
writing something complex or
non-obvious - my goal is to always
write clean and simple code."
"I think there a few camps when it
comes to commenting, the
enterprisey-type who think they're
writing an API and some grand
code-library that will be used for
generations to come, the
craftsman-like programmer that thinks
code says what it does clearer than a
comment could, and novices that write
verbose/unclear code so as to need to
leave notes to themselves as to why
they did something."
There's a tragic flaw with the "self-documenting code" theory. Yes, reading the code will tell you exactly what it is doing. However, the code is incapable of telling you what it's supposed to be doing.
I think it's safe to say that all bugs are caused when code is not doing what it's supposed to be doing :). So if we add some key comments to provide maintainers with enough information to know what a piece of code is supposed to be doing, then we have given them the ability to fix a whole lot of bugs.
That leaves us with the question of how many comments to put in. If you put in too many comments, things become tedious to maintain and the comments will inevitably be out of date with the code. If you put in too few, then they're not particularly useful.
I've found regular comments to be most useful in the following places:
1) A brief description at the top of a .h or .cpp file for a class explaining the purpose of the class. This helps give maintainers a quick overview without having to sift through all of the code.
2) A comment block before the implementation of a non-trivial function explaining the purpose of it and detailing its expected inputs, potential outputs, and any oddities to expect when calling the function. This saves future maintainers from having to decipher entire functions to figure these things out.
Other than that, I tend to comment anything that might appear confusing or odd to someone. For example: "This array is 1 based instead of 0 based because of blah blah".
Well written, well placed comments are invaluable. Bad comments are often worse than no comments. To me, lack of any comments at all indicates laziness and/or arrogance on the part of the author of the code. No matter how obvious it is to you what the code is doing or how fantastic your code is, it's a challenging task to come into a body of code cold and figure out what the heck is going on. Well done comments can make a world of difference getting someone up to speed on existing code.
I've always liked Refactoring's take on commenting:
The reason we mention comments here is that comments often are used as a deodorant. It's surprising how often you look at thickly commented code and notice that the comments are there because the code is bad.
Comments lead us to bad code that has all the rotten whiffs we've discussed in the rest of this chapter. Our first action is to remove the bad smells by refactoring. When we're finished, we often find that the comments are superfluous.
As controversial as that is, it's rings true for the code I've read. To be fair, Fowler isn't saying to never comment, but to think about the state of your code before you do.
You need documentation (in some form; not always comments) for a local understanding of the code. Code by itself tells you what it does, if you read all of it and can keep it all in mind. (More on this below.) Comments are best for informal or semiformal documentation.
Many people say comments are a code smell, replaceable by refactoring, better naming, and tests. While this is true of bad comments (which are legion), it's easy to jump to concluding it's always so, and hallelujah, no more comments. This puts all the burden of local documentation -- too much of it, I think -- on naming and tests.
Document the contract of each function and, for each type of object, what it represents and any constraints on a valid representation (technically, the abstraction function and representation invariant). Use executable, testable documentation where practical (doctests, unit tests, assertions), but also write short comments giving the gist where helpful. (Where tests take the form of examples, they're incomplete; where they're complete, precise contracts, they can be as much work to grok as the code itself.) Write top-level comments for each module and each project; these can explain conventions that keep all your other comments (and code) short. (This supports naming-as-documentation: with conventions established, and a place we can expect to find subtleties noted, we can be confident more often that the names tell all we need to know.) Longer, stylized, irritatingly redundant Javadocs have their uses, but helped generate the backlash.
(For instance, this:
Perform an n-fold frobulation.
#param n the number of times to frobulate
#param x the x-coordinate of the center of frobulation
#param y the y-coordinate of the center of frobulation
#param z the z-coordinate of the center of frobulation
could be like "Frobulate n times around the center (x,y,z)." Comments don't have to be a chore to read and write.)
I don't always do as I say here; it depends on how much I value the code and who I expect to read it. But learning how to write this way made me a better programmer even when cutting corners.
Back on the claim that we document for the sake of local understanding: what does this function do?
def is_even(n): return is_odd(n-1)
Tests if an integer is even? If is_odd() tests if an integer is odd, then yes, that works. Suppose we had this:
def is_odd(n): return is_even(n-1)
The same reasoning says this is_odd() tests if an integer is odd. Put them together, of course, and neither works, even though each works if the other does. Change it a bit and we'd have code that does work, but only for natural numbers, while still locally looking like it works for integers. In microcosm that's what understanding a codebase is like: tracing dependencies around in circles to try to reverse-engineer assumptions the author could have explained in a line or two if they'd bothered. I hate the expense of spirit thoughtless coders have put me to this way over the past couple of decades: oh, this method looks like it has the side effect of farbuttling the warpcore... always? Well, if odd crobuncles desaturate, at least; do they? Better check all the crobuncle-handling code... which will pose its own challenges to understanding. Good documentation cuts this O(n) pointer-chasing down to O(1): e.g. knowing a function's contract and the contracts of the things it explicitly uses, the function's code should make sense with no further knowledge of the system. (Here, contracts saying is_even() and is_odd() work on natural numbers would tell us that both functions need to test for n==0.)
My only real rule is that comments should explain why code is there, not what it is doing or how it is doing it. Those things can change, and if they do the comments have to be maintained. The purpose the code exists in the first place shouldn't change.
the purpose of comments is to explain the context - the reason for the code; this, the programmer cannot know from mere code inspection. For example:
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
who knows what the heck this is for? but with a simple comment, all is revealed:
//when under attack, sidestep and retaliate with rocket bundles
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
seriously, comments are for the why, not the how, unless the how is unintuitive.
While I agree that code should be self-readable, I still see a lot of value in adding extensive comment blocks for explaining design decisions. For example "I did xyz instead of the common practice of abc because of this caveot ..." with a URL to a bug report or something.
I try to look at it as: If I'm dead and gone and someone straight out of college has to fix a bug here, what are they going to need to know?
In general I see comments used to explain poorly written code. Most code can be written in a way that would make comments redundant. Having said that I find myself leaving comments in code where the semantics aren't intuitive, such as calling into an API that has strange or unexpected behavior etc...
I also generally subscribe to the self-documenting code idea, so I think your developer friend gives good advice, and I won't repeat that, but there are definitely many situations where comments are necessary.
A lot of times I think it boils down to how close the implementation is to the types of ordinary or easy abstractions that code-readers in the future are going to be comfortable with or more generally to what degree the code tells the entire story. This will result in more or fewer comments depending on the type of programming language and project.
So, for example if you were using some kind of C-style pointer arithmetic in an unsafe C# code block, you shouldn't expect C# programmers to easily switch from C# code reading (which is probably typically more declarative or at least less about lower-level pointer manipulation) to be able to understand what your unsafe code is doing.
Another example is when you need to do some work deriving or researching an algorithm or equation or something that is not going to end up in your code but will be necessary to understand if anyone needs to modify your code significantly. You should document this somewhere and having at least a reference directly in the relevant code section will help a lot.
I don't think it matters how many or how few comments your code contains. If your code contains comments, they have to maintained, just like the rest of your code.
EDIT: That sounded a bit pompous, but I think that too many people forget that even the names of the variables, or the structures we use in the code, are all simply "tags" - they only have meaning to us, because our brains see a string of characters such as customerNumber and understand that it is a customer number. And while it's true that comments lack any "enforcement" by the compiler, they aren't so far removed. They are meant to convey meaning to another person, a human programmer that is reading the text of the program.
If the code is not clear without comments, first make the code a clearer statement of intent, then only add comments as needed.
Comments have their place, but primarily for cases where the code is unavoidably subtle or complex (inherent complexity is due to the nature of the problem being solved, not due to laziness or muddled thinking on the part of the programmer).
Requiring comments and "measuring productivity" in lines-of-code can lead to junk such as:
/*****
*
* Increase the value of variable i,
* but only up to the value of variable j.
*
*****/
if (i < j) {
++i;
} else {
i = j;
}
rather than the succinct (and clear to the appropriately-skilled programmer):
i = Math.min(j, i + 1);
YMMV
The vast majority of my commnets are at the class-level and method-level, and I like to describe the higher-level view instead of just args/return value. I'm especially careful to describe any "non-linearities" in the function (limits, corner cases, etc) that could trip up the unwary.
Typically I don't comment inside a method, except to mark "FIXME" items, or very occasionally some sort of "here be monsters" gotcha that I just can't seem to clean up, but I work very hard to avoid those. As Fowler says in Refactoring, comments tend to indicate smally code.
Comments are part of code, just like functions, variables and everything else - and if changing the related functionality the comment must also be updated (just like function calls need changing if function arguments change).
In general, when programming you should do things once in one place only.
Therefore, if what code does is explained by clear naming, no comment is needed - and this is of course always the goal - it's the cleanest and simplest way.
However, if further explanation is needed, I will add a comment, prefixed with INFO, NOTE, and similar...
An INFO: comment is for general information if someone is unfamiliar with this area.
A NOTE: comment is to alert of a potential oddity, such as a strange business rule / implementation.
If I specifically don't want people touching code, I might add a WARNING: or similar prefix.
What I don't use, and am specifically opposed to, are changelog-style comments - whether inline or at the head of the file - these comments belong in the version control software, not the sourcecode!
I prefer to use "Hansel and Gretel" type comments; little notes in the code as to why I'm doing it this way, or why some other way isn't appropriate. The next person to visit this code will probably need this info, and more often than not, that person will be me.
As a contractor I know that some people maintaining my code will be unfamiliar with the advanced features of ADO.Net I am using. Where appropriate, I add a brief comment about the intent of my code and a URL to an MSDN page that explains in more detail.
I remember learning C# and reading other people's code I was often frustrated by questions like, "which of the 9 meanings of the colon character does this one mean?" If you don't know the name of the feature, how do you look it up?! (Side note: This would be a good IDE feature: I select an operator or other token in the code, right click then shows me it's language part and feature name. C# needs this, VB less so.)
As for the "I don't comment my code because it is so clear and clean" crowd, I find sometimes they overestimate how clear their very clever code is. The thought that a complex algorithm is self-explanatory to someone other than the author is wishful thinking.
And I like #17 of 26's comment (empahsis added):
... reading the code will tell you exactly
what it is doing. However, the code is
incapable of telling you what it's
supposed to be doing.
I very very rarely comment. MY theory is if you have to comment it's because you're not doing things the best way possible. Like a "work around" is the only thing I would comment. Because they often don't make sense but there is a reason you are doing it so you need to explain.
Comments are a symptom of sub-par code IMO. I'm a firm believer in self documenting code. Most of my work can be easily translated, even by a layman, because of descriptive variable names, simple form, and accurate and many methods (IOW not having methods that do 5 different things).
Comments are part of a programmers toolbox and can be used and abused alike. It's not up to you, that other programmer, or anyone really to tell you that one tool is bad overall. There are places and times for everything, including comments.
I agree with most of what's been said here though, that code should be written so clear that it is self-descriptive and thus comments aren't needed, but sometimes that conflicts with the best/optimal implementation, although that could probably be solved with an appropriately named method.
I agree with the self-documenting code theory, if I can't tell what a peice of code is doing simply by reading it then it probably needs refactoring, however there are some exceptions to this, I'll add a comment if:
I'm doing something that you don't
normally see
There are major side effects or implementation details that aren't obvious, or won't be next year
I need to remember to implement
something although I prefer an
exception in these cases.
If I'm forced to go do something else and I'm having good ideas, or a difficult time with the code, then I'll add sufficient comments to tmporarily preserve my mental state
Most of the time I find that the best comment is the function or method name I am currently coding in. All other comments (except for the reasons your friend mentioned - I agree with them) feel superfluous.
So in this instance commenting feels like overkill:
/*
* this function adds two integers
*/
int add(int x, int y)
{
// add x to y and return it
return x + y;
}
because the code is self-describing. There is no need to comment this kind of thing as the name of the function clearly indicates what it does and the return statement is pretty clear as well. You would be surprised how clear your code becomes when you break it down into tiny functions like this.
When programming in C, I'll use multi-line comments in header files to describe the API, eg parameters and return value of functions, configuration macros etc...
In source files, I'll stick to single-line comments which explain the purpose of non-self-evident pieces of code or to sub-section a function which can't be refactored to smaller ones in a sane way. Here's an example of my style of commenting in source files.
If you ever need more than a few lines of comments to explain what a given piece of code does, you should seriously consider if what you're doing can't be done in a better way...
I write comments that describe the purpose of a function or method and the results it returns in adequate detail. I don't write many inline code comments because I believe my function and variable naming to be adequate to understand what is going on.
I develop on a lot of legacy PHP systems that are absolutely terribly written. I wish the original developer would have left some type of comments in the code to describe what was going on in those systems. If you're going to write indecipherable or bad code that someone else will read eventually, you should comment it.
Also, if I am doing something a particular way that doesn't look right at first glance, but I know it is because the code in question is a workaround for a platform or something like that, then I'll comment with a WARNING comment.
Sometimes code does exactly what it needs to do, but is kind of complicated and wouldn't be immediately obvious the first time someone else looked at it. In this case, I'll add a short inline comment describing what the code is intended to do.
I also try to give methods and classes documentation headers, which is good for intellisense and auto-generated documentation. I actually have a bad habit of leaving 90% of my methods and classes undocumented. You don't have time to document things when you're in the middle of coding and everything is changing constantly. Then when you're done you don't feel like going back and finding all the new stuff and documenting it. It's probably good to go back every month or so and just write a bunch of documentation.
Here's my view (based on several years of doctoral research):
There's a huge difference between commenting functions (sort of a black box use, like JavaDocs), and commenting actual code for someone who will read the code ("internal commenting").
Most "well written" code shouldn't require much "internal commenting" because if it performs a lot then it should be broken into enough function calls. The functionality for each of these calls is then captured in the function name and in the function comments.
Now, function comments are indeed the problem, and in some ways your friend is right, that for most code there is no economical incentive for complete specifications the way that popular APIs are documented. The important thing here is to identify what are the "directives": directives are those information pieces that directly affect clients, and require some direct action (and are often unexpected). For example, X must be invoked before Y, don't call this from outside a UI thread, be aware that this has a certain side effect, etc. These are the things that are really important to capture.
Since most people never read full function documentations, and skim what they do read, you can actually increase the chances of awareness by capturing only the directives rather than the whole description.
I comment as much as needed - then, as much as I will need it a year later.
We add comments which provide the API reference documentation for all public classes / methods / properties / etc... This is well worth the effort because XML Documentation in C# has the nice effect of providing IntelliSense to users of these public APIs. .NET 4.0's code contracts will enable us to improve further on this practice.
As a general rule, we do not document internal implementations as we write code unless we are doing something non-obvious. The theory is that while we are writing new implementations, things are changing and comments are more likely than not to be wrong when the dust settles.
When we go back in to work on an existing piece of code, we add comments when we realize that it's taking some thought to figure out what in the heck is going on. This way, we wind up with comments where they are more likely to be correct (because the code is more stable) and where they are more likely to be useful (if I'm coming back to a piece of code today, it seems more likely that I might come back to it again tomorrow).
My approach:
Comments bridge the gap between context / real world and code. Therefore, each and every single line is commented, in correct English language.
I DO reject code that doesn't observe this rule in the strictest possible sense.
Usage of well formatted XML - comments is self-evident.
Sloppy commenting means sloppy code!
Here's how I wrote code:
if (hotel.isFull()) {
print("We're fully booked");
} else {
Guest guest = promptGuest();
hotel.checkIn(guest);
}
here's a few comments that I might write for that code:
// if hotel is full, refuse checkin, otherwise
// prompt the user for the guest info, and check in the guest.
If your code reads like a prose, there is no sense in writing comments that simply repeats what the code reads since the mental processing needed for reading the code and the comments would be almost equal; and if you read the comments first, you will still need to read the code as well.
On the other hand, there are situations where it is impossible or extremely difficult to make the code looks like a prose; that's where comment could patch in.

Resources