What's an example of Ruby code that's "too clever"? [closed] - ruby

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I was having a discussion with some programmer friends who said that they see Ruby programmers (in particular) producing a lot of code that's "too clever". So I'm wondering what would that look like? I'm referring to the unnecessary use of an obscure language feature in a context in which something straightforward would have worked just as well or better. Know any good Ruby examples of this?

After giving a straight answer to your question, I'd like to also dispute the premise; whenever a group of programmers characterizes the users of another language in this way, the odds are that they are telling you more about themselves than about the community they are describing.
You could, for example, accuse c programmers of being too obsessed with low level details, or haskell programmers with being blinded by their desire for functional purity; perl mongers for brevity, etc. But you would, IMHO, by getting the causality backwards when you do so.
When I want to write a program that is best expressed in a certain style, I try to choose a language that supports that style. Sometimes you want a tool that lets you do unusual things, and for such a task having a language such as ruby is as valuable as having mathematica for math or javascript for browser manipulation in your toolkit. If I want to play with typography I hop into postscript because that's what it's best at.
It's like saying "Have you ever noticed that people who use power drills are always poking holes in things?" It's true, but it kind of misses the point.

class Tree
def initialize*d;#d,=d;end
def to_s;#l||#r?"<#{#d},<#{#l}>,<#{#r}>>":#d;end
def total;(#d.is_a?(Numeric)?#d:0)+(#l?#l.total: 0)+(#r?#r.total: 0);end
def insert d
alias g instance_variable_get
p=lambda{|s,o|d.to_s.send(o,#d.to_s)&&
(g(s).nil??instance_variable_set(s,Tree.new(d)):g(s).insert(d))}
#d?p[:#l,:<]||p[:#r,:>]:#d=d
end
end

The double-bang: !!something
I'm not gonna write what it does. Forget that you ever saw this syntax.

Any use of metaprogramming without having thought damn hard about whether there's a better way to acheive this using the normal, non-'meta' idioms of the language, I tend to find annoying.
An obsession with "DRY" (don't repeat yourself) where some fiendish piece of metaprogramming spaghetti is invoked to avoid repeating yourself, say, twice in a simple and actually-more-straightforward-and-readable-than-the-alternative fashion.
Any use of 'eval' in particular. As metaprogramming goes, this one should be your absolute last resort after trying everything else. eg a lot of rubyists appear not to have heard of Class#define_method.

The output phase of yaml.rb; that's why I co-authored zaml.rb. The standard yaml version does all sorts of metaprogramming (it was originally written by why-the-lucky-stiff, who I generally admire) but by replacing it with a straight forward hierarchical version that directly maps to the class tree we were able to eliminate several O(n^3) cases, resulting in a factor of ten speedup for cases of interest, fix several bugs, and do so in a fraction of the code.
Plus, even people who aren't ruby gurus can see what it does.

Many of the examples in this article would seem to qualify:
21 Ruby Tricks You Should Be Using In Your Own Code.
The title of the article was a bit of a giveaway, given that it reads "Should" instead of "Should Not". Code "should" be transparent. Code "should not" be tricky.

I'm not sure if this qualifies as "too clever," but I have seen code that made me wonder if the author was either a genius or an idiot. One developer seemed to have a rule that no method should have more than two lines of code. That pushed the call stack very deep and made debugging rather difficult. The upside is that his overall design was very abstract and even elegant from a distance.

Cucumber (or RSpec Stories)
Quoted from the above RSpec Stories link:
Based around plain text descriptions
of application behaviour, it lets you
write integration tests with good
reuse and good diagnostic reporting.
For example, here's a story I wrote to
check the login process.
Story: login as an existing user
As an unauthenticated user
I want to log in to Expectnation
So I can see my account details
Scenario: login details are correct
Given an event provider
And my test#example.org account
When I log in with email test#example.org and password foofoo
Then I will be logged in
And I will be shown the account page
The words such as "Given", "When" and
"Then" are cues to the story runner to
execute some code. Behind the story
sits a collection of steps. Here's a
couple of steps from this test:
Given "my $email account" do |email|
#user = find_or_create_user_by_email({:email => email,
:password => 'foofoo',
:password_confirmation => 'foofoo'})
end
When "I log in with email $email and password $password" do |email, password|
post '/user/account/authenticate',
:user => {:email => email, :password => password}
end
Notice how a clever bit of string
matching allows you to pass parameters
from the story prose.
With a small bit of bolting together, the prose stories are then run as code and the tests executed.

It depends. (I love "it depends" questions)
It depends on the knowledge of the writer and reader. I used to think the use of Symbol#to_proc in Rails was unnecessarily arcane, for example, preferring
a.map { |e| e.downcase }
to
a.map(&:downcase)
Now I'm happy when I read it, although I still don't think to write it.
There are areas of libraries (Rails and others) where I have felt excessive and self-indulgent metaprogramming may have occurred but again the division between "too clever" and "really very clever indeed" is often paper-thin. DSLs are a good area: the ways in which "macros" are made available within classes (think of all that declarative goodness in things like ActiveRecord::Base or ActionController::Base) is very hard for a relative novice to understand and would probably seem like over-cleverness. It did to me. Now I find myself referencing the same code for guidance as I implement similar capabilities.

method_missing can be abused and it's one of those things that may cause you to pull your hair out when you have to fix a bug 3 months after you've written code.

Take a look at the source of Markaby. Insanity.

You shouldn't have to go from method to method to method to try to figure out what in the hell something is doing, for the sole purpose of not repeating a few lines of code. Being too focused on the LOC count and ultra-skinny methods might feel cool at the time but is time-consuming for someone else trying to debug or follow the code (and that someone may be you months later).

compare:
if MODELS.keys.inject(true) {|b, klass| b and klass.constantize.columns.map(&:name).include? association.options [:foreign_key]} then
# ...
end
1 line (if), 132 chars, 132 avg len, 22.9 flog
vs
fk = association.options[:foreign_key]
columns = MODELS.keys.map { |key| key.constantize.columns.map { |c| c.name } }
if columns.all? {|column| column.include? fk} then
# ...
end
4 lines, 172 chars, 43 avg chars, 15.9 flog
much faster too.
Original author actually argued maintainability for the first version.

Recently uncovered this monster:
def id
unless defined?(#id)
#id = if id = local_body.to_s[/(?:#\s*|#[[:punct:]]?)#{URL_REGEX}/,1]
id.to_i
end
end
#id
end
Not that I disagree with caching a calculation, it could just be far more clear.

Related

Use parentheses to enclose a block in Ruby?

I have accidentally discovered the Ruby idiom of ||=(),
as in:
def app_logger
#app_logger ||= (
logfile = File.open(::Rails.root.join(LOG_FILE), 'a')
logfile.sync = true
AppLogger.new(logfile)
)
end
I tried to use {} instead of (), but it didn't work. I thought {} is meant to enclose a block.
Is this a known idiom? Is it a good style?
I haven't found much documentation on this kind of use of parentheses. Any pointers would be helpful.
Please note this post is about the use of () this way, not the use of ||=. There are many posts about this latter idiom already.
Like a lot of things in Ruby that can be done, many of them shouldn't be done and this is one of them.
Using brackets to group code when there are already other facilities is potentially confusing and almost certainly contrary to many coding style guides. If I saw this in code I was managing, I'd immediately fix it.
What's best is to use the begin/end markers to make it completely clear what's going on:
def app_logger
#app_logger ||= begin
logfile = File.open(::Rails.root.join(LOG_FILE), 'a')
logfile.sync = true
AppLogger.new(logfile)
end
end
A lot of things in Ruby evaluate down to a single value, and the contents of (...) are apparently one of those things.
Your other example of foo ||= (a=10; a+10) being "nicer" is also rather controversial. This does two assignment and addition, but only conditionally. This one would almost always be better written in long-form with begin/end to make it clear that a+10 is the result.
From a style perspective, hiding the "important" part of that, the a+10, at the end of a line is bad, it can be overlooked. Having it as the last line makes it abundantly clear. This is also why having if statements tacked on the end of long lines is also bad, it hides that the line is only conditionally executed.
Concern for brevity is always trumped by concerns of readability. Saving a couple of bytes on disk is not going to help you one bit when, due to someone misreading your code, they introduce a crippling bug.
That someone could be you in the future when you get caught up in your own cleverness. It's happened to all of us at some point.
In addition to #tadman's answer, I'd counter that writing the method in a more Ruby-like way maintains the readability and keeps it nice and tight:
def app_logger
return #app_logger if #app_logger
logfile = File.open(::Rails.root.join(LOG_FILE), 'a')
logfile.sync = true
#app_logger = AppLogger.new(logfile)
end
For my intent and rationale in writing it this way, see my comment to this answer in response to #fotanus's comment.
Our brains become conditioned to see patterns as we learn to program and learn new languages. Whether the patterns are in hexcode emitted by a debugger or C, Perl, Python, Java or Ruby, we still see patterns. When we're used to seeing them we tend to want to change code to resemble those familiar constructs. That's why Perl and C programmers tend to write Java, Python and Ruby like C and Perl at first. The "beauty is in the eye of the beholder", but that same beauty is a weed when it's out of place, just as wild-flowers are out of place in formal gardens or the middle of a fairway. We should write in the vernacular of the programming language, because that is the way of speaking expected by those who live in that land.
Something to remember is, though we, personally, might be a code-studly monster who can reduce code from multiple lines down to a single character, what will keep me in a job is the ability to write code other people can understand, without necessarily having to be of the same caliber. Code crunching invariably reaches a point of diminishing returns, well before it has been reduced to its minimal size, especially when that code is running in a production environment and is being maintained by junior programmers and there is a need to extend or debug it, and the clock is ticking. Minutes = dollars where I work, and the dollars have really big multipliers, which causes the walls to leak managers like cockroaches when a bug-bomb goes off. (Oh... did I call managers cockroaches? Whatever.)
Being called the morning after an event and being thanked for writing code that made it easy for them to debug/fix or amend/extend is a whole lot better than being called at 2:45AM and asked to get online to help. I value my sleep and that of my coding-partners and push for maintainable code as our #1 priority.
That's my $0.02.
Parentheses are really ugly for in my opinion. Why you don't delegate logic behind creating logger to method?
def app_logger
#app_logger ||= instantiate_app_logger
end
def instantiate_app_logger
logfile = File.open(::Rails.root.join(LOG_FILE), 'a')
logfile.sync = true
AppLogger.new(logfile)
end
But I am afraid that it breaks some OOP, your code also. Why AppLoger class can't open file based on passed path and turning syncing on? It will be so much cleaner.

When are comments "too much", and when are they not enough? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
There is an on-going minor debate where I work about the efficacy of comments within code. One of the leads instructed his developers not to use comments as they are too "old fashioned", and a couple of other developers indicated that they never use comments because they feel all they do is clutter up the code.
I have always pretty much adhered to the practice that I comment the top of every file with a basic comment block, comment each method/class/etc definition, and then I comment any place in the code where I think I might come back in 6 months and think to myself, "WTF".
Clearly this is subjective, but I'm curious to know if anyone has any really good arguments or experiences for one way or the other.
I'll just point you to Jeff Atwood's wonderful post on the subject, which hits the nail right on the head.
In all my career, I have never come across that wonderful beast "self-documenting code". Maybe I've just been unlucky, but I'm beginning to suspect it doesn't actually exist.
Every once in a while I run across code that is so elegantly partitioned, has such compellingly obvious method, field and variable names that everything I need to know is obvious from the code.
In the general case, only really great code gurus write such code. The rest of us cobble together something that works.
If you're a really great code guru, don't bother sullying your divine code with superfluous comments.
If you barely know what you're doing, be careful to document your blundering attempts so others can try to salvage the mess.
If you're average (and most of us are, sort of by definition) then leave some hints in comments for yourself and others to make things easier at maintenance time, but don't insult anyone's intelligence and waste space by documenting the REALLY obvious. Ideally, your comments should describe your code at a meta-level, indicating not what you're doing but why. Also how, if you're doing something unusual or tricky.
"One of the leads instructed his developers not to use comments as they are too "old fashioned", and a couple of other developers indicated that they never use comments because they feel all they do is clutter up the code."
If I ever heard a developer I was working with talk like this, I would correct them. If I didn't have the necessary rank to correct them, I would leave the job.
Very clearly written code, with good identifiers -- the stuff sometimes referred to as 'self-documenting' -- does a fine job of illustrating what the code is doing. That's fine as far as it goes. The job of the comments is to explain why.
This topic tends to be discussed a lot, but here are my US$0.02 on the subject:
I would rather see too many comments than not enough. Failing at anything, you can always delete superfluous comments from the code; however, you can not derive meaning from them if there are none there to begin with.
I've heard some developers argue that other developers that "over document" (definitions of this vary by person) their code are not good developers. While saying that you are updating a counter might be a sign that you don't know what you are doing, having a clear guide to some of the business logic sitting there in the middle of the method you are working on can be quite useful.
While there are some excellent developers out there that can write extremely clear code that doesn't require comments, most developers aren't that good or they spend more time writing the code to be self documenting than they would if they had just included a couple comments.
You don't know the skill level of the next person to read your code and if the language constructs you are using might be confusing it is usually a good idea to include a comment that someone can use to Google a tutorial with.
The problem with comments is that they tend to stay long after the code that was commented has changed or even been deleted.
As a rule of thumb I'd only comment public API and difficult to understand algorithms.
Don't use comments to explain what you did - that's what the code is for, use comments to explain why you did it.
Diomidis Spinellis just wrote a nice column for IEEE column (quoted on his blog), outlining the problem, and a few solutions:
When commenting, we’re always a couple
of keystrokes away from disaster:
restating the code’s function in
English. And that’s when problems
start.
One should write comment before the code or before the function so that next time looking at the function he/she can know immediately what was the purpose of that code.
It is happened to me many times that I write the code and then forgot the purpose of that. so, I make habit of writing comment before code.
What I'd like to see in the comments is an explanation why a method that is obvious and much simpler than the method used in the code doesn't work.

Designing a poker parser in Ruby

I'm writing a small program in Ruby to parse a hand history log from a poker site.
The log is split over several lines and looks a bit like this:
Table 123456 NL Hold'em $1/$2
5 Players
Seat 3 is the button
Seat 1: randomGuy112 $152.56
Seat 2: randomGirl99 $200
Seat 3: PokerPro $357.12
Seat 4: FishCake556 $57.19
Seat 6: MooMoo $188.98
Dealt to MooMoo [Ah, Ks]
randomGuy112 folds
randomGirl99 raises to $7
etc.. etc..
I want to summarise this information in an object which then might, for example,
render it differently or save it to database.
When I originally thought of this problem I thought I'd just have one realativly straight forward class with a number of regexes and several if/else statements. I then realised this could turn into quite a large method and potentially be a nightmare to debug/maintain. Keep in mind it needs to loop at each stage of the game (preflop,flop etc) to collect player's actions.
I also want to tackle this with a TDD approach, but the 'one long method' way means that the tests for with checking later input will kind of rely on earlier tests.
I'm quite new to Ruby and havn't yet clicked on the 'Ruby way' to do things. I'm catching myself writing C# code
in a different language.
Can you give me some pointers on how to design the parser so it isn't one huge mess of if/else statements and more testable?
Use Treetop
It does look like you are on the borderline between what ad hoc string matching and RE's are good for, and what requires an actual parser.
There is nothing wrong with handwritten parsers, and as long as you keep your methods short, without a lot of complexity in any given one, it's OK to have as many if statements in total as the parser requires.
I'm not sure 10 lines with incomprehensible regular expressions is any better than 30 lines of nice looking code.
Now, Ruby does have an advanced PEG parser generator. I think in this case I wouldn't worry about whether it was overkill, I would just go ahead and use Treetop.
State Machine, anyone?
At any point in the play of a poker hand there is a clearly-defined set of possible next actions. I'd think you could encapsulate them into a state machine. There are a few around, amongst which (no recommendations, I'm afraid - not enough experience with any) are
Alter Ego (updated July this year)
ruby-state-machine (seems also to be alive)
statemachine (looks a bit stale)
You can checkout this open source poker game hand parser
It looks like they created a hash of regular expressions and then they probably iterate over the regex data structures. It is a more simple machine than a parser and probably a more light weight approach.
I wrote hand history parser for PokerStars log files https://github.com/malikbakt/pokerstars
You may want to look at: StringScanner.
I have two different pointers for you, which will point you to the solution, on how to write code in the ruby way.
Get a ruby book. The ruby book will have a lot of examples on how to write code in the ruby way. From my personal expirience I can recommend you the pixake(is this spelled right?) book: http://www.ruby-doc.org/docs/ProgrammingRuby/html/index.html
Read existing ruby code. You seem to know enough ruby to write code? Then you should certainly be able to read existing code. I assume you already have installed ruby on your system. If so, you will find plenty of sourcecode on your harddrive. If not just use the internet.
I'd recommend the book Refactoring by Martin Fowler (available in both dead-tree and electronic formats, IIRC). He covers object-oriented remedies for exactly the design problems you're asking about, all in a test-driven context. This is one of those books that everyone in the profession should read.

What is your personal approach/take on commenting?

Duplicate
What are your hard rules about commenting?
A Developer I work with had some things to say about commenting that were interesting to me (see below). What is your personal approach/take on commenting?
"I don't add comments to code unless
its a simple heading or there's a
platform-bug or a necessary
work-around that isn't obvious. Code
can change and comments may become
misleading. Code should be
self-documenting in its use of
descriptive names and its logical
organization - and its solutions
should be the cleanest/simplest way
to perform a given task. If a
programmer can't tell what a program
does by only reading the code, then
he's not ready to alter it.
Commenting tends to be a crutch for
writing something complex or
non-obvious - my goal is to always
write clean and simple code."
"I think there a few camps when it
comes to commenting, the
enterprisey-type who think they're
writing an API and some grand
code-library that will be used for
generations to come, the
craftsman-like programmer that thinks
code says what it does clearer than a
comment could, and novices that write
verbose/unclear code so as to need to
leave notes to themselves as to why
they did something."
There's a tragic flaw with the "self-documenting code" theory. Yes, reading the code will tell you exactly what it is doing. However, the code is incapable of telling you what it's supposed to be doing.
I think it's safe to say that all bugs are caused when code is not doing what it's supposed to be doing :). So if we add some key comments to provide maintainers with enough information to know what a piece of code is supposed to be doing, then we have given them the ability to fix a whole lot of bugs.
That leaves us with the question of how many comments to put in. If you put in too many comments, things become tedious to maintain and the comments will inevitably be out of date with the code. If you put in too few, then they're not particularly useful.
I've found regular comments to be most useful in the following places:
1) A brief description at the top of a .h or .cpp file for a class explaining the purpose of the class. This helps give maintainers a quick overview without having to sift through all of the code.
2) A comment block before the implementation of a non-trivial function explaining the purpose of it and detailing its expected inputs, potential outputs, and any oddities to expect when calling the function. This saves future maintainers from having to decipher entire functions to figure these things out.
Other than that, I tend to comment anything that might appear confusing or odd to someone. For example: "This array is 1 based instead of 0 based because of blah blah".
Well written, well placed comments are invaluable. Bad comments are often worse than no comments. To me, lack of any comments at all indicates laziness and/or arrogance on the part of the author of the code. No matter how obvious it is to you what the code is doing or how fantastic your code is, it's a challenging task to come into a body of code cold and figure out what the heck is going on. Well done comments can make a world of difference getting someone up to speed on existing code.
I've always liked Refactoring's take on commenting:
The reason we mention comments here is that comments often are used as a deodorant. It's surprising how often you look at thickly commented code and notice that the comments are there because the code is bad.
Comments lead us to bad code that has all the rotten whiffs we've discussed in the rest of this chapter. Our first action is to remove the bad smells by refactoring. When we're finished, we often find that the comments are superfluous.
As controversial as that is, it's rings true for the code I've read. To be fair, Fowler isn't saying to never comment, but to think about the state of your code before you do.
You need documentation (in some form; not always comments) for a local understanding of the code. Code by itself tells you what it does, if you read all of it and can keep it all in mind. (More on this below.) Comments are best for informal or semiformal documentation.
Many people say comments are a code smell, replaceable by refactoring, better naming, and tests. While this is true of bad comments (which are legion), it's easy to jump to concluding it's always so, and hallelujah, no more comments. This puts all the burden of local documentation -- too much of it, I think -- on naming and tests.
Document the contract of each function and, for each type of object, what it represents and any constraints on a valid representation (technically, the abstraction function and representation invariant). Use executable, testable documentation where practical (doctests, unit tests, assertions), but also write short comments giving the gist where helpful. (Where tests take the form of examples, they're incomplete; where they're complete, precise contracts, they can be as much work to grok as the code itself.) Write top-level comments for each module and each project; these can explain conventions that keep all your other comments (and code) short. (This supports naming-as-documentation: with conventions established, and a place we can expect to find subtleties noted, we can be confident more often that the names tell all we need to know.) Longer, stylized, irritatingly redundant Javadocs have their uses, but helped generate the backlash.
(For instance, this:
Perform an n-fold frobulation.
#param n the number of times to frobulate
#param x the x-coordinate of the center of frobulation
#param y the y-coordinate of the center of frobulation
#param z the z-coordinate of the center of frobulation
could be like "Frobulate n times around the center (x,y,z)." Comments don't have to be a chore to read and write.)
I don't always do as I say here; it depends on how much I value the code and who I expect to read it. But learning how to write this way made me a better programmer even when cutting corners.
Back on the claim that we document for the sake of local understanding: what does this function do?
def is_even(n): return is_odd(n-1)
Tests if an integer is even? If is_odd() tests if an integer is odd, then yes, that works. Suppose we had this:
def is_odd(n): return is_even(n-1)
The same reasoning says this is_odd() tests if an integer is odd. Put them together, of course, and neither works, even though each works if the other does. Change it a bit and we'd have code that does work, but only for natural numbers, while still locally looking like it works for integers. In microcosm that's what understanding a codebase is like: tracing dependencies around in circles to try to reverse-engineer assumptions the author could have explained in a line or two if they'd bothered. I hate the expense of spirit thoughtless coders have put me to this way over the past couple of decades: oh, this method looks like it has the side effect of farbuttling the warpcore... always? Well, if odd crobuncles desaturate, at least; do they? Better check all the crobuncle-handling code... which will pose its own challenges to understanding. Good documentation cuts this O(n) pointer-chasing down to O(1): e.g. knowing a function's contract and the contracts of the things it explicitly uses, the function's code should make sense with no further knowledge of the system. (Here, contracts saying is_even() and is_odd() work on natural numbers would tell us that both functions need to test for n==0.)
My only real rule is that comments should explain why code is there, not what it is doing or how it is doing it. Those things can change, and if they do the comments have to be maintained. The purpose the code exists in the first place shouldn't change.
the purpose of comments is to explain the context - the reason for the code; this, the programmer cannot know from mere code inspection. For example:
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
who knows what the heck this is for? but with a simple comment, all is revealed:
//when under attack, sidestep and retaliate with rocket bundles
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
seriously, comments are for the why, not the how, unless the how is unintuitive.
While I agree that code should be self-readable, I still see a lot of value in adding extensive comment blocks for explaining design decisions. For example "I did xyz instead of the common practice of abc because of this caveot ..." with a URL to a bug report or something.
I try to look at it as: If I'm dead and gone and someone straight out of college has to fix a bug here, what are they going to need to know?
In general I see comments used to explain poorly written code. Most code can be written in a way that would make comments redundant. Having said that I find myself leaving comments in code where the semantics aren't intuitive, such as calling into an API that has strange or unexpected behavior etc...
I also generally subscribe to the self-documenting code idea, so I think your developer friend gives good advice, and I won't repeat that, but there are definitely many situations where comments are necessary.
A lot of times I think it boils down to how close the implementation is to the types of ordinary or easy abstractions that code-readers in the future are going to be comfortable with or more generally to what degree the code tells the entire story. This will result in more or fewer comments depending on the type of programming language and project.
So, for example if you were using some kind of C-style pointer arithmetic in an unsafe C# code block, you shouldn't expect C# programmers to easily switch from C# code reading (which is probably typically more declarative or at least less about lower-level pointer manipulation) to be able to understand what your unsafe code is doing.
Another example is when you need to do some work deriving or researching an algorithm or equation or something that is not going to end up in your code but will be necessary to understand if anyone needs to modify your code significantly. You should document this somewhere and having at least a reference directly in the relevant code section will help a lot.
I don't think it matters how many or how few comments your code contains. If your code contains comments, they have to maintained, just like the rest of your code.
EDIT: That sounded a bit pompous, but I think that too many people forget that even the names of the variables, or the structures we use in the code, are all simply "tags" - they only have meaning to us, because our brains see a string of characters such as customerNumber and understand that it is a customer number. And while it's true that comments lack any "enforcement" by the compiler, they aren't so far removed. They are meant to convey meaning to another person, a human programmer that is reading the text of the program.
If the code is not clear without comments, first make the code a clearer statement of intent, then only add comments as needed.
Comments have their place, but primarily for cases where the code is unavoidably subtle or complex (inherent complexity is due to the nature of the problem being solved, not due to laziness or muddled thinking on the part of the programmer).
Requiring comments and "measuring productivity" in lines-of-code can lead to junk such as:
/*****
*
* Increase the value of variable i,
* but only up to the value of variable j.
*
*****/
if (i < j) {
++i;
} else {
i = j;
}
rather than the succinct (and clear to the appropriately-skilled programmer):
i = Math.min(j, i + 1);
YMMV
The vast majority of my commnets are at the class-level and method-level, and I like to describe the higher-level view instead of just args/return value. I'm especially careful to describe any "non-linearities" in the function (limits, corner cases, etc) that could trip up the unwary.
Typically I don't comment inside a method, except to mark "FIXME" items, or very occasionally some sort of "here be monsters" gotcha that I just can't seem to clean up, but I work very hard to avoid those. As Fowler says in Refactoring, comments tend to indicate smally code.
Comments are part of code, just like functions, variables and everything else - and if changing the related functionality the comment must also be updated (just like function calls need changing if function arguments change).
In general, when programming you should do things once in one place only.
Therefore, if what code does is explained by clear naming, no comment is needed - and this is of course always the goal - it's the cleanest and simplest way.
However, if further explanation is needed, I will add a comment, prefixed with INFO, NOTE, and similar...
An INFO: comment is for general information if someone is unfamiliar with this area.
A NOTE: comment is to alert of a potential oddity, such as a strange business rule / implementation.
If I specifically don't want people touching code, I might add a WARNING: or similar prefix.
What I don't use, and am specifically opposed to, are changelog-style comments - whether inline or at the head of the file - these comments belong in the version control software, not the sourcecode!
I prefer to use "Hansel and Gretel" type comments; little notes in the code as to why I'm doing it this way, or why some other way isn't appropriate. The next person to visit this code will probably need this info, and more often than not, that person will be me.
As a contractor I know that some people maintaining my code will be unfamiliar with the advanced features of ADO.Net I am using. Where appropriate, I add a brief comment about the intent of my code and a URL to an MSDN page that explains in more detail.
I remember learning C# and reading other people's code I was often frustrated by questions like, "which of the 9 meanings of the colon character does this one mean?" If you don't know the name of the feature, how do you look it up?! (Side note: This would be a good IDE feature: I select an operator or other token in the code, right click then shows me it's language part and feature name. C# needs this, VB less so.)
As for the "I don't comment my code because it is so clear and clean" crowd, I find sometimes they overestimate how clear their very clever code is. The thought that a complex algorithm is self-explanatory to someone other than the author is wishful thinking.
And I like #17 of 26's comment (empahsis added):
... reading the code will tell you exactly
what it is doing. However, the code is
incapable of telling you what it's
supposed to be doing.
I very very rarely comment. MY theory is if you have to comment it's because you're not doing things the best way possible. Like a "work around" is the only thing I would comment. Because they often don't make sense but there is a reason you are doing it so you need to explain.
Comments are a symptom of sub-par code IMO. I'm a firm believer in self documenting code. Most of my work can be easily translated, even by a layman, because of descriptive variable names, simple form, and accurate and many methods (IOW not having methods that do 5 different things).
Comments are part of a programmers toolbox and can be used and abused alike. It's not up to you, that other programmer, or anyone really to tell you that one tool is bad overall. There are places and times for everything, including comments.
I agree with most of what's been said here though, that code should be written so clear that it is self-descriptive and thus comments aren't needed, but sometimes that conflicts with the best/optimal implementation, although that could probably be solved with an appropriately named method.
I agree with the self-documenting code theory, if I can't tell what a peice of code is doing simply by reading it then it probably needs refactoring, however there are some exceptions to this, I'll add a comment if:
I'm doing something that you don't
normally see
There are major side effects or implementation details that aren't obvious, or won't be next year
I need to remember to implement
something although I prefer an
exception in these cases.
If I'm forced to go do something else and I'm having good ideas, or a difficult time with the code, then I'll add sufficient comments to tmporarily preserve my mental state
Most of the time I find that the best comment is the function or method name I am currently coding in. All other comments (except for the reasons your friend mentioned - I agree with them) feel superfluous.
So in this instance commenting feels like overkill:
/*
* this function adds two integers
*/
int add(int x, int y)
{
// add x to y and return it
return x + y;
}
because the code is self-describing. There is no need to comment this kind of thing as the name of the function clearly indicates what it does and the return statement is pretty clear as well. You would be surprised how clear your code becomes when you break it down into tiny functions like this.
When programming in C, I'll use multi-line comments in header files to describe the API, eg parameters and return value of functions, configuration macros etc...
In source files, I'll stick to single-line comments which explain the purpose of non-self-evident pieces of code or to sub-section a function which can't be refactored to smaller ones in a sane way. Here's an example of my style of commenting in source files.
If you ever need more than a few lines of comments to explain what a given piece of code does, you should seriously consider if what you're doing can't be done in a better way...
I write comments that describe the purpose of a function or method and the results it returns in adequate detail. I don't write many inline code comments because I believe my function and variable naming to be adequate to understand what is going on.
I develop on a lot of legacy PHP systems that are absolutely terribly written. I wish the original developer would have left some type of comments in the code to describe what was going on in those systems. If you're going to write indecipherable or bad code that someone else will read eventually, you should comment it.
Also, if I am doing something a particular way that doesn't look right at first glance, but I know it is because the code in question is a workaround for a platform or something like that, then I'll comment with a WARNING comment.
Sometimes code does exactly what it needs to do, but is kind of complicated and wouldn't be immediately obvious the first time someone else looked at it. In this case, I'll add a short inline comment describing what the code is intended to do.
I also try to give methods and classes documentation headers, which is good for intellisense and auto-generated documentation. I actually have a bad habit of leaving 90% of my methods and classes undocumented. You don't have time to document things when you're in the middle of coding and everything is changing constantly. Then when you're done you don't feel like going back and finding all the new stuff and documenting it. It's probably good to go back every month or so and just write a bunch of documentation.
Here's my view (based on several years of doctoral research):
There's a huge difference between commenting functions (sort of a black box use, like JavaDocs), and commenting actual code for someone who will read the code ("internal commenting").
Most "well written" code shouldn't require much "internal commenting" because if it performs a lot then it should be broken into enough function calls. The functionality for each of these calls is then captured in the function name and in the function comments.
Now, function comments are indeed the problem, and in some ways your friend is right, that for most code there is no economical incentive for complete specifications the way that popular APIs are documented. The important thing here is to identify what are the "directives": directives are those information pieces that directly affect clients, and require some direct action (and are often unexpected). For example, X must be invoked before Y, don't call this from outside a UI thread, be aware that this has a certain side effect, etc. These are the things that are really important to capture.
Since most people never read full function documentations, and skim what they do read, you can actually increase the chances of awareness by capturing only the directives rather than the whole description.
I comment as much as needed - then, as much as I will need it a year later.
We add comments which provide the API reference documentation for all public classes / methods / properties / etc... This is well worth the effort because XML Documentation in C# has the nice effect of providing IntelliSense to users of these public APIs. .NET 4.0's code contracts will enable us to improve further on this practice.
As a general rule, we do not document internal implementations as we write code unless we are doing something non-obvious. The theory is that while we are writing new implementations, things are changing and comments are more likely than not to be wrong when the dust settles.
When we go back in to work on an existing piece of code, we add comments when we realize that it's taking some thought to figure out what in the heck is going on. This way, we wind up with comments where they are more likely to be correct (because the code is more stable) and where they are more likely to be useful (if I'm coming back to a piece of code today, it seems more likely that I might come back to it again tomorrow).
My approach:
Comments bridge the gap between context / real world and code. Therefore, each and every single line is commented, in correct English language.
I DO reject code that doesn't observe this rule in the strictest possible sense.
Usage of well formatted XML - comments is self-evident.
Sloppy commenting means sloppy code!
Here's how I wrote code:
if (hotel.isFull()) {
print("We're fully booked");
} else {
Guest guest = promptGuest();
hotel.checkIn(guest);
}
here's a few comments that I might write for that code:
// if hotel is full, refuse checkin, otherwise
// prompt the user for the guest info, and check in the guest.
If your code reads like a prose, there is no sense in writing comments that simply repeats what the code reads since the mental processing needed for reading the code and the comments would be almost equal; and if you read the comments first, you will still need to read the code as well.
On the other hand, there are situations where it is impossible or extremely difficult to make the code looks like a prose; that's where comment could patch in.

How can I program defensively in Ruby?

Here's a perfect example of the problem: Classifier gem breaks Rails.
** Original question: **
One thing that concerns me as a security professional is that Ruby doesn't have a parallel of Java's package-privacy. That is, this isn't valid Ruby:
public module Foo
public module Bar
# factory method for new Bar implementations
def self.new(...)
SimpleBarImplementation.new(...)
end
def baz
raise NotImplementedError.new('Implementing Classes MUST redefine #baz')
end
end
private class SimpleBarImplementation
include Bar
def baz
...
end
end
end
It'd be really nice to be able to prevent monkey-patching of Foo::BarImpl. That way, people who rely on the library know that nobody has messed with it. Imagine if somebody changed the implementation of MD5 or SHA1 on you! I can call freeze on these classes, but I have to do it on a class-by-class basis, and other scripts might modify them before I finish securing my application if I'm not very careful about load order.
Java provides lots of other tools for defensive programming, many of which are not possible in Ruby. (See Josh Bloch's book for a good list.) Is this really a concern? Should I just stop complaining and use Ruby for lightweight things and not hope for "enterprise-ready" solutions?
(And no, core classes are not frozen by default in Ruby. See below:)
require 'md5'
# => true
MD5.frozen?
# => false
I don't think this is a concern.
Yes, the mythical "somebody" can replace the implementation of MD5 with something insecure. But in order to do that, the mythical somebody must actually be able to get his code into the Ruby process. And if he can do that, then he presumably could also inject his code into a Java process and e.g. rewrite the bytecode for the MD5 operation. Or just intercept the keypresses and not actually bother with fiddling with the cryptography code at all.
One of the typical concerns is: I'm writing this awesome library, which is supposed to be used like so:
require 'awesome'
# Do something awesome.
But what if someone uses it like so:
require 'evil_cracker_lib_from_russian_pr0n_site'
# Overrides crypto functions and sends all data to mafia
require 'awesome'
# Now everything is insecure because awesome lib uses
# cracker lib instead of builtin
And the simple solution is: don't do that! Educate your users that they shouldn't run untrusted code they downloaded from obscure sources in their security critical applications. And if they do, they probably deserve it.
To come back to your Java example: it's true that in Java you can make your crypto code private and final and what not. However, someone can still replace your crypto implementation! In fact, someone actually did: many open-source Java implementations use OpenSSL to implement their cryptographic routines. And, as you probably know, Debian shipped with a broken, insecure version of OpenSSL for years. So, all Java programs running on Debian for the past couple of years actually did run with insecure crypto!
Java provides lots of other tools for defensive programming
Initially I thought you were talking about normal defensive programming,
wherein the idea is to defend the program (or your subset of it, or your single function) from invalid data input.
That's a great thing, and I encourage everyone to go read that article.
However it seems you are actually talking about "defending your code from other programmers."
In my opinion, this is a completely pointless goal, as no matter what you do, a malicious programmer can always run your program under a debugger, or use dll injection or any number of other techniques.
If you are merely seeking to protect your code from incompetent co-workers, this is ridiculous. Educate your co-workers, or get better co-workers.
At any rate, if such things are of great concern to you, ruby is not the programming language for you. Monkeypatching is in there by design, and to disallow it goes against the whole point of the feature.
Check out Immutable by Garry Dolley.
You can prevent redefinition of individual methods.
I guess Ruby has that a feature - valued more over it being a security issue. Ducktyping too.
E.g. I can add my own methods to the Ruby String class rather than extending or wrapping it.
"Educate your co-workers, or get better co-workers" works great for a small software startup, and it works great for the big guns like Google and Amazon. It's ridiculous to think that every lowly developer contracted in for some small medical charts application in a doctor's office in a minor city.
I'm not saying we should build for the lowest common denominator, but we have to be realistic that there are lots of mediocre programmers out there who will pull in any library that gets the job done, paying no attention to security. How could they pay attention to security? Maybe the took an algorithms and data structures class. Maybe they took a compilers class. They almost certainly didn't take an encryption protocols class. They definitely haven't all read Schneier or any of the others out there who practically have to beg and plead with even very good programmers to consider security when building software.
I'm not worried about this:
require 'evil_cracker_lib_from_russian_pr0n_site'
require 'awesome'
I'm worried about awesome requiring foobar and fazbot, and foobar requiring has_gumption, and ... eventually two of these conflict in some obscure way that undoes an important security aspect.
One important security principle is "defense in depth" -- adding these extra layers of security help you from accidentally shooting yourself in the foot. They can't completely prevent it; nothing can. But they help.
If monkey patching is your concen, you can use the Immutable module (or one of similar function).
Immutable
You could take a look at Why the Lucky Stiff's "Sandbox"project, which you can use if you worry about potentially running unsafe code.
http://code.whytheluckystiff.net/sandbox/
An example (online TicTacToe):
http://www.elctech.com/blog/safely-exposing-your-app-to-a-ruby-sandbox
Raganwald has a recent post about this. In the end, he builds the following:
class Module
def anonymous_module(&block)
self.send :include, Module.new(&block)
end
end
class Acronym
anonymous_module do
fu = lambda { 'fu' }
bar = lambda { 'bar' }
define_method :fubar do
fu.call + bar.call
end
end
end
That exposes fubar as a public method on Acronyms, but keeps the internal guts (fu and bar) private and hides helper module from outside view.
If someone monkeypatched an object or a module, then you need to look at 2 cases: He added a new method. If he is the only one adding this meyhod (which is very likely), then no problems arise. If he is not the only one, you need to see if both methods do the same and tell the library developer about this severe problem.
If they change a method, you should start to research why the method was changed. Did they change it due to some edge case behaviour or did they actually fix a bug? especially in the latter case, the monkeypatch is a god thing, because it fixes a bug in many places.
Besides that, you are using a very dynamic language with the assumption that programmers use this freedom in a sane way. The only way to remove this assumption is not to use a dynamic language.

Resources