What is the most obfuscated code you've had to fix? - debugging

Most programmers will have had the experience of debugging/fixing someone else's code. Sometimes that "someone else's code" is so obfuscated it's bad enough trying to understand what it's doing.
What's the worst (most obfuscated) code you've had to debug/fix?
If you didn't throw it away and recode it from scratch, well why didn't you?

PHP OSCommerce is enough to say, it is obfuscated code...

a Java class
only static methods that manipulates DOM
8000 LOCs
long chain of methods that return null on "error": a.b().c().d().e()
very long methods (400/500 LOC each)
nested if, while, like:
if (...) {
for (...) {
if (...) {
if (...) {
while (...) {
if (...) {
cut-and-paste oriented programming
no exceptions, all exceptions are catched and "handled" using printStackTrace()
no unit tests
no documentation
I was tempted to throw away and recode... but, after 3 days of hard debugging,
I've added the magic if :-)

Spaghetti code PHP CMS system.

by default, programmers think someone else's code is obfuscated.
The worse I probably had to do was interpreting what variables i1, i2 j, k, t were in a simple method and they were not counters in 'for' loops.
In all other circumstances I guess the problem area was difficult which made the code look difficult.

I found this line in our codebase today and thought it was a nice example of sneaky obfuscation:
if (MULTICLICK_ENABLED.equals(propService.getProperty(PropertyNames.MULTICLICK_ENABLED))) {} else {
return false;
}
Just making sure I read the whole line. NO SKIMREADING.

When working on a GWT project, I would reach parts of GWT-compiled obfuscated JS code which wasn't mine.
Now good luck debugging real obfuscated code.

I can't remember the full code, but a single part of it remains burned into my memory as something I spend hours trying to understand:
do{
$tmp = shift unless shift;
$tmp;
}while($tmp);
I couldn't understand it at first, it looks so useless, then I printed out #_ for a list of arguments, a series of alternating boolean and function names, the code was used in conjunction with a library detection module that changed behaviour if a function was broken, but the code was so badly documented and made of things like that which made no sense without a complete understanding of the full code I gave up and rewrote the whole thing.
UPDATE from DVK:
And, lest someone claims this was because Perl is unreadable as opposed to coder being a golf master instead of good software developer, here's the same code in a slightly less obfuscated form (the really correct code wouldn't even HAVE alternating sub names and booleans in the first place :)
# This subroutine take a list of alternating true/false flags
# and subroutine names; and executes the named subroutines for which flag is true.
# I am also weird, otherwise I'd have simply have passed list of subroutines to execute :)
my #flags_and_sub_names_list = #_;
while ( #flags_and_sub_names_list ) {
my $flag = shift #flags_and_sub_names_list;
my $subName = shift #flags_and_sub_names_list;
next unless $flag && $subName;
&{ $subName }; # Call the named subroutine
}

I've had a case of a 300lines function performing input sanitization which missed a certain corner case. It was parsing certain situations manually using IndexOf and Substring plus a lot of inlined variables and constants (looks like the original coder didn't know anything about good practices), and no comment was provided. Throwing it away wasn't feasible due to time constraints and the fact that I didn't have the specification required so rewriting it would've meant understanding the original, but after understanding it fixing it was just quicker. I also added lots of comments, so whoever shall come after me won't feel the same pain taking a look at it...

The Perl statement:
select((select(s),$|=1)[0])
which, at the suggestion of the original author (Randal Schwartz himself, who said he disliked it but nothing else was available at the time), was replaced with something a little more understandable:
IO::Handle->autoflush
Beyond that one-liner, some of the Java JDBC libraries from IBM are obfuscated and all variables and functions are either combinations of the letter 'l' and '1' or single/double characters - very hard to track anything down until you get them all renamed. Needed to do this to track down why they worked fine in IBM's JRE but not Sun's.

If you're talking about HLL codes, once I was updating project written by a chinese and all comments were chinese (stored in ansii) and it was a horror to understand some code fragments, if you're talking about low level code there were MANY of them (obfuscated, mutated, vm-ed...).

I once had to reverse engineer a Java 1.1 framework that:
Extended event-driven SAX parser classes for every class, even those that didn't parse XML (the overridden methods were simply invoked ad hoc by other code)
Custom runtime exceptions were thrown in lieu of method invocations wherever possible. As a result, most of the business logic landed in a nested series of catch blocks.
If I had to guess, it was probably someone's "smart" idea that method invocations were expensive in Java 1.1, so throwing exceptions for non-exceptional flow control was somehow considered an optimization.
Went through about three bottles of eye drops.

I once found a time bomb that had been intentionally obfuscated.
When I had finally decoded what it was doing I mentioned it to the manager who said they knew about the time bomb but had left it in place because it was so ineffective and was interwoven with other code.
The time bomb was (presumably) supposed to go off after a certain date.
Instead, it had a bug in it so it only activated if someone was working after lunchtime on Dec 31st.
It had taken three years for that circumstance to occur since the guy who wrote the time bomb left the company.

Related

Is checking Perl function arguments worth it?

There's a lot of buzz about MooseX::Method::Signatures and even before that, modules such as Params::Validate that are designed to type check every argument to methods or functions. I'm considering using the former for all my future Perl code, both personal and at my place of work. But I'm not sure if it's worth the effort.
I'm thinking of all the Perl code I've seen (and written) before that performs no such checking. I very rarely see a module do this:
my ($a, $b) = #_;
defined $a or croak '$a must be defined!';
!ref $a or croak '$a must be a scalar!";
...
#_ == 2 or croak "Too many arguments!";
Perhaps because it's simply too much work without some kind of helper module, but perhaps because in practice we don't send excess arguments to functions, and we don't send arrayrefs to methods that expect scalars - or if we do, we have use warnings; and we quickly hear about it - a duck typing approach.
So is Perl type checking worth the performance hit, or are its strengths predominantly shown in compiled, strongly typed languages such as C or Java?
I'm interested in answers from anyone who has experience writing Perl that uses these modules and has seen benefits (or not) from their use; if your company/project has any policies relating to type checking; and any problems with type checking and performance.
UPDATE: I read an interesting article on the subject recently, called Strong Testing vs. Strong Typing. Ignoring the slight Python bias, it essentially states that type checking can be suffocating in some instances, and even if your program passes the type checks, it's no guarantee of correctness - proper tests are the only way to be sure.
If it's important for you to check that an argument is exactly what you need, it's worth it. Performance only matters when you already have correct functioning. It doesn't matter how fast you can get a wrong answer or a core dump. :)
Now, that sounds like a stupid thing to say, but consider some cases where it isn't. Do I really care what's in #_ here?
sub looks_like_a_number { $_[0] !~ /\D/ }
sub is_a_dog { eval { $_[0]->DOES( 'Dog' ) } }
In those two examples, if the argument isn't what you expect, you are still going to get the right answer because the invalid arguments won't pass the tests. Some people see that as ugly, and I can see their point, but I also think the alternative is ugly. Who wins?
However, there are going to be times that you need guard conditions because your situation isn't so simple. The next thing you have to pass your data to might expect them to be within certain ranges or of certain types and don't fail elegantly.
When I think about guard conditions, I think through what could happen if the inputs are bad and how much I care about the failure. I have to judge that by the demands of each situation. I know that sucks as an answer, but I tend to like it better than a bondage-and-discipline approach where you have to go through all the mess even when it doesn't matter.
I dread Params::Validate because its code is often longer than my subroutine. The Moose stuff is very attractive, but you have to realize that it's a way for you to declare what you want and you still get what you could build by hand (you just don't have to see it or do it). The biggest thing I hate about Perl is the lack of optional method signatures, and that's one of the most attractive features in Perl 6 as well as Moose.
I basically concur with brian. How much you need to worry about your method's inputs depends heavily on how much you are concerned that a) someone will input bad data, and b) bad data will corrupt the purpose of the method. I would also add that there is a difference between external and internal methods. You need to be more diligent about public methods because you're making a promise to consumers of your class; conversely you can be less diligent about internal methods as you have greater (theoretical) control over the code that accesses it, and have only yourself to blame if things go wrong.
MooseX::Method::Signatures is an elegant solution to adding a simple declarative way to explain the parameters of a method. Method::Signatures::Simple and Params::Validate are nice but lack one of the features I find most appealing about Moose: the Type system. I have used MooseX::Declare and by extension MooseX::Method::Signatures for several projects and I find that the bar to writing the extra checks is so minimal it's almost seductive.
Yes its worth it - defensive programming is one of those things that are always worth it.
The counterargument I've seen presented to this is that checking parameters on every single function call is redundant and a waste of CPU time. This argument's supporters favor a model in which all incoming data is rigorously checked when it first enters the system, but internal methods have no parameter checks because they should only be called by code which will pass them data which has already passed the checks at the system's border, so it is assumed to still be valid.
In theory, I really like the sound of that, but I can also see how easily it can fall like a house of cards if someone uses the system (or the system needs to grow to allow use) in a way that was unforeseen when the initial validation border is established. All it takes is one external call to an internal function and all bets are off.
In practice, I'm using Moose at the moment and Moose doesn't really give you the option to bypass validation at the attribute level, plus MooseX::Declare handles and validates method parameters with less fuss than unrolling #_ by hand, so it's pretty much a moot point.
I want to mention two points here.
The first are the tests, the second the performance question.
1) Tests
You mentioned that tests can do a lot and that tests are the only way
to be sure that your code is correct. In general i would say this is
absolutly correct. But tests itself only solves one problem.
If you write a module you have two problems or lets say two different
people that uses your module.
You as a developer and a user that uses your module. Tests helps with the
first that your module is correct and do the right thing, but it didn't
help the user that just uses your module.
For the later, i have one example. i had written a module using Moose
and some other stuff, my code ended always in a Segmentation fault.
Then i began to debug my code and search for the problem. I spend around
4 hours of time to find the error. In the end the problem was that i have
used Moose with the Array Trait. I used the "map" function and i didn't
provide a subroutine function, just a string or something else.
Sure this was an absolutly stupid error of mine, but i spend a long time to
debug it. In the end just a checking of the input that the argument is
a subref would cost the developer 10 seconds of time, and would cost me
and propably other a lot of more time.
I also know of other examples. I had written a REST Client to an interface
completly OOP with Moose. In the end you always got back Objects, you
can change the attributes but sure it didn't call the REST API for
every change you did. Instead you change your values and in the end you
call a update() method that transfers the data, and change the values.
Now i had a user that then wrote:
$obj->update({ foo => 'bar' })
Sure i got an error back, that update() does not work. But sure it didn't
work, because the update() method didn't accept a hashref. It only does
a synchronisation of the actual state of the object with the online
service. The correct code would be.
$obj->foo('bar');
$obj->update();
The first thing works because i never did a checking of the arguments. And i don't throw an error if someone gives more arguments then i expect. The method just starts normal like.
sub update {
my ( $self ) = #_;
...
}
Sure all my tests absolutely works 100% fine. But handling these errors that
are not errors cost me time too. And it costs the user propably a lot
of more time.
So in the end. Yes, tests are the only correct way to ensure that your code
works correct. But that doesn't mean that type checking is meaningless.
Type checking is there to help all your non-developers (on your module)
to use your module correctly. And saves you and others time finding
dump errors.
2) Performance
The short: You don't care for performance until you care.
That means until your module works to slow, Performance is always fast
enough and you don't need to care for this. If your module really works
to slow you need further investigations. But for these investigions
you should use a profiler like Devel::NYTProf to look what is slow.
And i would say. In 99% slowliness is not because you do type
checking, it is more your algorithm. You do a lot of computation, calling
functions to often etc. Often it helps if you do completly other solutions
use another better algorithm, do caching or something else, and the
performance hit is not your type checking. But even if the checking is the
performance hit. Then just remove it where it matters.
There is no reason to leave the type checking where performance don't
matters. Do you think type checking does matter in a case like above?
Where i have written a REST Client? 99% of performance issues here are
the amount of request that goes to the webservice or the time for such an
request. Don't using type checking or MooseX::Declare etc. would propably
speed up absolutly nothing.
And even if you see performance disadvantages. Sometimes it is acceptable.
Because the speed doesn't matter or sometimes something gives you a greater
value. DBIx::Class is slower then pure SQL with DBI, but DBIx::Class
gives you a lot for these.
Params::Validate works great,but of course checking args slows things down. Tests are mandatory(at least in the code I write).
Yes it's absolutely worth it, because it will help during development, maintenance, debugging, etc.
If a developer accidentally sends the wrong parameters to a method, a useful error message will be generated, instead of the error being propagated down to somewhere else.
I'm using Moose extensively for a fairly large OO project I'm working on. Moose's strict type checking has saved my bacon on a few occassions. Most importantly it has helped avoid situations where "undef" values are incorrectly being passed to the method. In just those instances alone it saved me hours of debugging time..
The performance hit is definitely there, but it can be managed. 2 hours of using NYTProf helped me find a few Moose Attributes that I was grinding too hard and I just refactored my code and got 4x performance improvement.
Use type checking. Defensive coding is worth it.
Patrick.
Sometimes. I generally do it whenever I'm passing options via hash or hashref. In these cases it's very easy to misremember or misspell an option name, and checking with Params::Check can save a lot of troubleshooting time.
For example:
sub revise {
my ($file, $options) = #_;
my $tmpl = {
test_mode => { allow => [0,1], 'default' => 0 },
verbosity => { allow => qw/^\d+$/, 'default' => 1 },
force_update => { allow => [0,1], 'default' => 0 },
required_fields => { 'default' => [] },
create_backup => { allow => [0,1], 'default' => 1 },
};
my $args = check($tmpl, $options, 1)
or croak "Could not parse arguments: " . Params::Check::last_error();
...
}
Prior to adding these checks, I'd forget whether the names used underscores or hyphens, pass require_backup instead of create_backup, etc. And this is for code I wrote myself--if other people are going to use it, you should definitely do some sort of idiot-proofing. Params::Check makes it fairly easy to do type checking, allowed value checking, default values, required options, storing option values to other variables and more.

Is it possible to write good and understandable code without any comments?

Can any one suggest what is the best way to write good code that is understandable without a single line of comments?
I once had a professor when I was in college tell me that any good code should never need any comments.
Her approach was a combination of very precise logic split out into small functions with very descriptive method/property/variable names. The majority of what she presented was, in fact, extremely readable with no comments. I try to do the same with everything I write...
Read Code Complete, 2nd Edition cover to cover. Perhaps twice.
To give some specifics:
Making code readable
Eliminating code repetition
Doing design/architecture before you write code
I like to 'humanise' code, so instead of:
if (starColour.red > 200 && starColour.blue > 200 && starColour.green > 200){
doSomething();
}
I'll do this:
bool starIsBright;
starIsBright = (starColour.red > 200 && starColour.blue > 200 && starColour.green > 200);
if(starIsBright){
doSomething();
}
In some cases - yes, but in many cases no. The Yes part is already answered by others - keep it simple, write it nicely, give it readable names, etc. The No part goes to when the problem you solve in code is not a code problem at all but rather domain specific problem or business logic problem. I've got no problem reading lousy code even if it doesn't have comments. It's annoying, but doable. But it's practically impossible to read some code without understanding why is it like this and what is it trying to solve. So things like :
if (starColour.red > 200 && starColour.blue > 200 && starColour.green > 200){
doSomething();
}
look nice, but could be quite meaningless in the context of what the program is actually doing. I'd rather have it like this:
// we do this according to the requirement #xxxx blah-blah..
if (starColour.red > 200 && starColour.blue > 200 && starColour.green > 200){
doSomething();
}
Well written code might eliminate the need for comments to explain what you're doing, but you'll still want comments to explain the why.
If you really want to then you would need to be very detailed in your variable names and methods names.
But in my opinion, there is no good way to do this. Comments serve a serious purpose in coding, even if you are the only one coding you still sometimes need to be reminded what part of the code you're looking at.
Yes, you can write code that doesn't need comments to describe what it does, but that may not be enough.
Just because a function is very clear in explaining what it does, does not, by itself, tell you why it is doing what it does.
As in everything, moderation is a good idea. Write code that is explanatory, and write comments that explain why it is there or what assumptions are being made.
I think that the concept of Fluent Interfaces is really a good example of this.
var bob = DB.GetCustomers().FromCountry("USA").WithName("Bob")
Clean Code by Robert C. Martin contains everything you need to write clean, understandable code.
Use descriptive variable names and descriptive method names. Use whitespace.
Make your code read like normal conversation.
Contrast the use of Matchers in Junit:
assertThat(x, is(3));
assertThat(x, is(not(4)));
assertThat(responseString, either(containsString("color")).or(containsString("colour")));
assertThat(myList, hasItem("3"));
with the traditional style of assertEquals:
assertEquals(3, x);
When I look at the assertEquals statement, it is not clear which parameter is "expected" and which is "actual".
When I look at assertThat(x, is(3)) I can read that in English as "Assert that x is 3" which is very clear to me.
Another key to writing self-documenting code is to wrap any bit of logic that is not clear in a method call with a clear name.
if( (x < 3 || x > 17) && (y < 8 || y > 15) )
becomes
if( xAndYAreValid( x, y ) ) // or similar...
I'm not sure writing code that is so expressive that you don't need comments is necessarily a great goal. Seems to me like another form of overoptimization. If I were on your team, I'd be pleased to see clear, concise code with just enough comments.
In most cases, yes, you can write code that is clear enough that comments become unnecessary noise.
The biggest problem with comments is there is no way to check their accuracy. I tend to agree with Uncle Bob Martin in chapter 4 of his book, Clean Code:
The proper use of comments is to compensate for our failure to express ourself in
code. Note that I used the word failure. I meant it. Comments are always failures. We must
have them because we cannot always figure out how to express ourselves without them,
but their use is not a cause for celebration.
So when you find yourself in a position where you need to write a comment, think it
through and see whether there isn’t some way to turn the tables and express yourself in
code. Every time you express yourself in code, you should pat yourself on the back. Every
time you write a comment, you should grimace and feel the failure of your ability of
expression.
Most comments are either needless redundancy, outright fallacy or a crutch used to explain poorly written code. I say most because there are certain scenarios where the lack of expressiveness lies with the language rather than the programmer.
For instance the copyright and license information typically found at the beginning of a source file. As far as I'm aware no known construct exists for this in any of the popular languages. Since a simple one or two line comment suffices, its unlikely that such a construct will be added.
The original need for most comments has been replaced over time by better technology or practices. Using a change journal or commenting out code has been supplanted with source control systems. Explanatory comments in long functions can be mitigated by simply writing shorter functions. etc.
You usually can turn your comment into a function name something like:
if (starColourIsGreaterThanThreshold(){
doSomething();
}
....
private boolean starColourIsGreaterThanThreshold() {
return starColour.red > THRESHOLD &&
starColour.blue > THRESHOLD &&
starColour.green > THRESHOLD
}
I think comments should express the why, perhaps the what, but as much as possible the code should define the how (the behavior).
Someone should be able to read the code and understand what it does (the how) from the code. What may not be obvious is why you would want such behavior and what this behavior contributes to the overall requirements.
The need to comment should give you pause, though. Maybe how you are doing it is too complicated and the need to write a comment shows that.
There is a third alternative to documenting code - logging. A method that is well peppered with logging statements can do a lot to explain the why, can touch on the what and may give you a more useful artifact than well named methods and variables regarding the behavior.
If you want to code entirely without comments and still have your code be followable, then you'll have to write a larger number of shorter methods. Methods will have to have descriptive names. Variables will also have to have descriptive names. One common method of doing this is to give variables the name of nouns and to give methods the names of verbal phrases. For example:
account.updateBalance();
child.givePacifier();
int count = question.getAnswerCount();
Use enums liberally. With an enum, you can replace most booleans and integral constants. For example:
public void dumpStackPretty(boolean allThreads) {
....
}
public void someMethod() {
dumpStackPretty(true);
}
vs
public enum WhichThreads { All, NonDaemon, None; }
public void dumpStackPretty(WhichThreads whichThreads) {
....
}
public void someMethod() {
dumpStackPretty(WhichThreads.All);
}
Descriptive names is your obvious first bet.
Secondly make sure each method does one thing and only one thing. If you have a public method that needs to do many things, split it up into several private methods and call those from the public method, in a way that makes the logic obvious.
Some time ago I had to create a method that calculated the correlation of two time series.
To calculate the correlation you also need the mean and standard deviation. So I had two private methods (well actually in this case they were public as they could be used for other purposes (but assuming they couldn't then they would be private)) for calculating A) the mean, B) the standard deviation.
This sort of splitting up of function into the smallest part that makes sense is probably the most important thing to make a code readable.
How do you decide where to break up methods. My way is, if the name is obvious e.g. getAddressFromPage it is the right size. If you have several contenders you are probably trying to do too much, if you can't think of a name that makes sense you method may not "do" enough - although the latter is much less likely.
I don't really think comments are a good idea in most cases. Comments don't get checked by the compiler so they so often are misleading or wrong as the code changes over time. Instead, I prefer self documenting, concise methods that don't need comments. It can be done, and I have been doing it this way for years.
Writing code without comments takes practice and discipline, but I find that the discipline pays off as the code evolves.
It may not be comments, but, to help someone better understand what it going on you may need some diagrams explaining how the program should work, as, if a person knows the big picture then it is easier to understand code.
But, if you are doing something complex then you may need some comments, for example, in a very math intensive program.
The other place I find comments useful and important, is to ensure that someone doesn't replace code with something that looks like it should work, but won't. In that case I leave the bad code in, and comment it out, with an explanation as to why it shouldn't be used.
So, it is possible to write code without comments, but only if you are limited in what types of applications you are writing, unless you can explain why a decision was made, somewhere, and not call it a comment.
For example, a random generator can be written many ways. If you pick a particular implementation it may be necessary to explain why you picked that particular generator, as the period may be sufficiently long for current requirements, but later the requirements may change and your generator may not be sufficient.
I believe it's possible, if you consider the fact that not everybody likes the same style. So in order to minimize comments, knowing your "readers" is the most important thing.
In "information systems" kind-of software, try using declarative sentence, try to approximate the code line to a line in english, and avoid "mathematical programming" (with the i,j and k for index, and the one-liners-to-do-a-lot) at all costs.
I think code can be self-documenting to a large degree, and I think it's crucial, but reading even well-written code can be like looking at cells of the human body with a microscope. It sometimes takes comments to really explain the big picture of how pieces of the system fit together, especially if it solves a really complex and difficult problem.
Think about special data structures. If all that computer scientists had ever published about data structures were well-written code, few would really understand the relative benefit of one data structure over another -- because Big-O runtime of any given operation is sometimes just not obvious from reading the code. That's where the math and amortized analysis presented in articles come in.

What is your personal approach/take on commenting?

Duplicate
What are your hard rules about commenting?
A Developer I work with had some things to say about commenting that were interesting to me (see below). What is your personal approach/take on commenting?
"I don't add comments to code unless
its a simple heading or there's a
platform-bug or a necessary
work-around that isn't obvious. Code
can change and comments may become
misleading. Code should be
self-documenting in its use of
descriptive names and its logical
organization - and its solutions
should be the cleanest/simplest way
to perform a given task. If a
programmer can't tell what a program
does by only reading the code, then
he's not ready to alter it.
Commenting tends to be a crutch for
writing something complex or
non-obvious - my goal is to always
write clean and simple code."
"I think there a few camps when it
comes to commenting, the
enterprisey-type who think they're
writing an API and some grand
code-library that will be used for
generations to come, the
craftsman-like programmer that thinks
code says what it does clearer than a
comment could, and novices that write
verbose/unclear code so as to need to
leave notes to themselves as to why
they did something."
There's a tragic flaw with the "self-documenting code" theory. Yes, reading the code will tell you exactly what it is doing. However, the code is incapable of telling you what it's supposed to be doing.
I think it's safe to say that all bugs are caused when code is not doing what it's supposed to be doing :). So if we add some key comments to provide maintainers with enough information to know what a piece of code is supposed to be doing, then we have given them the ability to fix a whole lot of bugs.
That leaves us with the question of how many comments to put in. If you put in too many comments, things become tedious to maintain and the comments will inevitably be out of date with the code. If you put in too few, then they're not particularly useful.
I've found regular comments to be most useful in the following places:
1) A brief description at the top of a .h or .cpp file for a class explaining the purpose of the class. This helps give maintainers a quick overview without having to sift through all of the code.
2) A comment block before the implementation of a non-trivial function explaining the purpose of it and detailing its expected inputs, potential outputs, and any oddities to expect when calling the function. This saves future maintainers from having to decipher entire functions to figure these things out.
Other than that, I tend to comment anything that might appear confusing or odd to someone. For example: "This array is 1 based instead of 0 based because of blah blah".
Well written, well placed comments are invaluable. Bad comments are often worse than no comments. To me, lack of any comments at all indicates laziness and/or arrogance on the part of the author of the code. No matter how obvious it is to you what the code is doing or how fantastic your code is, it's a challenging task to come into a body of code cold and figure out what the heck is going on. Well done comments can make a world of difference getting someone up to speed on existing code.
I've always liked Refactoring's take on commenting:
The reason we mention comments here is that comments often are used as a deodorant. It's surprising how often you look at thickly commented code and notice that the comments are there because the code is bad.
Comments lead us to bad code that has all the rotten whiffs we've discussed in the rest of this chapter. Our first action is to remove the bad smells by refactoring. When we're finished, we often find that the comments are superfluous.
As controversial as that is, it's rings true for the code I've read. To be fair, Fowler isn't saying to never comment, but to think about the state of your code before you do.
You need documentation (in some form; not always comments) for a local understanding of the code. Code by itself tells you what it does, if you read all of it and can keep it all in mind. (More on this below.) Comments are best for informal or semiformal documentation.
Many people say comments are a code smell, replaceable by refactoring, better naming, and tests. While this is true of bad comments (which are legion), it's easy to jump to concluding it's always so, and hallelujah, no more comments. This puts all the burden of local documentation -- too much of it, I think -- on naming and tests.
Document the contract of each function and, for each type of object, what it represents and any constraints on a valid representation (technically, the abstraction function and representation invariant). Use executable, testable documentation where practical (doctests, unit tests, assertions), but also write short comments giving the gist where helpful. (Where tests take the form of examples, they're incomplete; where they're complete, precise contracts, they can be as much work to grok as the code itself.) Write top-level comments for each module and each project; these can explain conventions that keep all your other comments (and code) short. (This supports naming-as-documentation: with conventions established, and a place we can expect to find subtleties noted, we can be confident more often that the names tell all we need to know.) Longer, stylized, irritatingly redundant Javadocs have their uses, but helped generate the backlash.
(For instance, this:
Perform an n-fold frobulation.
#param n the number of times to frobulate
#param x the x-coordinate of the center of frobulation
#param y the y-coordinate of the center of frobulation
#param z the z-coordinate of the center of frobulation
could be like "Frobulate n times around the center (x,y,z)." Comments don't have to be a chore to read and write.)
I don't always do as I say here; it depends on how much I value the code and who I expect to read it. But learning how to write this way made me a better programmer even when cutting corners.
Back on the claim that we document for the sake of local understanding: what does this function do?
def is_even(n): return is_odd(n-1)
Tests if an integer is even? If is_odd() tests if an integer is odd, then yes, that works. Suppose we had this:
def is_odd(n): return is_even(n-1)
The same reasoning says this is_odd() tests if an integer is odd. Put them together, of course, and neither works, even though each works if the other does. Change it a bit and we'd have code that does work, but only for natural numbers, while still locally looking like it works for integers. In microcosm that's what understanding a codebase is like: tracing dependencies around in circles to try to reverse-engineer assumptions the author could have explained in a line or two if they'd bothered. I hate the expense of spirit thoughtless coders have put me to this way over the past couple of decades: oh, this method looks like it has the side effect of farbuttling the warpcore... always? Well, if odd crobuncles desaturate, at least; do they? Better check all the crobuncle-handling code... which will pose its own challenges to understanding. Good documentation cuts this O(n) pointer-chasing down to O(1): e.g. knowing a function's contract and the contracts of the things it explicitly uses, the function's code should make sense with no further knowledge of the system. (Here, contracts saying is_even() and is_odd() work on natural numbers would tell us that both functions need to test for n==0.)
My only real rule is that comments should explain why code is there, not what it is doing or how it is doing it. Those things can change, and if they do the comments have to be maintained. The purpose the code exists in the first place shouldn't change.
the purpose of comments is to explain the context - the reason for the code; this, the programmer cannot know from mere code inspection. For example:
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
who knows what the heck this is for? but with a simple comment, all is revealed:
//when under attack, sidestep and retaliate with rocket bundles
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
seriously, comments are for the why, not the how, unless the how is unintuitive.
While I agree that code should be self-readable, I still see a lot of value in adding extensive comment blocks for explaining design decisions. For example "I did xyz instead of the common practice of abc because of this caveot ..." with a URL to a bug report or something.
I try to look at it as: If I'm dead and gone and someone straight out of college has to fix a bug here, what are they going to need to know?
In general I see comments used to explain poorly written code. Most code can be written in a way that would make comments redundant. Having said that I find myself leaving comments in code where the semantics aren't intuitive, such as calling into an API that has strange or unexpected behavior etc...
I also generally subscribe to the self-documenting code idea, so I think your developer friend gives good advice, and I won't repeat that, but there are definitely many situations where comments are necessary.
A lot of times I think it boils down to how close the implementation is to the types of ordinary or easy abstractions that code-readers in the future are going to be comfortable with or more generally to what degree the code tells the entire story. This will result in more or fewer comments depending on the type of programming language and project.
So, for example if you were using some kind of C-style pointer arithmetic in an unsafe C# code block, you shouldn't expect C# programmers to easily switch from C# code reading (which is probably typically more declarative or at least less about lower-level pointer manipulation) to be able to understand what your unsafe code is doing.
Another example is when you need to do some work deriving or researching an algorithm or equation or something that is not going to end up in your code but will be necessary to understand if anyone needs to modify your code significantly. You should document this somewhere and having at least a reference directly in the relevant code section will help a lot.
I don't think it matters how many or how few comments your code contains. If your code contains comments, they have to maintained, just like the rest of your code.
EDIT: That sounded a bit pompous, but I think that too many people forget that even the names of the variables, or the structures we use in the code, are all simply "tags" - they only have meaning to us, because our brains see a string of characters such as customerNumber and understand that it is a customer number. And while it's true that comments lack any "enforcement" by the compiler, they aren't so far removed. They are meant to convey meaning to another person, a human programmer that is reading the text of the program.
If the code is not clear without comments, first make the code a clearer statement of intent, then only add comments as needed.
Comments have their place, but primarily for cases where the code is unavoidably subtle or complex (inherent complexity is due to the nature of the problem being solved, not due to laziness or muddled thinking on the part of the programmer).
Requiring comments and "measuring productivity" in lines-of-code can lead to junk such as:
/*****
*
* Increase the value of variable i,
* but only up to the value of variable j.
*
*****/
if (i < j) {
++i;
} else {
i = j;
}
rather than the succinct (and clear to the appropriately-skilled programmer):
i = Math.min(j, i + 1);
YMMV
The vast majority of my commnets are at the class-level and method-level, and I like to describe the higher-level view instead of just args/return value. I'm especially careful to describe any "non-linearities" in the function (limits, corner cases, etc) that could trip up the unwary.
Typically I don't comment inside a method, except to mark "FIXME" items, or very occasionally some sort of "here be monsters" gotcha that I just can't seem to clean up, but I work very hard to avoid those. As Fowler says in Refactoring, comments tend to indicate smally code.
Comments are part of code, just like functions, variables and everything else - and if changing the related functionality the comment must also be updated (just like function calls need changing if function arguments change).
In general, when programming you should do things once in one place only.
Therefore, if what code does is explained by clear naming, no comment is needed - and this is of course always the goal - it's the cleanest and simplest way.
However, if further explanation is needed, I will add a comment, prefixed with INFO, NOTE, and similar...
An INFO: comment is for general information if someone is unfamiliar with this area.
A NOTE: comment is to alert of a potential oddity, such as a strange business rule / implementation.
If I specifically don't want people touching code, I might add a WARNING: or similar prefix.
What I don't use, and am specifically opposed to, are changelog-style comments - whether inline or at the head of the file - these comments belong in the version control software, not the sourcecode!
I prefer to use "Hansel and Gretel" type comments; little notes in the code as to why I'm doing it this way, or why some other way isn't appropriate. The next person to visit this code will probably need this info, and more often than not, that person will be me.
As a contractor I know that some people maintaining my code will be unfamiliar with the advanced features of ADO.Net I am using. Where appropriate, I add a brief comment about the intent of my code and a URL to an MSDN page that explains in more detail.
I remember learning C# and reading other people's code I was often frustrated by questions like, "which of the 9 meanings of the colon character does this one mean?" If you don't know the name of the feature, how do you look it up?! (Side note: This would be a good IDE feature: I select an operator or other token in the code, right click then shows me it's language part and feature name. C# needs this, VB less so.)
As for the "I don't comment my code because it is so clear and clean" crowd, I find sometimes they overestimate how clear their very clever code is. The thought that a complex algorithm is self-explanatory to someone other than the author is wishful thinking.
And I like #17 of 26's comment (empahsis added):
... reading the code will tell you exactly
what it is doing. However, the code is
incapable of telling you what it's
supposed to be doing.
I very very rarely comment. MY theory is if you have to comment it's because you're not doing things the best way possible. Like a "work around" is the only thing I would comment. Because they often don't make sense but there is a reason you are doing it so you need to explain.
Comments are a symptom of sub-par code IMO. I'm a firm believer in self documenting code. Most of my work can be easily translated, even by a layman, because of descriptive variable names, simple form, and accurate and many methods (IOW not having methods that do 5 different things).
Comments are part of a programmers toolbox and can be used and abused alike. It's not up to you, that other programmer, or anyone really to tell you that one tool is bad overall. There are places and times for everything, including comments.
I agree with most of what's been said here though, that code should be written so clear that it is self-descriptive and thus comments aren't needed, but sometimes that conflicts with the best/optimal implementation, although that could probably be solved with an appropriately named method.
I agree with the self-documenting code theory, if I can't tell what a peice of code is doing simply by reading it then it probably needs refactoring, however there are some exceptions to this, I'll add a comment if:
I'm doing something that you don't
normally see
There are major side effects or implementation details that aren't obvious, or won't be next year
I need to remember to implement
something although I prefer an
exception in these cases.
If I'm forced to go do something else and I'm having good ideas, or a difficult time with the code, then I'll add sufficient comments to tmporarily preserve my mental state
Most of the time I find that the best comment is the function or method name I am currently coding in. All other comments (except for the reasons your friend mentioned - I agree with them) feel superfluous.
So in this instance commenting feels like overkill:
/*
* this function adds two integers
*/
int add(int x, int y)
{
// add x to y and return it
return x + y;
}
because the code is self-describing. There is no need to comment this kind of thing as the name of the function clearly indicates what it does and the return statement is pretty clear as well. You would be surprised how clear your code becomes when you break it down into tiny functions like this.
When programming in C, I'll use multi-line comments in header files to describe the API, eg parameters and return value of functions, configuration macros etc...
In source files, I'll stick to single-line comments which explain the purpose of non-self-evident pieces of code or to sub-section a function which can't be refactored to smaller ones in a sane way. Here's an example of my style of commenting in source files.
If you ever need more than a few lines of comments to explain what a given piece of code does, you should seriously consider if what you're doing can't be done in a better way...
I write comments that describe the purpose of a function or method and the results it returns in adequate detail. I don't write many inline code comments because I believe my function and variable naming to be adequate to understand what is going on.
I develop on a lot of legacy PHP systems that are absolutely terribly written. I wish the original developer would have left some type of comments in the code to describe what was going on in those systems. If you're going to write indecipherable or bad code that someone else will read eventually, you should comment it.
Also, if I am doing something a particular way that doesn't look right at first glance, but I know it is because the code in question is a workaround for a platform or something like that, then I'll comment with a WARNING comment.
Sometimes code does exactly what it needs to do, but is kind of complicated and wouldn't be immediately obvious the first time someone else looked at it. In this case, I'll add a short inline comment describing what the code is intended to do.
I also try to give methods and classes documentation headers, which is good for intellisense and auto-generated documentation. I actually have a bad habit of leaving 90% of my methods and classes undocumented. You don't have time to document things when you're in the middle of coding and everything is changing constantly. Then when you're done you don't feel like going back and finding all the new stuff and documenting it. It's probably good to go back every month or so and just write a bunch of documentation.
Here's my view (based on several years of doctoral research):
There's a huge difference between commenting functions (sort of a black box use, like JavaDocs), and commenting actual code for someone who will read the code ("internal commenting").
Most "well written" code shouldn't require much "internal commenting" because if it performs a lot then it should be broken into enough function calls. The functionality for each of these calls is then captured in the function name and in the function comments.
Now, function comments are indeed the problem, and in some ways your friend is right, that for most code there is no economical incentive for complete specifications the way that popular APIs are documented. The important thing here is to identify what are the "directives": directives are those information pieces that directly affect clients, and require some direct action (and are often unexpected). For example, X must be invoked before Y, don't call this from outside a UI thread, be aware that this has a certain side effect, etc. These are the things that are really important to capture.
Since most people never read full function documentations, and skim what they do read, you can actually increase the chances of awareness by capturing only the directives rather than the whole description.
I comment as much as needed - then, as much as I will need it a year later.
We add comments which provide the API reference documentation for all public classes / methods / properties / etc... This is well worth the effort because XML Documentation in C# has the nice effect of providing IntelliSense to users of these public APIs. .NET 4.0's code contracts will enable us to improve further on this practice.
As a general rule, we do not document internal implementations as we write code unless we are doing something non-obvious. The theory is that while we are writing new implementations, things are changing and comments are more likely than not to be wrong when the dust settles.
When we go back in to work on an existing piece of code, we add comments when we realize that it's taking some thought to figure out what in the heck is going on. This way, we wind up with comments where they are more likely to be correct (because the code is more stable) and where they are more likely to be useful (if I'm coming back to a piece of code today, it seems more likely that I might come back to it again tomorrow).
My approach:
Comments bridge the gap between context / real world and code. Therefore, each and every single line is commented, in correct English language.
I DO reject code that doesn't observe this rule in the strictest possible sense.
Usage of well formatted XML - comments is self-evident.
Sloppy commenting means sloppy code!
Here's how I wrote code:
if (hotel.isFull()) {
print("We're fully booked");
} else {
Guest guest = promptGuest();
hotel.checkIn(guest);
}
here's a few comments that I might write for that code:
// if hotel is full, refuse checkin, otherwise
// prompt the user for the guest info, and check in the guest.
If your code reads like a prose, there is no sense in writing comments that simply repeats what the code reads since the mental processing needed for reading the code and the comments would be almost equal; and if you read the comments first, you will still need to read the code as well.
On the other hand, there are situations where it is impossible or extremely difficult to make the code looks like a prose; that's where comment could patch in.

What types of coding anti-patterns do you always refactor when you cross them?

I just refactored some code that was in a different section of the class I was working on because it was a series of nested conditional operators (?:) that was made a ton clearer by a fairly simple switch statement (C#).
When will you touch code that isn't directly what you are working on to make it more clear?
I once was refactoring and came across something like this code:
string strMyString;
try
{
strMyString = Session["MySessionVar"].ToString();
}
catch
{
strMyString = "";
}
Resharper pointed out that the .ToString() was redundant, so I took it out. Unfortunately, that ended up breaking the code. Whenever MySessionVar was null, it wasn't causing the NullReferenceException that the code relied on to bump it down to the catch block. I know, this was some sad code. But I did learn a good lesson from it. Don't rapidly go through old code relying on a tool to help you do the refactoring - think it through yourself.
I did end up refactoring it as follows:
string strMyString = Session["MySessionVar"] ?? "";
Update: Since this post is being upvoted and technically doesn't contain an answer to the question, I figured I should actually answer the question. (Ok, it was bothering me to the point that I was actually dreaming about it.)
Personally I ask myself a few questions before refactoring.
1) Is the system under source control? If so, go ahead and refactor because you can always roll back if something breaks.
2) Do unit tests exist for the functionality I am altering? If so, great! Refactor. The danger here is that the existence of unit tests don't indicate the accuracy and scope of said unit tests. Good unit tests should pick up any breaking changes.
3) Do I thoroughly understand the code I am refactoring? If there's no source control and no tests and I don't really understand the code I am changing, that's a red flag. I'd need to get more comfortable with the code before refactoring.
In case #3 I would probably spend the time to actually track all of the code that is currently using the method I am refactoring. Depending on the scope of the code this could be easy or impossible (ie. if it's a public API). If it comes down to being a public API then you really need to understand the original intent of the code from a business perspective.
I only refactor it if tests are already in place. If not, it's usually not worth my time to write tests for and refactor presumably working code.
This is a small, minor antipattern but it so irritates me that whenever I find it, I expunge it immediately. In C (or C++ or Java)
if (p)
return true;
else
return false;
becomes
return p;
In Scheme,
(if p #t #f)
becomes
p
and in ML
if p then true else false
becomes
p
I see this antipattern almost exclusively in code written by undergraduate students. I am definitely not making this up!!
Whenever I come across it and I don't think changing it will cause problems (e.g. I can understand it enough that I know what it does. e.g. the level of voodoo is low).
I only bother to change it if there is some other reason I'm modifying the code.
How far I'm willing to take it depends on how confident I am that I won't break anything and how extensive my own changes to the code are going to be.
This is a great situation to show off the benefits of unit tests.
If unit tests are in place, developers can bravely and aggressively refactor oddly written code they might come across. If it passes the unit tests and you've increased readability, then you've done your good deed for the day and can move on.
Without unit tests, simplifying complex code that's filled with voodoo presents a great risk of breaking the code and not even knowing you've introduced a new bug! So most developers will take the cautious route and move on.
For simple refactoring I try to clean up deeply nested control structures and really long functions (more than one screen worth of text). However its not a great idea to refactor code without a good reason (especially in a big team of developers). In general, unless the refactoring will make a big improvement in the code or fix an egregious sin I try to leave well enough alone.
Not refactoring per-say but just as a matter of general housekeeping I generally do this stuff when I start work on a module:
Remove stupid comments
Comments that say nothing more than the function signature already says
Comments that are pure idiocy like "just what it looks like"
Changelogs at the top of the file (we have version control for a reason)
Any API docs that are clearly out-of-sync with the code
Remove commented-out chunks of code
Add version control tags like $Id$ if they are missing
Fix whitespace issues (this can be annoying to others though because your name shows up for a lot of lines in a diff even if all you did was change whitespace)
Remove whitespace at the end of the lines
Change tabs->spaces (for that is our convention where I work)
If the refactor makes the code much easier to read, the most common for me would be duplicate code, e.g. an if/else that only differs by the first/last commands.
if($something) {
load_data($something);
} else {
load_data($something);
echo "Loaded";
do_something_else();
}
More than (arguably) three or four lines of duplicate code always makes me think about refactoring. Also, I tend to move code around a lot, extracting the code I predict to be used more frequently into a separate place - a class with its own well-defined purpose and responsibilites, or a static method of a static class (usually placed in my Utils.* namespace).
But, to answer your question, yes, there are lot of cases when making the code shorter does not necessarily mean making it well structued and readable. Using the ?? operator in C# is another example. What you also have to think about are the new features in your language of choice - e.g. LINQ can be used to do some stuff in a very elegant manner but also can make a very simple thing very unreadable and overly complex. You need to weigh these two thing very carefully, in the end it all boils down to your personal taste and, mostly, experience.
Well, this is another "it depends" answer, but I am afraid it has to be.
I almost always break >2 similar conditionals into a switch... most often with regards to enums. I will short a return instead of a long statement.
ex:
if (condition) {
//lots of code
//returns value
} else {
return null;
}
becomes:
if (!condition)
return null;
//lots of code..
//return value
breaking out early reduces extra indents, and reduces long bits of code... also as a general rule I don't like methods with more than 10-15 lines of code. I like methods to have a singular purpose, even if creating more private methods internally.
Usually I don't refactor the code if I'm just browsing it, not actively working on it.
But sometimes ReSharper points out some stuff I just can't resist to Alt+Enter. :)
I tend to refactor very long functions and methods if I understand the set of inputs and outputs to a block.
This helps readability no end.
I would only refactor the code that I come across and am not actively working on after going through the following steps:
Speak with the author of the code (not always possible) to figure out what that piece of code does. Even if it is obvious to me as to what the piece of code is doing, it always helps to understand the rationale behind why the author may have decided to do something in a certain way. Spending a couple of minutes talking about it would not only help the original author understand your point of view, it also builds trust within the team.
Know or find out what that piece of code is doing in order to test it after re-factoring (A build process with unit tests is very helpful here. It makes the whole thing quick and easy). Run the unit tests before and after the change and ensure nothing is breaking due to your changes.
Send out a heads up to the team (if working with others) and let them know of the upcoming change so nobody is surprised when the change actually occurs
Refactoring for the sake of it is one of the roots of all evil. Please don't ever do it. (Unit tests do somewhat mitigate this).
Is vast swaths of commented out code an antipattern? While not code specifically, (it's comments) I see a lot of this kind of behaviour with people who don't understand how to use source control, and who want to keep the old code around for later so they can more easily go back if a bug was introduced. Whenever I see vast swaths of code that are commented out, I almost always remove them in their entirety.
I try to refactor based on following factors:
Do I understand enough to know whats happening?
Can I easily revert back if this change breaks the code.
Will I have enough time to revert the change back if it breaks the build.
And sometimes, if I have enough time, I refactor to learn. As in, I know my change may break the code, but I dont know where and how. So I change it anyways to find out where its breaking and why. That way I learn more about the code.
My domain has been mobile software (cell phones) where most of the code resides on my PC and wont impact others. Since I also maintain the CI build system for my company I can run a complete product build (for all phones) on the refactored code to ensure it doesnt break anything else. But thats my personal luxury which you may not have.
I tend to refactor global constants and enumerations quite a bit if they can be deemed a low refactor risk. Maybe it's just be, but something like a ClientAccountStatus enum should be in or close to the ClientAccount class rather than being in a GlobalConstAndEnum class.
Deletion/updating of comments which are clearly wrong or clearly pointless.
Removing them is:
safe
version control means you can find them again
improves the quality of the code to others and yourself
It is about the only 100% risk free refactoring I know.
Note that doing this in structured comments like javadoc comments is a different matter. Changing these is not risk free, as such they are very likely to be fixed/removed but not dead certain guaranteed fixes as standard incorrect code comments would be.

What was the strangest coding standard rule that you were forced to follow? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
When I asked this question I got almost always a definite yes you should have coding standards.
What was the strangest coding standard rule that you were ever forced to follow?
And by strangest I mean funniest, or worst, or just plain odd.
In each answer, please mention which language, what your team size was, and which ill effects it caused you and your team.
I hate it when the use of multiple returns is banned.
reverse indentation. For example:
for(int i = 0; i < 10; i++)
{
myFunc();
}
and:
if(something)
{
// do A
}
else
{
// do B
}
Maybe not the most outlandish one you'll get, but I really really hate when I have to preface database table names with 'tbl'
Almost any kind of hungarian notation.
The problem with hungarian notation is that it is very often misunderstood. The original idea was to prefix the variable so that the meaning was clear. For example:
int appCount = 0; // Number of apples.
int pearCount = 0; // Number of pears.
But most people use it to determine the type.
int iAppleCount = 0; // Number of apples.
int iPearCount = 0; // Number of pears.
This is confusing, because although both numbers are integers, everybody knows, you can't compare apples with pears.
No ternary operator allowed where I currently work:
int value = (a < b) ? a : b;
... because not everyone "gets it". If you told me, "Don't use it because we've had to rewrite them when the structures get too complicated" (nested ternary operators, anyone?), then I'd understand. But when you tell me that some developers don't understand them... um... Sure.
To NEVER remove any code when making changes. We were told to comment all changes. Bear in mind we use source control. This policy didn't last long because developers were in an uproar about it and how it would make the code unreadable.
I once worked under the tyranny of the Mighty VB King.
The VB King was the pure master of MS Excel and VBA, as well as databases (Hence his surname : He played with Excel while the developers worked with compilers, and challenging him on databases could have detrimental effects on your career...).
Of course, his immense skills gave him an unique vision of development problems and project management solutions: While not exactly coding standards in the strictest sense, the VB King regularly had new ideas about "coding standards" and "best practices" he tried (and oftentimes succeeded) to impose on us. For example:
All C/C++ arrays shall start at index 1, instead of 0. Indeed, the use of 0 as first index of an array is obsolete, and has been superseded by Visual Basic 6's insightful array index management.
All functions shall return an error code: There are no exceptions in VB6, so why would we need them at all? (i.e. in C++)
Since "All functions shall return an error code" is not practical for functions returning meaningful types, all functions shall have an error code as first [in/out] parameter.
All our code will check the error codes (this led to the worst case of VBScript if-indentation I ever saw in my career... Of course, as the "else" clauses were never handled, no error was actually found until too late).
Since we're working with C++/COM, starting this very day, we will code all our DOM utility functions in Visual Basic.
ASP 115 errors are evil. For this reason, we will use On Error Resume Next in our VBScript/ASP code to avoid them.
XSL-T is an object oriented language. Use inheritance to resolve your problems (dumb surprise almost broke my jaw open this one day).
Exceptions are not used, and thus should be removed. For this reason, we will uncheck the checkbox asking for destructor call in case of exception unwinding (it took days for an expert to find the cause of all those memory leaks, and he almost went berserk when he found out they had willingly ignored (and hidden) his technical note about checking the option again, sent handfuls of weeks before).
catch all exceptions in the COM interface of our COM modules, and dispose them silently (this way, instead of crashing, a module would only appear to be faster... Shiny!... As we used the über error handling described above, it even took us some time to understand what was really happening... You can't have both speed and correct results, can you?).
Starting today, our code base will split into four branches. We will manage their synchronization and integrate all bug corrections/evolutions by hand.
All but the C/C++ arrays, VB DOM utility functions and XSL-T as OOP language were implemented despite our protests. Of course, over the time, some were discovered, ahem, broken, and abandoned altogether.
Of course, the VB King credibility never suffered for that: Among the higher management, he remained a "top gun" technical expert...
This produced some amusing side effects, as you can see by following the link What is the best comment in source code you have ever encountered?
Back in the 80's/90's, I worked for an aircraft simulator company that used FORTRAN. Our FORTRAN compiler had a limit of 8 characters for variable names. The company's coding standards reserved the first three of them for Hungarian-notation style info. So we had to try and create meaningful variable names with just 5 characters!
I worked at a place that had a merger between 2 companies. The 'dominant' one had a major server written in K&R C (i.e. pre-ANSI). They forced the Java teams (from both offices -- probably 20 devs total) to use this format, which gleefully ignored the 2 pillars of the "brace debate" and goes straight to crazy:
if ( x == y )
{
System.out.println("this is painful");
x = 0;
y++;
}
Forbidden:
while (true) {
Allowed:
for (;;) {
a friend of mine - we'll call him CodeMonkey - got his first job out of college [many years ago] doing in-house development in COBOL. His first program was rejected as 'not complying with our standards' because it used... [shudder!] nested IF statements
the coding standards banned the use of nested IF statements
now, CodeMonkey was not shy and was certain of his abilities, so he persisted in asking everyone up the chain and down the aisle why this rule existed. Most claimed they did not know, some made up stuff about 'readability', and finally one person remembered the original reason: the first version of the COBOL compiler they used had a bug and didn't handle nested IF statements correctly.
This compiler bug, of course, had been fixed for at least a decade, but no one had challenged the standards. [baaa!]
CodeMonkey was successful in getting the standards changed - eventually!
Once worked on a project where underscores were banned. And I mean totally banned. So in a c# winforms app, whenever we added a new event handler (e.g. for a button) we'd have to rename the default method name from buttonName_Click() to something else, just to satisfy the ego of the guy that wrote the coding standards. To this day I don't know what he had against the humble underscore
Totally useless database naming conventions.
Every table name has to start with a number. The numbers show which kind of data is in the table.
0: data that is used everywhere
1: data that is used by a certain module only
2: lookup table
3: calendar, chat and mail
4: logging
This makes it hard to find a table if you only know the first letter of its name.
Also - as this is a mssql database - we have to surround tablenames with square brackets everywhere.
-- doesn't work
select * from 0examples;
-- does work
select * from [0examples];
We were doing a C++ project and the team lead was a Pascal guy.
So we had a coding standard include file to redefine all that pesky C and C++ syntax:
#define BEGIN {
#define END }
but wait there's more!
#define ENDIF }
#define CASE switch
etc. It's hard to remember after all this time.
This took what would have been perfectly readable C++ code and made it illegible to anyone except the team lead.
We also had to use reverse Hungarian notation, i.e.
MyClass *class_pt // pt = pointer to type
UINT32 maxHops_u // u = uint32
although oddly I grew to like this.
At a former job:
"Normal" tables begin with T_
"System" tables (usually lookups) begin with TS_ (except when they don't because somebody didn't feel like it that day)
Cross-reference tables begin with TSX_
All field names begin with F_
Yes, that's right. All of the fields, in every single table. So that we can tell it's a field.
A buddy of mine encountered this rule while working at a government job. The use of ++ (pre or post) was completely banned. The reason: Different compilers might interpret it differently.
Half of the team favored four-space indentation; the other half favored two-space indentation.
As you can guess, the coding standard mandated three, so as to "offend all equally" (a direct quote).
Not being able to use Reflection as the manager claimed it involved too much 'magic'.
The very strangest one I had, and one which took me quite some time to overthrow, was when the owner of our company demanded that our new product be IE only. If it could work on FireFox, that was OK, but it had to be IE only.
This might not sound too strange, except for one little flaw. All of the software was for a bespoke server software package, running on Linux, and all client boxes that our customer was buying were Linux. Short of trying to figure out how to get Wine (in those days, very unreliable) up and running on all of these boxes and seeing if we could get IE running and training their admins how to debug Wine problems, it simply wasn't possible to meet the owner's request. The problem was that he was doing the Web design and simply didn't know how to make Web sites compliant with FireFox.
It probably won't shock you to know that that our company went bankrupt.
Using generic numbered identifier names
At my current work we have two rules which are really mean:
Rule 1: Every time we create a new field in a database table we have to add additional reserve fields for future use. These reserve fields are numbered (because no one knows which data they will hold some day) The next time we need a new field we first look for an unused reserve field.
So we end up with with customer.reserve_field_14 containing the e-mail address of the customer.
At one day our boss thought about introducing reserve tables, but fortunatly we could convince him not to do it.
Rule 2: One of our products is written in VB6 and VB6 has a limit of the total count of different identifier names and since the code is very large, we constantly run into this limit. As a "solution" all local variable names are numbered:
Lvarlong1
Lvarlong2
Lvarstr1
...
Although that effectively circumvents the identifier limit, these two rules combined lead to beautiful code like this:
...
If Lvarbool1 Then
Lvarbool2 = True
End If
If Lvarbool2 Or Lvarstr1 <> Lvarstr5 Then
db.Execute("DELETE FROM customer WHERE " _
& "reserve_field_12 = '" & Lvarstr1 & "'")
End If
...
You can imagine how hard it is to fix old or someone else's code...
Latest update: Now we are also using "reserve procedures" for private members:
Private Sub LSub1(Lvarlong1 As Long, Lvarstr1 As String)
If Lvarlong1 >= 0 Then
Lvarbool1 = LFunc1(Lvarstr1)
Else
Lvarbool1 = LFunc6()
End If
If Lvarbool1 Then
LSub4 Lvarstr1
End If
End Sub
EDIT: It seems that this code pattern is becoming more and more popular. See this The Daily WTF post to learn more: Astigmatism :)
Back in my C++ days we were not allowed to use ==,>=, <=,&&, etc. there were macros for this ...
if (bob EQ 7 AND alice LEQ 10)
{
// blah
}
this was obviously to deal with the "old accidental assignment in conditional bug", however we also had the rule "put constants before variables", so
if (NULL EQ ptr); //ok
if (ptr EQ NULL); //not ok
Just remembered, the simplest coding standard I ever heard was "Write code as if the next maintainer is a vicious psychopath who knows where you live."
Hungarian notation in general.
I've had a lot of stupid rules, but not a lot that I considered downright strange.
The sillyiest was on a NASA job I worked back in the early 90's. This was a huge job, with well over 100 developers on it. The experienced developers who wrote the coding standards decided that every source file should begin with a four letter acronym, and the first letter had to stand for the group that was responsible for the file. This was probably a great idea for the old FORTRAN 77 projects they were used to.
However, this was an Ada project, with a nice hierarchal library structure, so it made no sense at all. Every directory was full of files starting with the same letter, followed by 3 more nonsense leters, an underscore, and then part of the file name that mattered. All the Ada packages had to start with this same five-character wart. Ada "use" clauses were not allowed either (arguably a good thing under normal circumstances), so that meant any reference to any identifier that wasn't local to that source file also had to include this useless wart. There probably should have been an insurrection over this, but the entire project was staffed by junior programmers and fresh from college new hires (myself being the latter).
A typical assignment statement (already verbose in Ada) would end up looking something like this:
NABC_The_Package_Name.X := NABC_The_Package_Name.X +
CXYZ_Some_Other_Package_Name.Delta_X;
Fortunately they were at least enlightened enough to allow us more than 80 columns! Still, the facility wart was hated enough that it became boilerplate code at the top of everyone's source files to use Ada "renames" to get rid of the wart. There'd be one rename for each imported ("withed") package. Like this:
package Package_Name renames NABC_Package_Name;
package Some_Other_Package_Name renames CXYZ_Some_Other_Package_Name;
--// Repeated in this vein for an average of 10 lines or so
What the more creative among us took to doing was trying to use the wart to make an acutally sensible (or silly) package name. (I know what you are thinking, but explitives were not allowed and shame on you! That's disgusting). For example, I was in the Common code group, and I needed to make a package to interface with the Workstation group. After a brainstorming session with the Workstation guy, we decided to name our packages so that someone needing both would have to write:
with CANT_Interface_Package;
with WONT_Interface_Package;
When I started working at one place, and started entering my code into the source control, my boss suddenly came up to me, and asked me to stop committing so much. He told me it is discouraged to do more than 1 commit per-day for a developer because it litters the source control. I simply gaped at him...
Later I understood that the reason he even came up to me about it is because the SVN server would send him (and 10 more high executives) a mail for each commit someone makes. And by littering the source control I guessed he ment his mailbox.
Doing all database queries via stored procedures in Sql Server 2000. From complex multi-table queries to simple ones like:
select id, name from people
The arguments in favor of procedures were:
Performance
Security
Maintainability
I know that the procedure topic is quite controversial, so feel free to score my answer negatively ;)
There must be 165 unit tests (not necessarily automated) per 1000 lines of code. That works out at one test for roughly every 8 lines.
Needless to say, some of the lines of code are quite long, and functions return this pointers to allow chaining.
We had to sort all the functions in classes alphabetically, to make them "easier to find".
Never mind the ide had a drop down. That was too many clicks.
(same tech lead wrote an app to remove all comments from our source code).
In 1987 or so, I took a job with a company that hired me because I was one of a small handful of people who knew how to use Revelation. Revelation, if you've never heard of it, was essentially a PC-based implementation of the Pick operating system - which, if you've never heard of it, got its name from its inventor, the fabulously-named Dick Pick. Much can be said about the Pick OS, most of it good. A number of supermini vendors (Prime and MIPS, at least) used Pick, or their own custom implementations of it.
This company was a Prime shop, and for their in-house systems they used Information. (No, that was really its name: it was Prime's implementation of Pick.) They had a contract with the state to build a PC-based system, and had put about a year into their Revelation project before the guy doing all the work, who was also their MIS director, decided he couldn't do both jobs anymore and hired me.
At any rate, he'd established a number of coding standards for their Prime-based software, many of which derived from two basic conditions: 1) the use of 80-column dumb terminals, and 2) the fact that since Prime didn't have a visual editor, he'd written his own. Because of the magic portability of Pick code, he'd brought his editor down into Revelation, and had built the entire project on the PC using it.
Revelation, of course, being PC-based, had a perfectly good full-screen editor, and didn't object when you went past column 80. However, for the first several months I was there, he insisted that I use his editor and his standards.
So, the first standard was that every line of code had to be commented. Every line. No exceptions. His rationale for that was that even if your comment said exactly what you had just written in the code, having to comment it meant you at least thought about the line twice. Also, as he cheerfully pointed out, he'd added a command to the editor that formatted each line of code so that you could put an end-of-line comment.
Oh, yes. When you commented every line of code, it was with end-of-line comments. In short, the first 64 characters of each line were for code, then there was a semicolon, and then you had 15 characters to describe what your 64 characters did. In short, we were using an assembly language convention to format our Pick/Basic code. This led to things that looked like this:
EVENT.LIST[DATE.INDEX][-1] = _ ;ADD THE MOST RECENT EVENT
EVENTS[LEN(EVENTS)] ;TO THE END OF EVENT LIST
(Actually, after 20 years I have finally forgotten R/Basic's line-continuation syntax, so it may have looked different. But you get the idea.)
Additionally, whenever you had to insert multiline comments, the rule was that you use a flower box:
************************************************************************
** IN CASE YOU NEVER HEARD OF ONE, OR COULDN'T GUESS FROM ITS NAME, **
** THIS IS A FLOWER BOX. **
************************************************************************
Yes, those closing asterisks on each line were required. After all, if you used his editor, it was just a simple editor command to insert a flower box.
Getting him to relent and let me use Revelation's built-in editor was quite a battle. At first he was insistent, simply because those were the rules. When I objected that a) I already knew the Revelation editor b) it was substantially more functional than his editor, c) other Revelation developers would have the same perspective, he retorted that if I didn't train on his editor I wouldn't ever be able to work on the Prime codebase, which, as we both knew, was not going to happen as long as hell remained unfrozen over. Finally he gave in.
But the coding standards were the last to go. The flower-box comments in particular were a stupid waste of time, and he fought me tooth and nail on them, saying that if I'd just use the right editor maintaining them would be perfectly easy. (The whole thing got pretty passive-aggressive.) Finally I quietly gave in, and from then on all of the code I brought to code reviews had his precious flower-box comments.
One day, several months into the job, when I'd pretty much proven myself more than competent (especially in comparison with the remarkable parade of other coders that passed through that office while I worked there), he was looking over my shoulder as I worked, and he noticed I wasn't using flower-box comments. Oh, I said, I wrote a source-code formatter that converts my comments into your style when I print them out. It's easier than maintaining them in the editor. He opened his mouth, thought for a moment, closed it, went away, and we never talked about coding standards again. Both of our jobs got easier after that.
At my first job, all C programs, no matter how simple or complex, had only four functions. You had the main, which called the other three functions in turn. I can't remember their names, but they were something along the lines of begin(), middle(), and end(). begin() opened files and database connections, end() closed them, and middle() did everything else. Needless to say, middle() was a very long function.
And just to make things even better, all variables had to be global.
One of my proudest memories of that job is having been part of the general revolt that led to the destruction of those standards.
An externally-written C coding standard that had the rule 'don't rely on built in operator precedence, always use brackets'
Fair enough, the obvious intent was to ban:
a = 3 + 6 * 2;
in favour of:
a = 3 + (6 * 2);
Thing was, this was enforced by a tool that followed the C syntax rules that '=', '==', '.' and array access are operators. So code like:
a[i].x += b[i].y + d - 7;
had to be written as:
((a[i]).x) += (((b[i]).y + d) - 7);

Resources