Related: First, what to read about parsing and building AST?
How to create parser for a language (like SQL) that will build an AST and allow syntax errors?
For example, for "3+4*5":
    +
   / \
  3   *
     / \
    4   5
And for "3+4*+" with syntax error, parser would guess that the user meant:
    +
   / \
  3   *
     / \
    4   +
       / \
      ?   ?
Where to start?
SQL:
    SELECT_________________
   /      \                \
  .       FROM             JOIN
 / \        |              /  \
a   city_name  people  address  ON
                                 |
                                 =______________
                                /               \
                             ._____              .
                            /      \            / \
                           p   address_id      a   id
The standard answer to the question of how to build parsers (that build ASTs) is to read the standard texts on compiling. Aho and Ullman's "Dragon" compiler book is pretty classic. If you haven't got the patience to get the best reference materials, you're going to have more trouble, because they provide theory and investigate subtleties. But here is my answer for people in a hurry, building recursive descent parsers.
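To make that concrete, here is a minimal sketch of a recursive descent parser in Python for the toy expression grammar above (the tuple-shaped AST and all names are my own choices, not from any particular text):

def tokenize(text):
    # Single-character tokens are enough for this toy grammar.
    return [c for c in text if not c.isspace()]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self):
        # expr := term (('+' | '-') term)*
        node = self.parse_term()
        while self.peek() in ('+', '-'):
            node = (self.advance(), node, self.parse_term())
        return node

    def parse_term(self):
        # term := factor (('*' | '/') factor)*
        node = self.parse_factor()
        while self.peek() in ('*', '/'):
            node = (self.advance(), node, self.parse_factor())
        return node

    def parse_factor(self):
        # factor := a single digit, to keep the sketch short
        tok = self.advance()
        if tok is None or not tok.isdigit():
            raise SyntaxError("expected a number, got %r" % tok)
        return int(tok)

print(Parser(tokenize("3+4*5")).parse_expr())   # ('+', 3, ('*', 4, 5))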
One can build parsers with built-in error recovery. There are many papers on this sort of thing, a hot topic in the 1980s. Check out Google Scholar, hunt for "syntax error repair". The basic idea is that the parser, on encountering a parsing error, skips to some well-known beacon (";", a statement delimiter, is pretty popular for C-like languages, which is why you got asked in a comment if your language has statement terminators), or proposes various input stream deletions or insertions to climb over the point of the syntax error. The sheer variety of such schemes is surprising. The key idea is generally to take into account as much information around the point of error as possible. One of the most intriguing ideas I ever saw had two parsers, one running N tokens ahead of the other, looking for syntax-error land-mines, with the second parser being fed error repairs based on the N tokens available before it encounters the syntax error. This lets the second parser choose to act differently before arriving at the syntax error. If you don't have this, most parsers throw away left context and thus lose the ability to repair. (I never implemented such a scheme.)
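A sketch of that beacon-skipping ("panic mode") idea, written as a method for a parser like the Python sketch above; parse_statement is assumed to exist and to raise SyntaxError on bad input:

def parse_program(self):
    # Panic-mode recovery: record the error, skip to the ';' beacon,
    # and resume with the next statement instead of giving up.
    statements, errors = [], []
    while self.peek() is not None:
        try:
            statements.append(self.parse_statement())
        except SyntaxError as err:
            errors.append((self.pos, str(err)))
            while self.peek() not in (';', None):
                self.advance()          # discard tokens until the beacon
        if self.peek() == ';':
            self.advance()              # consume the statement delimiter
    return statements, errors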
The choice of things to insert can often be derived from information used to build the parser (often First and Follow sets) in the first place. This is relatively easy to do with L(AL)R parsers, because the parse tables contain the necessary information and are available to the parser at the point where it encounters an error. If you want to understand how to do this, you need to understand the theory (oops, there's that compiler book again) of how the parsers are constructed. (I have implemented this scheme successfully several times).
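If you want a feel for where those candidate insertions come from, here is a toy FIRST-set computation in Python (the grammar encoding -- one character per symbol, '' for an empty production -- is my own invention; FOLLOW sets fall out of a similar fixpoint loop):

def first_sets(grammar):
    # grammar maps each nonterminal to a list of productions (strings).
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate to a fixpoint
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for sym in prod:
                    add = first[sym] - {''} if sym in grammar else {sym}
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    if sym in grammar and '' in first[sym]:
                        continue        # nullable symbol: look at the next one
                    break
                else:
                    if '' not in first[nt]:
                        first[nt].add('')   # every symbol was nullable
                        changed = True
    return first

# The usual LL(1) expression grammar: E -> T X, X -> + T X | eps, and so on.
grammar = {'E': ['TX'], 'X': ['+TX', ''], 'T': ['FY'], 'Y': ['*FY', ''], 'F': ['n', '(E)']}
print(first_sets(grammar))   # E and T start with 'n' or '(', X with '+', Y with '*'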
Regardless of what you do, syntax error repair doesn't help much, because it is almost impossible to guess what the writer of the parsed document actually intended. This suggests fancy schemes won't be really helpful. I stick to simple ones; people are happy to get an error report and some semi-graceful continuation of parsing.
A real problem with rolling your own parser for a real language is that real languages are nasty, messy things; people building real implementations get it wrong, and it gets frozen in stone because of existing code bases, or they insist on bending/improving the language (standards are for wimps, goodies are for marketing) because it's cool. Expect to spend a lot of time re-calibrating what you think the grammar is against the ground truth of real code. As a general rule, if you want a working parser, better to get one that has a track record rather than roll it yourself.
A lesson most people who are hell-bent on building a parser don't get is that if they want to do anything useful with the parse result or tree, they'll need a lot more basic machinery than just the parser. Check my bio for "Life After Parsing".
There are two things the parser could do:
Report the error and have the user try again.
Repair the error and proceed.
Generally speaking the first one is easier (and safer). There may not always be enough information for the parser to infer the intent when the syntax is wrong. Depending on the circumstances, it may be dangerous to proceed with a repair that makes the input syntactically correct but semantically wrong.
I've written a few hand-rolled recursive descent parsers for little languages. When writing code to interpret the grammar rules explicitly (as opposed to using a parser-generator), it's easy to detect errors, because the next token doesn't fit the production rule. Generated parsers tend to spit out a simplistic "expected $(TOKEN_TYPE) here" message, which isn't always useful to the user. With a hand-written parser, it's often easy to give a more specific diagnostic message, but it can be time consuming to cover every case.
If your goal is to report the problem but keep parsing (so that you can see if there are additional problems), you can put a special AST node in the tree at the point of the error. This keeps the tree from falling apart.
You then have to resync to some point beyond the error in order to continue parsing. As Ira Baxter mentioned in his answer, you might look for a token, like ';', that separates statements. The correct token(s) to look for depends on the language you're parsing. Another possibility is to guess what the user meant (e.g., infer an extra token or a different token at the point the error was detected) and then continue. If you encounter another syntax error within the next few tokens, you could backtrack, make a different guess, and try again.
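Sticking with the Python parser sketch used earlier in this thread, the placeholder-node idea might look like this (ErrorNode is an invented name):

class ErrorNode:
    # Placeholder that keeps the AST well-formed at the point of the error.
    def __init__(self, message):
        self.message = message
    def __repr__(self):
        return "<error: %s>" % self.message

def parse_factor(self):
    tok = self.advance()
    if tok is None or not tok.isdigit():
        # Report, but return a node so the surrounding tree doesn't fall apart.
        return ErrorNode("expected a number, got %r" % (tok,))
    return int(tok)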
FxCop taught me (basically, from memory) that functions, classes and properties should be written in MajorCamelCase, while private variables should be in minorCamelCase.
I was talking about a reasonably popular project on IRC and quoted some code. One other guy, a fairly notorious troll who was also a half-op (gasp!) didn't seem to agree. Everything oughta be in the same casing, and he quite fervently favored MajorCamelCase, or even underscore_separation.
Of course, he was just a troll, so I reckoned I'd just keep doing it the way I already did. Before I learned the above guidelines, I hardly even had a coherent naming style.
He got me thinking, though -- does stuff like this really matter?
You need to make sure that your code is readable in the future. Please remember that you might want to pass the development of your application to someone else, and this person will need to read and understand it. You could stop actively working on a project and return to it after a year - and be surprised that you have to read the code carefully to understand how it works.
I believe it was Steve McConnell who said that the specific naming style does not really matter (you could use anything you want as long as you are consistent), but this only applies when everyone working on the project agrees with you.
In general it is better to adopt community-accepted coding styles where possible to facilitate code reuse and shorten learning curves.
If you don't care about the long-term maintainability of your project (or consistency or readability) then no, casing (and coding conventions in general) don't really matter. Otherwise, they do matter. See this.
Your specific coding style doesn't matter (much), so long as it is consistent throughout the project.
This improves readability and understanding: if an identifier is named in a particular way, the reader can (hopefully) be confident as to what that naming style implies.
As regards CamelCase v underscores, etc: again, it's down to your coding convention. One approach which uses both is to apply a prefix with underscore to indicate the module in which the function, or file-scope/global variable, is used, e.g. Config_Update(), Status_Get().
Duplicate: What are your hard rules about commenting?
A Developer I work with had some things to say about commenting that were interesting to me (see below). What is your personal approach/take on commenting?
"I don't add comments to code unless
its a simple heading or there's a
platform-bug or a necessary
work-around that isn't obvious. Code
can change and comments may become
misleading. Code should be
self-documenting in its use of
descriptive names and its logical
organization - and its solutions
should be the cleanest/simplest way
to perform a given task. If a
programmer can't tell what a program
does by only reading the code, then
he's not ready to alter it.
Commenting tends to be a crutch for
writing something complex or
non-obvious - my goal is to always
write clean and simple code."
"I think there a few camps when it
comes to commenting, the
enterprisey-type who think they're
writing an API and some grand
code-library that will be used for
generations to come, the
craftsman-like programmer that thinks
code says what it does clearer than a
comment could, and novices that write
verbose/unclear code so as to need to
leave notes to themselves as to why
they did something."
There's a tragic flaw with the "self-documenting code" theory. Yes, reading the code will tell you exactly what it is doing. However, the code is incapable of telling you what it's supposed to be doing.
I think it's safe to say that all bugs are caused when code is not doing what it's supposed to be doing :). So if we add some key comments to provide maintainers with enough information to know what a piece of code is supposed to be doing, then we have given them the ability to fix a whole lot of bugs.
That leaves us with the question of how many comments to put in. If you put in too many comments, things become tedious to maintain and the comments will inevitably be out of date with the code. If you put in too few, then they're not particularly useful.
I've found regular comments to be most useful in the following places:
1) A brief description at the top of a .h or .cpp file for a class explaining the purpose of the class. This helps give maintainers a quick overview without having to sift through all of the code.
2) A comment block before the implementation of a non-trivial function explaining the purpose of it and detailing its expected inputs, potential outputs, and any oddities to expect when calling the function. This saves future maintainers from having to decipher entire functions to figure these things out.
Other than that, I tend to comment anything that might appear confusing or odd to someone. For example: "This array is 1 based instead of 0 based because of blah blah".
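In Python terms, a sketch of those two kinds of header comment plus an inline oddity note (the class and every name in it are invented):

class RouteCache:
    """Caches the cheapest known route between pairs of cities.

    Exists so the trip planner doesn't recompute shortest paths on every
    query; entries are invalidated whenever the road graph changes.
    """

    def cheapest_route(self, origin, destination):
        """Return the cheapest route as a list of city ids, or None.

        origin, destination: city ids (ints). Raises KeyError for an
        unknown id. Oddity to expect: results are stale until refresh()
        has been called after any graph edit.
        """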
Well written, well placed comments are invaluable. Bad comments are often worse than no comments. To me, lack of any comments at all indicates laziness and/or arrogance on the part of the author of the code. No matter how obvious it is to you what the code is doing or how fantastic your code is, it's a challenging task to come into a body of code cold and figure out what the heck is going on. Well done comments can make a world of difference getting someone up to speed on existing code.
I've always liked Refactoring's take on commenting:
The reason we mention comments here is that comments often are used as a deodorant. It's surprising how often you look at thickly commented code and notice that the comments are there because the code is bad.
Comments lead us to bad code that has all the rotten whiffs we've discussed in the rest of this chapter. Our first action is to remove the bad smells by refactoring. When we're finished, we often find that the comments are superfluous.
As controversial as that is, it rings true for the code I've read. To be fair, Fowler isn't saying never to comment, but to think about the state of your code before you do.
You need documentation (in some form; not always comments) for a local understanding of the code. Code by itself tells you what it does, if you read all of it and can keep it all in mind. (More on this below.) Comments are best for informal or semiformal documentation.
Many people say comments are a code smell, replaceable by refactoring, better naming, and tests. While this is true of bad comments (which are legion), it's easy to jump to concluding it's always so, and hallelujah, no more comments. This puts all the burden of local documentation -- too much of it, I think -- on naming and tests.
Document the contract of each function and, for each type of object, what it represents and any constraints on a valid representation (technically, the abstraction function and representation invariant). Use executable, testable documentation where practical (doctests, unit tests, assertions), but also write short comments giving the gist where helpful. (Where tests take the form of examples, they're incomplete; where they're complete, precise contracts, they can be as much work to grok as the code itself.) Write top-level comments for each module and each project; these can explain conventions that keep all your other comments (and code) short. (This supports naming-as-documentation: with conventions established, and a place we can expect to find subtleties noted, we can be confident more often that the names tell all we need to know.) Longer, stylized, irritatingly redundant Javadocs have their uses, but helped generate the backlash.
(For instance, this:
Perform an n-fold frobulation.
@param n the number of times to frobulate
@param x the x-coordinate of the center of frobulation
@param y the y-coordinate of the center of frobulation
@param z the z-coordinate of the center of frobulation
could be like "Frobulate n times around the center (x,y,z)." Comments don't have to be a chore to read and write.)
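And since doctests came up above as executable documentation, the same one-line contract can carry a testable example; frobulate is, of course, a made-up function and this is only a sketch:

def frobulate(n, x, y, z):
    """Frobulate n times around the center (x, y, z).

    >>> frobulate(2, 0.0, 0.0, 1.0)
    [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
    """
    return [(x, y, z)] * n

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # runs the example embedded in the docstring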
I don't always do as I say here; it depends on how much I value the code and who I expect to read it. But learning how to write this way made me a better programmer even when cutting corners.
Back on the claim that we document for the sake of local understanding: what does this function do?
def is_even(n): return is_odd(n-1)
Tests if an integer is even? If is_odd() tests if an integer is odd, then yes, that works. Suppose we had this:
def is_odd(n): return is_even(n-1)
The same reasoning says this is_odd() tests if an integer is odd. Put them together, of course, and neither works, even though each works if the other does. Change it a bit and we'd have code that does work, but only for natural numbers, while still locally looking like it works for integers. In microcosm that's what understanding a codebase is like: tracing dependencies around in circles to try to reverse-engineer assumptions the author could have explained in a line or two if they'd bothered. I hate the expense of spirit thoughtless coders have put me to this way over the past couple of decades: oh, this method looks like it has the side effect of farbuttling the warpcore... always? Well, if odd crobuncles desaturate, at least; do they? Better check all the crobuncle-handling code... which will pose its own challenges to understanding. Good documentation cuts this O(n) pointer-chasing down to O(1): e.g. knowing a function's contract and the contracts of the things it explicitly uses, the function's code should make sense with no further knowledge of the system. (Here, contracts saying is_even() and is_odd() work on natural numbers would tell us that both functions need to test for n==0.)
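For the record, a version of that pair that actually honours a natural-number contract, base cases included:

def is_even(n):
    # Contract: n is a natural number (n >= 0).
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    # Contract: n is a natural number (n >= 0).
    return False if n == 0 else is_even(n - 1)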
My only real rule is that comments should explain why code is there, not what it is doing or how it is doing it. Those things can change, and if they do the comments have to be maintained. The purpose the code exists in the first place shouldn't change.
The purpose of comments is to explain the context - the reason for the code; this, the programmer cannot know from mere code inspection. For example:
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
who knows what the heck this is for? but with a simple comment, all is revealed:
//when under attack, sidestep and retaliate with rocket bundles
strangeSingleton.MoveLeft(1.06);
badlyNamedGroup.Ignite();
seriously, comments are for the why, not the how, unless the how is unintuitive.
While I agree that code should be self-readable, I still see a lot of value in adding extensive comment blocks for explaining design decisions. For example, "I did xyz instead of the common practice of abc because of this caveat ..." with a URL to a bug report or something.
I try to look at it as: If I'm dead and gone and someone straight out of college has to fix a bug here, what are they going to need to know?
In general I see comments used to explain poorly written code. Most code can be written in a way that would make comments redundant. Having said that I find myself leaving comments in code where the semantics aren't intuitive, such as calling into an API that has strange or unexpected behavior etc...
I also generally subscribe to the self-documenting code idea, so I think your developer friend gives good advice, and I won't repeat that, but there are definitely many situations where comments are necessary.
A lot of times I think it boils down to how close the implementation is to the types of ordinary or easy abstractions that code-readers in the future are going to be comfortable with, or, more generally, to what degree the code tells the entire story. This will result in more or fewer comments depending on the type of programming language and project.
So, for example if you were using some kind of C-style pointer arithmetic in an unsafe C# code block, you shouldn't expect C# programmers to easily switch from C# code reading (which is probably typically more declarative or at least less about lower-level pointer manipulation) to be able to understand what your unsafe code is doing.
Another example is when you need to do some work deriving or researching an algorithm or equation or something that is not going to end up in your code but will be necessary to understand if anyone needs to modify your code significantly. You should document this somewhere and having at least a reference directly in the relevant code section will help a lot.
I don't think it matters how many or how few comments your code contains. If your code contains comments, they have to maintained, just like the rest of your code.
EDIT: That sounded a bit pompous, but I think that too many people forget that even the names of the variables, or the structures we use in the code, are all simply "tags" - they only have meaning to us, because our brains see a string of characters such as customerNumber and understand that it is a customer number. And while it's true that comments lack any "enforcement" by the compiler, they aren't so far removed. They are meant to convey meaning to another person, a human programmer that is reading the text of the program.
If the code is not clear without comments, first make the code a clearer statement of intent, then only add comments as needed.
Comments have their place, but primarily for cases where the code is unavoidably subtle or complex (inherent complexity is due to the nature of the problem being solved, not due to laziness or muddled thinking on the part of the programmer).
Requiring comments and "measuring productivity" in lines-of-code can lead to junk such as:
/*****
 *
 * Increase the value of variable i,
 * but only up to the value of variable j.
 *
 *****/
if (i < j) {
    ++i;
} else {
    i = j;
}
rather than the succinct (and clear to the appropriately-skilled programmer):
i = Math.min(j, i + 1);
YMMV
The vast majority of my comments are at the class-level and method-level, and I like to describe the higher-level view instead of just args/return value. I'm especially careful to describe any "non-linearities" in the function (limits, corner cases, etc.) that could trip up the unwary.
Typically I don't comment inside a method, except to mark "FIXME" items, or very occasionally some sort of "here be monsters" gotcha that I just can't seem to clean up, but I work very hard to avoid those. As Fowler says in Refactoring, comments tend to indicate smelly code.
Comments are part of code, just like functions, variables and everything else - and when the related functionality changes, the comment must also be updated (just like function calls need changing if function arguments change).
In general, when programming you should do things once in one place only.
Therefore, if what code does is explained by clear naming, no comment is needed - and this is of course always the goal - it's the cleanest and simplest way.
However, if further explanation is needed, I will add a comment, prefixed with INFO, NOTE, and similar...
An INFO: comment is for general information if someone is unfamiliar with this area.
A NOTE: comment is to alert of a potential oddity, such as a strange business rule / implementation.
If I specifically don't want people touching code, I might add a WARNING: or similar prefix.
What I don't use, and am specifically opposed to, are changelog-style comments - whether inline or at the head of the file - these comments belong in the version control software, not the sourcecode!
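By way of illustration, that prefix scheme in use (the code and the business rule here are invented):

# INFO: amounts in this module are integer cents, never floats.
# NOTE: VAT is applied before discounts; that is the business rule,
#       odd as it looks.
def total(net_cents, vat_rate, discount_cents):
    # WARNING: keep this order; see the NOTE above before "fixing" it.
    with_vat = net_cents + int(net_cents * vat_rate)
    return with_vat - discount_cents

print(total(10000, 0.2, 500))   # 11500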
I prefer to use "Hansel and Gretel" type comments; little notes in the code as to why I'm doing it this way, or why some other way isn't appropriate. The next person to visit this code will probably need this info, and more often than not, that person will be me.
As a contractor I know that some people maintaining my code will be unfamiliar with the advanced features of ADO.Net I am using. Where appropriate, I add a brief comment about the intent of my code and a URL to an MSDN page that explains in more detail.
I remember when learning C# and reading other people's code, I was often frustrated by questions like, "which of the 9 meanings of the colon character does this one mean?" If you don't know the name of the feature, how do you look it up?! (Side note: this would be a good IDE feature: I select an operator or other token in the code, and a right-click then shows me its language part and feature name. C# needs this, VB less so.)
As for the "I don't comment my code because it is so clear and clean" crowd, I find sometimes they overestimate how clear their very clever code is. The thought that a complex algorithm is self-explanatory to someone other than the author is wishful thinking.
And I like 17 of 26's comment (emphasis added): "... reading the code will tell you exactly what it is doing. However, the code is incapable of telling you what it's supposed to be doing."
I very, very rarely comment. My theory is: if you have to comment, it's because you're not doing things the best way possible. A "work around" is about the only thing I would comment, because they often don't make sense but there is a reason you are doing it, so you need to explain.
Comments are a symptom of sub-par code, IMO. I'm a firm believer in self-documenting code. Most of my work can be easily translated, even by a layman, because of descriptive variable names, simple form, and many small, accurate methods (IOW, not having methods that do 5 different things).
Comments are part of a programmer's toolbox and can be used and abused alike. It's not up to you, that other programmer, or anyone really to tell you that one tool is bad overall. There are places and times for everything, including comments.
I agree with most of what's been said here though, that code should be written so clearly that it is self-descriptive and thus comments aren't needed, but sometimes that conflicts with the best/optimal implementation, although that could probably be solved with an appropriately named method.
I agree with the self-documenting code theory; if I can't tell what a piece of code is doing simply by reading it, then it probably needs refactoring. However, there are some exceptions to this. I'll add a comment if:
- I'm doing something that you don't normally see
- There are major side effects or implementation details that aren't obvious, or won't be next year
- I need to remember to implement something, although I prefer an exception in these cases
- I'm forced to go do something else and I'm having good ideas, or a difficult time with the code; then I'll add sufficient comments to temporarily preserve my mental state
Most of the time I find that the best comment is the function or method name I am currently coding in. All other comments (except for the reasons your friend mentioned - I agree with them) feel superfluous.
So in this instance commenting feels like overkill:
/*
 * this function adds two integers
 */
int add(int x, int y)
{
    // add x to y and return it
    return x + y;
}
because the code is self-describing. There is no need to comment this kind of thing as the name of the function clearly indicates what it does and the return statement is pretty clear as well. You would be surprised how clear your code becomes when you break it down into tiny functions like this.
When programming in C, I'll use multi-line comments in header files to describe the API, eg parameters and return value of functions, configuration macros etc...
In source files, I'll stick to single-line comments which explain the purpose of non-self-evident pieces of code or to sub-section a function which can't be refactored to smaller ones in a sane way. Here's an example of my style of commenting in source files.
If you ever need more than a few lines of comments to explain what a given piece of code does, you should seriously consider if what you're doing can't be done in a better way...
I write comments that describe the purpose of a function or method and the results it returns in adequate detail. I don't write many inline code comments because I believe my function and variable naming to be adequate to understand what is going on.
I develop on a lot of legacy PHP systems that are absolutely terribly written. I wish the original developer would have left some type of comments in the code to describe what was going on in those systems. If you're going to write indecipherable or bad code that someone else will read eventually, you should comment it.
Also, if I am doing something a particular way that doesn't look right at first glance, but I know it is because the code in question is a workaround for a platform or something like that, then I'll comment with a WARNING comment.
Sometimes code does exactly what it needs to do, but is kind of complicated and wouldn't be immediately obvious the first time someone else looked at it. In this case, I'll add a short inline comment describing what the code is intended to do.
I also try to give methods and classes documentation headers, which is good for intellisense and auto-generated documentation. I actually have a bad habit of leaving 90% of my methods and classes undocumented. You don't have time to document things when you're in the middle of coding and everything is changing constantly. Then when you're done you don't feel like going back and finding all the new stuff and documenting it. It's probably good to go back every month or so and just write a bunch of documentation.
Here's my view (based on several years of doctoral research):
There's a huge difference between commenting functions (sort of a black box use, like JavaDocs), and commenting actual code for someone who will read the code ("internal commenting").
Most "well written" code shouldn't require much "internal commenting" because if it performs a lot then it should be broken into enough function calls. The functionality for each of these calls is then captured in the function name and in the function comments.
Now, function comments are indeed the problem, and in some ways your friend is right, that for most code there is no economical incentive for complete specifications the way that popular APIs are documented. The important thing here is to identify what are the "directives": directives are those information pieces that directly affect clients, and require some direct action (and are often unexpected). For example, X must be invoked before Y, don't call this from outside a UI thread, be aware that this has a certain side effect, etc. These are the things that are really important to capture.
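A sketch of a function comment pared down to its directives (the API is invented):

def begin_session(user):
    """Start a billing session for user and return a session id.

    Directives:
    - call authenticate(user) first, or this raises SessionError
    - blocks on the network: never call it from a UI thread
    - side effect: writes an audit record even when it fails
    """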
Since most people never read full function documentations, and skim what they do read, you can actually increase the chances of awareness by capturing only the directives rather than the whole description.
I comment as much as needed - then, as much as I will need it a year later.
We add comments which provide the API reference documentation for all public classes / methods / properties / etc... This is well worth the effort because XML Documentation in C# has the nice effect of providing IntelliSense to users of these public APIs. .NET 4.0's code contracts will enable us to improve further on this practice.
As a general rule, we do not document internal implementations as we write code unless we are doing something non-obvious. The theory is that while we are writing new implementations, things are changing and comments are more likely than not to be wrong when the dust settles.
When we go back in to work on an existing piece of code, we add comments when we realize that it's taking some thought to figure out what in the heck is going on. This way, we wind up with comments where they are more likely to be correct (because the code is more stable) and where they are more likely to be useful (if I'm coming back to a piece of code today, it seems more likely that I might come back to it again tomorrow).
My approach:
Comments bridge the gap between context / real world and code. Therefore, each and every single line is commented, in correct English language.
I DO reject code that doesn't observe this rule in the strictest possible sense.
Usage of well-formatted XML comments is self-evident.
Sloppy commenting means sloppy code!
Here's how I write code:
if (hotel.isFull()) {
    print("We're fully booked");
} else {
    Guest guest = promptGuest();
    hotel.checkIn(guest);
}
Here are a few comments that I might write for that code:
// if hotel is full, refuse checkin, otherwise
// prompt the user for the guest info, and check in the guest.
If your code reads like prose, there is no sense in writing comments that simply repeat what the code says, since the mental processing needed for reading the code and the comments would be almost equal; and if you read the comments first, you will still need to read the code as well.
On the other hand, there are situations where it is impossible or extremely difficult to make the code read like prose; that's where comments can patch in.
Let's say you've inherited a C# codebase that uses one class with 200 static methods to provide core functionality (such as database lookups). Of the many nightmares in that class, there's copious use of Hungarian notation (the bad kind).
Would you refactor the variable names to remove the Hungarian notation, or would you leave them alone?
If you chose to change all the variables to remove Hungarian notation, what would be your method?
Refactor -- I find Hungarian notation on that scale really interferes with the natural readability of the code, and the exercise is a good way of getting familiar with what's there.
However, if there are other team members who know the code base you would need consensus on the refactoring, and if any of the variables are exposed outside of the one project then you will have to leave them alone.
Just leave it alone. There are better uses of your time.
Right click on the variable name, Refactor -> Rename.
There are VS add-ins that do this as well, but the built-in method works fine for me.
What would I do? Assuming that I just have to maintain the code and not rewrite it in any significant way? Leave it well alone. And when I do add code, go with the existing style, meaning, use that ugly Hungarian notation (as dirty as that makes me feel.)
But, hey, if you really have a hankerin' fer refactorin' then just do a little at a time. Every time you work on it spend ten minutes renaming variables. Tidying things up a little. After a few months you might find it's clean as a whistle....
Don't forget that there are two kinds of Hungarian Notation.
The original Charles Simonyi HN, later known as Apps Hungarian, and the later abomination called System Hungarian, which arose after some peckerhead (it's a technical term) totally misread Simonyi's original paper.
Unfortunately, System HN was propagated by Petzold and others to become the more dominant abortion that it is rightfully recognised as today.
Read Joel's excellent article about the intent of the original Apps Hungarian Notation and be sorry for what got lost in the rush.
If what you've got is Apps Hungarian you will probably want to keep it after reading both the original Charles Simonyi article and the Joel article.
If you've landed in a steaming pile of System Hungarian?
All bets are off!
Whew! (said while holding nose) (-:
If you're feeling lucky and just want the Hungarian to go away, isolate the Hungarian prefixes that are used and try a search-and-replace in files to replace them with nothing, then do a clean and rebuild. If the number of errors is small, just fix them. If the number of errors is huge, go back and break the class up into logical (by domain) classes first, then rename individually (the IDE will help).
I used to use it religiously back in the VB6 days, but stopped when VB.NET came out because that's what the new VB guidelines said. Other developers didn't. So, we’ve got a lot of old code with it. When I do maintenance on code I remove the notation from the functions/methods/sub I touch. I wouldn't remove it all at once unless you've got really good unit tests for everything and can run them to prove that nothing's broken.
How much are you going to break by doing this? That's an important question to ask yourself. If there are a lot of other pieces of code that use that library, then you might just be creating work for folks (maybe you) by going through the renaming exercise.
I'd put it on the list of things to do when refactoring. At least then everyone expects you to be breaking the library (temporarily).
That said, I totally get frustrated with poorly named methods and variables, so I can relate.
I wouldn't make a project out of it. I'd use the refactoring tools in VS (actually, I'd use Resharper's, but VS's work just fine) and fix all the variables in any method I was called upon to modify. Or if I had to make larger-scale changes, I'd refactor the variable names in any method I was called upon to understand.
If you have a legitimate need to remove and change it I would use either the built in refactoring tools, or something like Resharper.
However, I would agree with Chris Conway up to a point and ask you WHY. Yes, it is annoying, but at the same time, a lot of the time the "if it ain't broke, don't fix it" method is really the best way to go!
Only change it when you directly use it. And make sure you have a testbench ready to apply to ensure it still works.
I agree that the best way to phase out hungarian notation is to refactor code as you modify it. The greatest benefit of doing this kind of refactoring is that you should be writing unit tests around the code you're modifying so that you have a safety net instead of crossing your fingers and hoping that you don't break existing functionality. Once you have these unit tests in place, you are free to change the code to your heart's content.
I'd say a bigger problem is that you have a single class with 200(!) methods!
If this is a much depended on / much changed class then it might be worth refactoring to make it more usable.
In this, Resharper is an absolute must (you could use the built in refactoring stuff, but Resharper is way better).
Start finding a group of related methods, and then refactor these out into a nice small cohesive class. Update to conform to your latest code standards.
Compile & run your test suite.
Have energy for more? Extract another class.
Worn out - no trouble; come back and do some more tomorrow. In just a few days you'll have conquered the beast.
I agree with @Booji -- do it manually, on a per-routine basis when you're already visiting the code for some other good reason. Then, you'll get the most common ones out of the way, and who cares about the rest.
I was thinking of asking a similar question, only in my case, the offending code is my own. I have a very old habit of using "the bad kind" of Hungarian from my FoxPro days (which had weak typing and unusual scoping) — a habit I've only recently kicked.
It's hard — it means accepting an inconsistent style in your code base. It was only a week ago I finally said "screw it" and began a parameter name without the letter "p". The cognitive dissonance I initially felt has given way to a feeling of liberty. The world did not come to an end.
The way I've been going about this problem is changing one variable at a time as I come across them, then performing more sweeping changes when I come back to do more in-depth work. If you're anything like me, the different nomenclature of your variables will drive you bat-shiat crazy for a while, but you'll slowly become used to it. The key is to chip away at it a little bit at a time until you have everything where it needs to be.
Alternatively, you could jettison your variables altogether and just have every function return 42.
It sounds to me like the bigger problem is that 200-method God Object class. I'd suggest that refactoring just to remove the Hungarian notation is a low-value, high-risk activity in and of itself. Unless there's a copious set of automated unit tests around that class to give you some confidence in your refactoring, I think you should leave it well and truly alone.
I guess it's unlikely that such a set of tests exists, because a developer following TDD practices would (hopefully) have naturally avoided building a god object in the first place - it would be very difficult to write comprehensive tests for.
Eliminating the god object and getting a unit test base in place is of higher value, however. My advice would be to look for opportunities to refactor the class itself - perhaps when a suitable business requirement/change comes along that necessitates a change to that code (and thus hopefully comes with some system & regression testing bought and paid for). You might not be able to justify the effort of refactoring the whole thing in one go, but you can do it piece by piece as the opportunity comes along, and test-drive the changes. In this way you can slowly convert the spaghetti code into a cleaner code base with comprehensive unit tests, bit by bit.
And you can eliminate the Hungarian as you go, if you like.
I am actually doing the same thing here for an application extension. My approach has been to use VIM mappings to search for specific Hungarian notation prefixes and then delete them and fix capitalization as appropriate.
Examples (goes in vimrc):
"" Hungarian notation conversion helpers
"" get rid of str prefixes and fix caps e.g. strName -> name
map ,bs /\Wstr[A-Z]^Ml3x~
map ,bi /\Wint[A-Z]^Ml3x~
"" little more complex to clean up m_p type class variables
map ,bm /\Wm_p\?[A-Z]^M:.s/\(\W\)m_p\?/\1_/^M/\W_[A-Z]^Mll~
map ,bp /\Wp[A-Z]^Mlx~
If you're gonna break code just for the sake of refactoring, I would seriously consider leaving it alone, especially if you are going to affect other people on your team who may be depending on that code.
If your team is OK with this refactoring, and investing your time in doing this (which may be a time-saver in the future, if it means the code is more readable/maintainable), use Visual Studio (or whatever IDE you are using) to help you refactor the code.
However, if a big change like this is not a risk your team/boss is willing to take, I would suggest a somewhat unorthodox, half-way approach. Instead of doing all your refactoring in a single sweep, why not refactor sections of code (more specifically, functions) that need to be touched during normal maintenance? Over time, this slow refactoring will bring the code up to a cleaner state, at which point you can finish the refactoring process with a final sweep.
Use this java tool to remove HN:
Or just use "replace"/"replace all" with regex like below to replace "c_strX" to "x":
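The regex itself didn't survive the copy here, but the idea would look something like this in Python; the prefix list is whatever your codebase actually uses, and this naive version will happily mangle identifiers it shouldn't, so review the diff:

import re

# Common System-Hungarian prefixes; extend the list to match your code.
PREFIX = re.compile(r'\b(?:c_)?(?:str|int|bln|obj)([A-Z])(\w*)')

def strip_hungarian(source):
    # strName -> name, intCount -> count, c_strX -> x
    return PREFIX.sub(lambda m: m.group(1).lower() + m.group(2), source)

print(strip_hungarian("c_strX = strFirst + strLast"))   # x = first + last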
I love Hungarian notation. Don't understand why you would want to get rid of it.
I'm slowly moving from PHP5 to Python on some personal projects, and I'm currently loving the experience. Before choosing to go down the Python route I looked at Ruby. What I did notice from the Ruby community was that monkey-patching was both common and highly regarded. I also came across a lot of horror stories regarding the trials of debugging Ruby software because someone included a relatively harmless library to do a little job but which patched some heavily used core object without telling anyone.
I chose Python for (among other reasons) its cleaner syntax and the fact that it could do everything Ruby can. Python is making OO click much better than PHP ever has, and I'm reading more and more on OO principles to enhance this better understanding.
This evening I've been reading about Robert Martin's SOLID principles:
Single responsibility principle,
Open/closed principle,
Liskov substitution principle,
Interface segregation principle, and
Dependency inversion principle
I'm currently up to O: SOFTWARE ENTITIES (CLASSES, MODULES, FUNCTIONS, ETC.) SHOULD BE OPEN FOR EXTENSION, BUT CLOSED FOR MODIFICATION.
My head's in a spin over the conflict between ensuring consistency in OO design and the whole monkey-patching thing. I understand that it's possible to do monkey-patching in Python. I also understand that being "pythonic" is to follow common, well-tested OOP best practices and principles.
What I'd like to know is the community's opinion on the two opposing subjects; how they interoperate, when it's best to use one over the other, whether the monkey-patching should be done at all... hopefully you can provide a resolution to the matter for me.
There's a difference between monkey-patching (overwriting or modifying pre-existing methods) and simple addition of new methods. I think the latter is perfectly fine, and the former should be looked at suspiciously, but I'm still in favour of keeping it.
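In Python terms the distinction looks like this (Widget stands in for someone else's class):

class Widget:                          # pretend this comes from a library
    def render(self):
        return "plain"

# Addition: a brand-new method; existing callers can't be broken by it.
def render_loud(self):
    return self.render().upper()
Widget.render_loud = render_loud

# Monkey-patching proper: replacing behaviour existing callers rely on.
_original_render = Widget.render
def render(self):
    return "*" + _original_render(self) + "*"
Widget.render = render

print(Widget().render_loud())   # '*PLAIN*' -- the patch leaks in everywhere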
I've encountered quite a few of those problems where a third-party extension monkey-patches the core libraries and breaks things, and they really do suck. Unfortunately, they all invariably seem to stem from the third-party extension developers taking the path of least resistance, rather than thinking about how to actually build their solutions properly.
This sucks, but it's no more the fault of monkey patching than it's the fault of knife makers that people sometimes cut themselves.
The only times I've ever seen legitimate need for monkey patching is to work around bugs in third party or core libraries. For this alone, it's priceless, and I really would be disappointed if they removed the ability to do it.
Timeline of a bug in a C# program we had:
Read strange bug reports and trace problem to a minor bug in a CLR library.
Invest days coming up with a workaround involving catching exceptions in strange places and lots of hacks which compromise the code a lot
Spend days extricating hacky workaround when Microsoft release a service pack
Timeline of a bug in a rails program we had:
Read strange bug reports and trace problem to a minor bug in a ruby standard library
Spend 15 minutes performing minor monkey-patch to remove bug from ruby library, and place guards around it to trip if it's run on the wrong version of ruby.
Carry on with normal coding.
Simply delete monkeypatch later when next version of ruby is released.
The bugfixing process looks similar, except with monkeypatching, it's a 15 minute solution, and a 5-second 'extraction' whereas without it, pain and suffering ensues.
PS: The following example is "technically" monkeypatching, but is it "morally" monkeypatching? I'm not changing any behaviour - this is more or less just doing AOP in ruby...
class SomeClass
  alias original_dostuff dostuff
  def dostuff
    # extra stuff, eg logging, opening a transaction, etc
    original_dostuff
  end
end
In my view, monkeypatching is useful to have but something that can be abused. People tend to discover it and feel like it should be used in every situation, where perhaps a mixin or other construct may be more appropriate.
I don't think it's something that you should outlaw, it's just something that the Ruby guys like to use. You can do similar things with Python but the community has taken the stance that things should be simpler and more obvious.
Monkey patching is not Ruby-specific; it's done all over JavaScript too, with negative (IMO) effects.
My personal opinion is monkey patching should only be done to
a) Add functionality to an old version of a language which is available in the new version of the language which you need.
b) When there is no other "logical" place for it.
There are many many easy ways to make monkey patching really awful, such as the ability to change how basic functions such as ADDITION work.
My stance is, if you can avoid it, do so.
If you can avoid it in a nice way, kudos to you.
If you can't avoid it, get the opinion of 200 people because you probably just have not thought about it hard enough.
My pet hate is mootools extending the function object. Yes, you can do this. Instead of people just learning how javascript works:
setTimeout(function(){
    foo(args);
}, 5000 );
A new method was added to every function object (yes, I'm not joking), so that functions now have their own functions.
foo.delay( 5000 , args );
Which had the additional effect of this sort of crap being valid:
foo.delay.delay( 500, [ 500, args ] );
And on like that ad infinitum.
The result? You no longer have a library and a language; your language bows to the library, and if the library happens to be in scope, you can't just do things the way they were done when you learned the language. Instead you have to learn a new subset of commands just to keep things from falling flat on their face (at the cost of excessive slowdowns!).
May I note that foo.delay also returned an object, with its own methods, so you could do
x = foo.delay( 500, args );
x.clear();
and even
x.clear.delay(10);
which may sound overly useful, ... but you have to take into consideration the massive overhead used to make this viable.
clearTimeout(x);
SO HARD!
(Disclaimer: its been a while since I used moo, and have tried to forget it, and function names/structure may be incorrect. This is not an API reference. Please check their site for details ( sorry, their API reference sucks! ))
Monkeypatching is generally wrong. Create a proper subclass and add the methods.
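That is, in Python, something along these lines instead of patching (the names are invented):

class BaseExporter:                 # the library class you were tempted to patch
    def export(self, data):
        return str(data)

class CsvExporter(BaseExporter):    # the new behaviour lives in a subclass
    def export_csv(self, rows):
        return "\n".join(",".join(map(str, row)) for row in rows)

print(CsvExporter().export_csv([(1, 2), (3, 4)]))   # "1,2" then "3,4"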
I've used monkeypatching once in production code.
The issue is that REST uses GET, POST, PUT and DELETE. But the Django test client only offers GET and POST. I've monkeypatched methods for PUT (like POST) and DELETE (like GET).
Because of the tight binding between Django client and the Django test driver, it seemed easiest to monkeypatch it to support full REST testing.
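The shape of that patch, with a stand-in class playing the test client (the real client's internals differed; this shows only the pattern):

class TestClient:                      # stand-in for the framework's client
    def request(self, method, path, data=None):
        return "%s %s" % (method, path)
    def get(self, path):
        return self.request("GET", path)
    def post(self, path, data):
        return self.request("POST", path, data)

# The monkeypatch: bolt the missing verbs onto the existing class.
def put(self, path, data):
    return self.request("PUT", path, data)      # behaves like post()

def delete(self, path):
    return self.request("DELETE", path)         # behaves like get()

TestClient.put = put
TestClient.delete = delete

print(TestClient().put("/people/1/", {"name": "x"}))   # PUT /people/1/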
You may find enlightening this discussion about Ruby's open classes and the Open-Closed Principle.
Even though I like Ruby, I feel monkey-patching is a tool of last resort to get things done. All things being equal, I prefer using traditional OO techniques with a sprinkle of functional programming goodness.
In my eyes, monkey-patching is one form of AOP. The article Aspect-Oriented Design Principles: Lessons from Object-Oriented Design (PDF) gives some ideas of how SOLID and other OOP principles can be applied to AOP.
My first thought is that monkey-patching violates OCP, since clients of a class should be able to expect that class to work consistently.
Monkey-patching is just plain wrong, IMHO. I've not come across the open/closed principle you mention before, but it's a principle I've long held myself, I agree with it 100%. I think of monkey-patching as a code-smell on a larger scale, a coding-philosophy-smell, as it were.