Exceptions for flow of control - algorithm

There is an interesting post over here about this, in relation to cross-application flow of control.
Well, recently I've come across an interesting problem: generating the nth value in a potentially (practically) endless recursive sequence. This particular algorithm WILL be at least 10-15 stack frames deep at the point that it succeeds. My first thought was to throw a SuccessException that looked something like this (C#):
class SuccessException : Exception
{
    public string Value { get; set; }

    public SuccessException(string value)
        : base()
    {
        Value = value;
    }
}
Then do something like this:
try
{
Walk_r(tree);
}
catch (SuccessException ex)
{
result = ex.Value;
}
Then my thoughts wandered back here, where I've heard over and over to never use Exceptions for flow control. Is there ever an excuse? And how would you structure something like this, if you were to implement it?

In this case I would be looking at your Walk_r method: it should be the thing that returns a value. Throwing an exception to indicate success is NOT a common practice, and at minimum is going to be VERY confusing to anyone who sees the code. Not to mention the overhead associated with exceptions.

Walk_r should simply return the value when it is hit. It's a pretty standard recursion example. The only potential problem I see is that you said it is potentially endless, which will have to be compensated for in the Walk_r code by keeping count of the recursion depth and stopping at some maximum value.
The exception actually makes the code very strange, since the method call now throws an exception to return the value instead of simply returning 'normally'.
try
{
Walk_r(tree);
}
catch (SuccessException ex)
{
result = ex.Value;
}
becomes
result = Walk_r(tree);
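For illustration, here is a minimal sketch of what a value-returning Walk_r might look like, with a depth cap for the "potentially endless" case. The Node type and its IsMatch/Value/Left/Right members are hypothetical stand-ins, not anything from the original question:
// Returns the found value, or null if nothing matched (or the cap was hit).
static string Walk_r(Node node, int depth = 0, int maxDepth = 1000)
{
    if (node == null || depth > maxDepth)
        return null;                              // not found / too deep

    if (node.IsMatch)
        return node.Value;                        // success: just return it

    // Try the left subtree first, then the right one.
    return Walk_r(node.Left, depth + 1, maxDepth)
        ?? Walk_r(node.Right, depth + 1, maxDepth);
}
A null result means the walk finished (or hit the cap) without finding anything.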

I'm going to play devil's advocate here and say stick with the exception to indicate success. It might be expensive to throw/catch but that may be insignificant compared with the cost of the search itself and possibly less confusing than an early exit from the method.

It's not a very good idea to throw exceptions as part of an algorithm, especially in .NET. In some languages/platforms exceptions are cheap to throw and are routinely used this way - for instance, to signal that an iterable has been exhausted - but that isn't the case here.

Why not just return the resulting value? If the call returns a value at all, assume it succeeded; if it comes back with nothing (e.g. null), the search failed.
If you do need to report a failure, that is when I'd recommend throwing an exception.
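If null is itself a legitimate value, one hedged alternative is the .NET "Try" pattern: a bool for success plus an out parameter for the result. A sketch, reusing the same hypothetical Node type as above:
static bool TryWalk(Node node, out string value)
{
    value = null;
    if (node == null)
        return false;                             // dead end: report failure

    if (node.IsMatch)
    {
        value = node.Value;
        return true;                              // success, no exception needed
    }

    // Success in either subtree bubbles up as a plain bool.
    return TryWalk(node.Left, out value) || TryWalk(node.Right, out value);
}
Callers then write if (TryWalk(tree, out string result)) { ... } and failure is just the false branch.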

The issue with using exceptions is that they (in the grand scheme of things) are very inefficient and slow. It would surely be just as easy to have an if condition within the recursive function to return as and when needed. To be honest, with the amount of memory on modern PCs it's unlikely (though not impossible) that you'll get a stack overflow with only a small number of recursive calls (<100).
If the stack is a real issue, then it might become necessary to be 'creative' and implement a depth-limited search strategy: allow the function to return out of the recursion and restart the search from the last (deepest) node.
To sum up: exceptions should only be used in exceptional circumstances, and I don't believe the success of a function call qualifies as such.
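If the call stack does become the limiting factor, one concrete option (my sketch of a related technique, not the poster's exact restart-from-the-deepest-node scheme) is to drop the recursion altogether and keep an explicit stack of pending nodes:
// Needs System.Collections.Generic. Node/IsMatch/Value/Left/Right are
// the same hypothetical members used in the earlier sketches.
static string WalkIterative(Node root)
{
    var pending = new Stack<Node>();
    if (root != null)
        pending.Push(root);

    while (pending.Count > 0)
    {
        Node node = pending.Pop();
        if (node.IsMatch)
            return node.Value;                    // success: a plain return

        if (node.Right != null) pending.Push(node.Right);
        if (node.Left != null) pending.Push(node.Left);
    }
    return null;                                  // exhausted without a match
}
The depth of the tree now only grows the heap-allocated Stack<Node>, not the call stack.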

Using exceptions in normal program flow is, in my book, one of the worst practices ever.
Consider the poor sap who is hunting for swallowed exceptions and is running a debugger set to stop whenever an exception happens. That dude is now getting mad.... and he has an axe. :P

Related

panic for multilevel returns

Is there a better way to return from long sequences of recursive function calls?
I currently panic with a marker value like this:
type exitNow int
...
panic(exitNow(0))
to return multiple levels at once. At the root function a call to recover does general error handling (turning panics into errors) and handles exitNow as a special case.
This works fine I just want to know if there is a better way.
I already use a bool return value for a related purpose, but using another one for this would be a pain. (Every call to every function would need an if statement.)
If it helps any, this is part of the implementation of a recursive descent parser.
I use this approach myself in my parsers. I don't panic with an integer value though. I use the actual current error as the panic message. The top level call doing the recover() simply appends some file/line/column information and then returns it as a regular error.
This method and just returning errors from all functions are the only ways to do this in Go. The panic approach is a great deal more effective for the parser case, as it makes the lexer rules considerably simpler to implement (and read): there are no if err != nil { return } parts littered everywhere.

Clone detection algorithm

I'm writing an algorithm that detects clones in source code. E.g. if there is a block like:
for(int i = o; i <5; i++){
doSomething(abc);
}
...and if this block is repeated somewhere else in the source code it will be detected as a clone. The method I am using at the moment is to create hashes for lines/blocks and compare them with hashes of other lines/blocks in the same source to see if there are any matches.
Now, if the same block as above was to be repeated somewhere with only the argument of doSomething different, it would not be detected as a clone even though it would appear very much like a clone to you and me. My algorithm detects exact matches but doesn't detect matching blocks where only the argument is different.
Could anyone suggest any ways of getting around this issue? Thanks!
Here's a super-simple way, which might go too far in erasing information (i.e., might produce too many false positives): replace every identifier that isn't a keyword with some fixed name. So you'd get
for (int DUMMY = DUMMY; DUMMY<5; DUMMY++) {
DUMMY(DUMMY);
}
(assuming you really meant o rather than 0 in the initialization part of the for-loop).
If you get a huge number of false positives with this, you could then post-process them by, for instance, looking to see what fraction of the DUMMYs actually correspond to the same identifier in both halves of the match, or at least to identifiers that are consistent between the two.
To do much better you'll probably need to parse the code to some extent. That would be a lot more work.
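A rough sketch of that normalize-then-hash idea in C#. The regex, the tiny keyword list and the NormalizedHash helper are illustrative assumptions rather than a real lexer, and GetHashCode is only meant for comparisons within a single run:
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CloneNormalizer
{
    // Deliberately tiny keyword list, just for illustration.
    static readonly HashSet<string> Keywords =
        new HashSet<string> { "for", "int", "if", "else", "while", "return" };

    // Replace every identifier that is not a keyword with DUMMY,
    // collapse whitespace, then hash the normalized block.
    public static int NormalizedHash(string block)
    {
        string normalized = Regex.Replace(
            block,
            @"[A-Za-z_][A-Za-z0-9_]*",
            m => Keywords.Contains(m.Value) ? m.Value : "DUMMY");

        normalized = Regex.Replace(normalized, @"\s+", " ").Trim();
        return normalized.GetHashCode();
    }
}
With this, two doSomething blocks that differ only in the argument name hash to the same value and show up as clone candidates.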
Well, if you're going to do something smarter than that, then you're going to have to parse the code at least a bit. For example, you could detect method calls and then ignore the method arguments in your hash. Either way, I think you'll always need your program to understand the code as more than 'just text blocks', and that can get awfully complicated.

General programming - calling a non void method but not using value

This is general programming, but if it makes a difference, I'm using objective-c. Suppose there's a method that returns a value, and also performs some actions, but you don't care about the value it returns, only the stuff that it does. Would you just call the method as if it was void? Or place the result in a variable and then delete it or forget about it? State your opinion, what you would do if you had this situation.
A common example of this is printf, which returns an int... but you rarely see this:
int val = printf("Hello World");
Yeah, just call the method as if it were void. You probably do it all the time without noticing: the assignment operator '=' actually produces a value too, but that value is very rarely used.
It depends on the environment (the language, the tools, the coding standard, ...).
For example in C, it is perfectly possible to call a function without using its value. With some functions like printf, which returns an int, it is done all the time.
Sometimes not using a value will cause a warning, which is undesirable. Assigning the value to a variable and then not using it will just cause another warning about an unused variable. For this case the solution is to cast the result to void by prefixing the call with (void), e.g.
(void) my_function_returning_a_value_i_want_to_ignore().
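In C# (not mentioned in this answer, but the same idea) the "ignored on purpose" intent can be spelled out with a discard, available since C# 7.0; a small illustrative sketch:
// The discard makes "I don't care about the return value" explicit
// and quiets analyzers that flag unused results.
_ = Console.ReadLine();        // we only want the side effect of waiting for input

// The plain statement form works too, just less explicitly.
Console.ReadLine();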
There are two separate issues here, actually:
Should you care about returned value?
Should you assign it to a variable you're not going to use?
The answer to #2 is a resounding "NO" - unless, of course, you're working with a language where that would be illegal (early Turbo Pascal comes to mind). There's absolutely no point in defining a variable only to throw it away.
The first part is not so easy. Generally, there is a reason a value is returned: for idempotent functions the result is the function's sole purpose; for non-idempotent ones it usually represents some sort of return code signifying whether the operation completed normally. There are exceptions, of course - like method chaining.
If ignoring return values is common in your codebase (in .NET, for example), there's probably an issue with the code breaking CQS (command-query separation).
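For reference, a minimal sketch of what CQS-friendly code tends to look like; the ItemStore class and its members are invented for illustration:
using System.Collections.Generic;

public class ItemStore
{
    private readonly List<string> m_items = new List<string>();

    // Command: changes state, returns nothing.
    public void AddItem(string item)
    {
        m_items.Add(item);
    }

    // Query: returns a value, changes nothing.
    public int CountItems()
    {
        return m_items.Count;
    }
}
A CQS-breaking AddItem would also return the new count; callers who ignore that count are exactly the "unused return value" pattern being discussed.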
When I call a function that returns a value that I ignore, it's usually because I'm doing it in a test to verify behavior. Here's an example in C#:
[Fact]
public void StatService_should_call_StatValueRepository_for_GetPercentageValues()
{
    var statValueRepository = new Mock<IStatValueRepository>();
    new StatService(null, statValueRepository.Object).GetValuesOf<PercentageStatValue>();
    statValueRepository.Verify(x => x.GetStatValues());
}
I don't really care about the return type, I just want to verify that a method was called on a fake object.
In C it is very common, but there are places where it is ok to do so and other places where it really isn't. Later versions of GCC have a function attribute so that you can get a warning when a function is used without checking the return value:
The warn_unused_result attribute causes a warning to be emitted if a caller of the function with this attribute does not use its return value. This is useful for functions where not checking the result is either a security problem or always a bug, such as realloc.
int fn () __attribute__ ((warn_unused_result));
int foo ()
{
    if (fn () < 0) return -1;
    fn ();
    return 0;
}
results in a warning on line 5 (the bare fn() call whose result is ignored).
Last time I used this there was no way of turning off the generated warning, which causes problems when you're compiling 3rd-party code you don't want to modify. Also, there is of course no way to check if the user actually does something sensible with the returned value.

Is there a case where parameter validation may be considered redundant?

The first thing I do in a public method is to validate every single parameter before it gets any chance to be used, passed around or referenced, and throw an exception if any of them violates the contract. I've found this to be a very good practice, as it lets you catch the offender the moment the infraction is committed. But quite often I write a very simple getter/indexer such as this:
private List<Item> m_items = ...;
public Item GetItemByIdx( int idx )
{
    if( (idx < 0) || (idx >= m_items.Count) )
    {
        throw new ArgumentOutOfRangeException( "idx", "Invalid index" );
    }
    return m_items[ idx ];
}
In this case the index parameter directly relates to the indexes in the list, and I know for a fact (e.g. from the documentation) that the list itself will do exactly the same check and throw the same exception. Should I remove this verification, or am I better off leaving it alone?
I wanted to know what you guys think, as I'm now in the middle of refactoring a big project and I've found many cases like the above.
Thanks in advance.
It's not just a matter of taste; consider:
if (!File.Exists(fileName)) throw new ArgumentException("...");
var s = File.OpenText(fileName);
This looks similar to your example but there are several reasons (concurrency, access rights) why the OpenText() method could still fail, even with a FileNotFound error. So the Exists-check is just giving a false feeling of security and control.
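For comparison, a hedged sketch of the shape this answer is arguing for - let OpenText report the problem itself instead of pre-checking (the helper name is made up):
using System;
using System.IO;

static string ReadAllOrNull(string fileName)
{
    try
    {
        using (var s = File.OpenText(fileName))
        {
            return s.ReadToEnd();
        }
    }
    catch (FileNotFoundException)
    {
        return null;    // the file vanished between any check and the open, or never existed
    }
    catch (UnauthorizedAccessException)
    {
        return null;    // exists, but we aren't allowed to read it
    }
}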
It is a mind-set thing: when you are writing the GetItemByIdx method, the check probably looks quite sensible. But if you look around in a random piece of code, there are usually lots of assumptions you could check before proceeding. It's just not practical to check them all, over and over; we have to be selective.
So in a simple pass-along method like GetItemByIdx I would argue against redundant checks. But as soon as the function adds more functionality, or there is a very explicit specification that says something about idx, that argument turns around.
As a rule of thumb an exception should be thrown when a well defined condition is broken and that condition is relevant at the current level. If the condition belongs to a lower level, then let that level handle it.
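As an illustration of that rule of thumb (my sketch, not the answerer's code; m_publishedCount is a made-up field):
// Redundant: List<T> already throws ArgumentOutOfRangeException for a bad
// index, at the level that owns that rule. Just forward to it.
public Item GetItemByIdx(int idx)
{
    return m_items[idx];
}

// Worth checking: here idx has a domain meaning of its own, so a check at
// this level adds information the lower level cannot supply.
public Item GetPublishedItemByIdx(int idx)
{
    if (idx < 0 || idx >= m_publishedCount)
        throw new ArgumentOutOfRangeException(nameof(idx),
            "Index must refer to a published item");
    return m_items[idx];
}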
I would only do parameter verification where it would lead to some improvement in code behavior. Since you know, in this case, that the check will be performed by the List itself, then your own check is redundant and provides no extra value, so I wouldn't bother.
It's true that you have possibly duplicated work that's already done in the API, but it's there now. If your error handling works, is solid, and isn't causing performance issues (profile it if you're unsure), then I reckon leave it and gradually phase it out if you have time. It doesn't sound like a top priority!

What to put in the IF block and what to put in the ELSE block?

This is a minor style question, but every bit of readability you add to your code counts.
So if you've got:
if (condition) then
{
// do stuff
}
else
{
// do other stuff
}
How do you decide if it's better like that, or like this:
if (!condition) then
{
// do other stuff
}
else
{
// do stuff
}
My heuristics are:
Keep the condition positive (less mental calculation when reading it)
Put the most common path into the first block
I prefer to put the most common path first, and I am a strong believer in nesting reduction so I will break, continue, or return instead of elsing whenever possible. I generally prefer to test against positive conditions, or invert [and name] negative conditions as a positive.
if (condition)
return;
DoSomething();
I have found that by drastically reducing the usage of else my code is more readable and maintainable, and when I do have to use else it's almost always an excellent candidate for a more structured switch statement.
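A small illustrative sketch (mine, not the answerer's; Order, OrderState and the Handle* methods are invented names) of the guard-clause-plus-switch shape being described:
void Process(Order order)
{
    if (order == null)
        return;                     // early exit instead of wrapping everything in an else

    switch (order.State)
    {
        case OrderState.New:
            HandleNew(order);
            break;
        case OrderState.Paid:
            HandlePaid(order);
            break;
        default:
            HandleUnknown(order);
            break;
    }
}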
Two (contradictory) textbook quotes:
Put the shortest clause of an if/else on top
--Allen Holub, "Enough Rope to Shoot Yourself in the Foot", p52
Put the normal case after the if rather than after the else
--Steve McConnell, "Code Complete, 2nd ed.", p356
I prefer the first one. The condition should be as simple as possible, and it should be fairly obvious which of condition and !condition is simpler.
It depends on your flow. For many functions, I'll use preconditions:
bool MyFunc(variable) {
    if (variable != something_i_want)
        return false;

    // a large block of code
    // ...
    return true;
}
If I need to do something each case, I'll use an if (positive_clause) {} else {} format.
If the code is to check for an error condition, I prefer to put that code first, and the "successful" code second; conceptually, this keeps a function call and its error-checking code together, which makes sense to me because they are related. For example:
if (!some_function_that_could_fail())
{
// Error handling code
}
else
{
// Success code
}
I agree with Oli on using a positive if clause when possible.
Just please never do this:
if (somePositiveCondition) {
    // nothing
}
else {
    //stuff
}
I used to see this a lot at one place I worked, and used to wonder whether one of the coders didn't understand how the not operator works...
When I am looking at data validation, I try to make my conditions "white listing" - that is, I test for what I will accept:
if DataIsGood() then
DoMyNormalStuff
else
TakeEvasiveAction
Rather than the other way around, which tends to degenerate into:
if SomeErrorTest then
TakeSomeEvasiveAction
else if SomeOtherErrorCondition then
CorrectMoreStupidUserProblems
else if YetAnotherErrorThatNoOneThoughtOf then
DoMoreErrorHandling
else
DoMyNormalStuff
I know this isn't exactly what you're looking for, but ... A lot of developers use a "guard clause", that is, a negative "if" statement that breaks out of the method as soon as possible. At that point, there is no "else" really.
Example:
if (blah == false)
{
return; // perhaps with a message
}
// do rest of code here...
There are some hard-core C/C++/assembly guys out there who will tell you that you're destroying your CPU!!! (In many cases, processors favor the "true" branch and try to "prefetch" the next thing to do... so theoretically any "false" condition will flush the pipeline and go microseconds slower.)
In my opinion, we are at the point where "better" (more understandable) code wins out over microseconds of CPU time.
I think that for a single variable the not operator is simple enough and naming issues start being more relevant.
Never name a variable not_X; if you need to, use a thesaurus and find an antonym. I've seen plenty of awful code like
if (not_dead) {
} else {
}
instead of the obvious
if (alive) {
} else {
}
Then you can sanely use (very readable, no need to invert the code blocks)
if (!alive) {
} else {
}
If we're talking about more variables I think the best rule is to simplify the condition. After a while projects tend to get conditions like:
if (dead || (!dead && sleeping)) {
} else {
}
Which, since the !dead term is redundant (the second alternative only matters when dead is already false), simplifies to
if (dead || sleeping) {
} else {
}
Always pay attention to what conditions look like and how to simplify them.
Software is knowledge capture. You're encoding someone's knowledge of how to do something.
The software should fit what's "natural" for the problem. When in doubt, ask someone else and see what people actually say and do.
What about the situation where the "common" case is to do nothing? What then?
if( common ) {
// pass
}
else {
// great big block of exception-handling folderol
}
Or do you do this?
if( ! common ) {
// great big block of exception-handling folderol
}
The "always positive" rule isn't really what you want first. You want to look at rules more like the following.
Always natural -- it should read like English (or whatever the common language in your organization is.)
Where possible, common cases first -- so they appear common.
Where possible use positive logic; negative logic can be used where it's commonly said that way or where the common case is a do-nothing.
If one of the two paths is very short (1 to 10 lines or so) and the other is much longer, I follow the Holub rule mentioned here and put the shorter piece of code in the if. That makes it easier to see the if/else flow on one screen when reviewing the code.
If that is not possible, then I structure to make the condition as simple as possible.
For me it depends on the condition, for example:
if (!PreserveData.Checked)
{ resetfields();}
I tend to talk to myself about what I want the logic to be, and then code it to the little voice in my head.
You can usually make the condition positive without switching around the if / else blocks.
Change
if (!widget.enabled()) {
// more common
} else {
// less common
}
to
if (widget.disabled()) {
// more common
} else {
// less common
}
Intel Pentium branch prediction pre-fetches instructions for the "if" case. If it instead follows the "else" branch, it has to flush the instruction pipeline, causing a stall.
If you care a lot about performance: put the most likely outcome in the 'if' clause.
Personally, I write it as
if (expected)
{
//expected path
}
else
{
//fallback other odd case
}
If you have both true and false branches then I'd opt for a positive conditional - this reduces confusion and, in general, I believe it makes your code easier to read.
On the other hand, if you're using a language such as Perl, and particularly if your false condition is either an error condition or the most common condition, you can use the 'unless' structure, which executes the code block unless the condition is true (i.e. the opposite of if):
unless ($foo) {
$bar;
}
First of all, let's put aside situations when it is better to avoid using "else" in the first place (I hope everyone agrees that such situations do exist and determining such cases probably should be a separate topic).
So, let's assume that there must be an "else" clause.
I think that readability/comprehensibility imposes at least three key requirements or rules, which unfortunately often compete with each other:
The shorter the first block (the "if" block), the easier it is to grasp the entire "if-else" construct. When the "if" block is long enough, it becomes way too easy to overlook the existence of the "else" block.
When the "if" and "else" paths are logically asymmetric (e.g. "normal processing" vs. "error processing"), in a standalone "if-else" construct it does not really matter much which path is first and which is second. However, when there are multiple "if-else" constructs in proximity to each other (including nesting), and when all those "if-else" constructs have asymmetry of the same kind - that's when it is very important to arrange those asymmetric paths consistently.
Again, it can be "if ... normal path ... else ... abnormal path" for all, or "if ... abnormal path ... else ... normal path" for all, but it should not be a mix of these two variants.
With all other conditions equal, putting the normal path first is probably more natural for most human beings (I think it's more about psychology than aesthetics :-).
An expression that starts with a negation usually is less readable/comprehensible than an expression that doesn't.
So, we have these three competing requirements/rules, and the real question is: which of them is more important than the others? For Allen Holub, rule #1 is probably the most important one. For Steve McConnell, it is rule #2. But I don't think you can really choose only one of these rules as a single guideline.
I bet you've already guessed my personal priorities here (from the way I ordered the rules above :-).
My reasons are simple:
The rule #1 is unconditional and impossible to circumvent. If one of the blocks is so long that it runs off the screen - it must become the "else" block. (No, it is not a good idea to create a function/method mechanically just to decrease the number of lines in an "if" or "else" block! I am assuming that each block already has a logically justifiable minimum amount of lines.)
The rule #2 involves a lot of conditions: multiple "if-else" constructs, all having asymmetry of the same kind, etc. So it just does not apply in many cases.
Also, I often observe the following interesting phenomenon: when the rule #2 does apply and when it is used properly, it actually does not conflict with the rule #1! For example, whenever I have a bunch of "if-else" statements with "normal vs. abnormal" asymmetry, all the "abnormal" paths are shorter than "normal" ones (or vice versa). I cannot explain this phenomenon, but I think that it's just a sign of good code organization. In other words, whenever I see a situation when rules #1 and #2 are in conflict, I start looking for "code smells" and more often than not I do find some; and after refactoring - tada! no more painful choosing between rule #1 and rule #2, :-)
Finally, rule #3 has the smallest scope and is therefore the least critical.
Also, as mentioned here by other colleagues, it is often very easy to "cheat" with this rule (for example, to write "if(disabled)..." instead of "if(!enabled)...").
I hope someone can make some sense of this opus...
As a general rule, if one is significantly larger than the other, I make the larger one the if block.
Put the common path first.
Turn negative checks into positive ones (!full == empty).
I always keep the most likely first.
In Perl I have an extra control structure to help with that. The inverse of if.
unless (alive) {
go_to_heaven;
} else {
say "MEDIC";
}
You should always put the most likely case first. Besides being more readable, it is faster. This also applies to switch statements.
I'm horrible when it comes to how I set up if statements. Basically, I set it up based on what exactly I'm looking for, which leads everything to be different.
if (userinput == null) {
    explodeViolently();
} else {
    // actually do stuff
}
or perhaps something like
if (1 + 1 == 2) {
    // do stuff
} else {
    explodeViolently();
}
Which section of the if/else statement actually does things for me is a bad habit of mine.
I generally put the positive result (i.e. the method call) at the start, so:
if(condition)
{
doSomething();
}
else
{
System.out.println("condition not true");
}
But if the condition has to be false for the method to be used, I would do this:
if(!condition)
{
doSomething();
}
else
{
System.out.println("condition true");
}
If you must have multiple exit points, put them first and make them clear:
if TerminatingCondition1 then
Exit
if TerminatingCondition2 then
Exit
Now we can progress with the usual stuff:
if NormalThing then
DoNormalThing
else
DoAbnormalThing
