I was wondering what the difference is between the IF statement and the IIF() function in Tableau. Then I found this page, which explains the syntax difference:
Difference between iif and if
But someone (at a seminar) also told me that IIF() has better performance. Is that true?
Statement vs. Function
The key difference between IF and IIF is that the former is a statement:
IF test THEN value END
IF test THEN value ELSE else END
And the latter is a function:
IIF(test, then, else, [unknown])
Another difference, as alluded to in the link cited in the opening question, is that the latter also supports separate handling of the "unknown" case.
Here is the documentation from Tableau:
IIF(test, then, else, [unknown])
. . .
A boolean comparison may also yield the value UNKNOWN (neither TRUE nor FALSE), usually due to the presence of Null values in test. The final argument to IIF is returned in the event of an UNKNOWN result for the comparison. If this argument is left out, Null is returned.
In the IF statement, the unknown case is lumped into the ELSE block. From the same Tableau document cited above:
The IF THEN ELSE function evaluates a sequence of test conditions and
returns the value for the first condition that is true. If no condition is true, the ELSE value is returned.
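To make the semantic difference concrete, here is a minimal Python sketch that models Tableau's three-valued logic by using None for UNKNOWN/Null. It only illustrates the documented behavior quoted above; it is not Tableau's implementation, and the names tableau_if, tableau_iif and greater_than are hypothetical.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def tableau_if(test: Optional[bool], then: T, else_: T) -> T:
    """IF test THEN then ELSE else_ END: anything that is not TRUE falls to ELSE."""
    return then if test is True else else_


def tableau_iif(test: Optional[bool], then: T, else_: T,
                unknown: Optional[T] = None) -> Optional[T]:
    """IIF(test, then, else, [unknown]): UNKNOWN (None here) gets its own branch."""
    if test is None:
        return unknown  # Null (None) when the optional argument is omitted
    return then if test else else_


def greater_than(x: Optional[float], y: float) -> Optional[bool]:
    """A comparison with Null propagation: a Null operand yields UNKNOWN (None)."""
    if x is None:
        return None
    return x > y


profit = None  # e.g. a Null [Profit] value
print(tableau_if(greater_than(profit, 0), "gain", "loss"))              # loss
print(tableau_iif(greater_than(profit, 0), "gain", "loss"))             # None
print(tableau_iif(greater_than(profit, 0), "gain", "loss", "no data"))  # no data
```

With a non-Null value both constructs agree; they only diverge when the test itself is UNKNOWN.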
About the Claimed Performance Difference
As for the claimed performance difference between the two constructs, I investigated it:
I searched quite extensively and could not find evidence supporting that claim, either officially from Tableau or from other users.
I conducted a small (anecdotal) experiment and could not observe any performance difference: my test used a 300MB dataset from a CSV file with 3.6M rows, and both methods performed about equally fast.
The Tableau employee who told me about the claim seemed to backpedal when I asked her the same question again for confirmation.
So I would classify the claim as unsubstantiated.
Here is the difference:
IF Condition THEN "truevalue" ELSE "falsevalue" END - if Condition evaluates to null (unknown), this statement returns "falsevalue".
IIF(Condition, "truevalue", "falsevalue", "no value") - if Condition evaluates to null (unknown), this function returns "no value".
Related
I'm designing a pseudocode version of a programme thingy I made, in which one of the sections is someone inputting a number to select an option. When someone inputs a number, a value from a list is output. I thought using an 'IF' statement nested within a 'CASE' statement would make that task run more efficiently, but I'm not sure if that would still conform to the acceptable 'CASE' statement format. This is what I was envisioning for the first option:
CASE category OF
  '1' : PRINT "Members who have chosen to work as volunteers,"
        IF MemberInfo[2] = 'yes'
          THEN
            PRINT MemberInfo[0], MemberInfo[1]
        ENDIF
ENDCASE
The following numbers in the main 'CASE' statement would then follow the same format. Is this okay, or should I just make various 'IF' statements?
It does not make sense to use CASE here, as CASE is for choosing among multiple options. Given that it's just one condition and an action based on it, an IF is more appropriate.
An if nested within CASE is not a good programming structure. Go with either CASE or IF.
Input a number as category
If category is a number
then
print list
end if
Sure, that makes sense. Real code can do it, so why not pseudocode?
(But if the other cases have the same format, there's probably a better way to do it: maybe a map lookup or using the category more dynamically, depending on what's changing between each case statement, and what stays the same.)
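As a minimal sketch of the map-lookup idea, assuming the options differ only in their heading and in which members they select, the CASE can collapse into a dictionary from option number to (heading, filter). This is Python rather than exam-style pseudocode; the members list and option '2' are hypothetical examples, not part of the question.

```python
# Hypothetical member records: [first name, last name, volunteer flag]
members = [
    ["Ann", "Lee", "yes"],
    ["Bob", "Ray", "no"],
    ["Cy", "Tam", "yes"],
]

# Each menu option maps to a heading and a predicate that selects members.
# Option "2" is a made-up second case to show how more options slot in.
OPTIONS = {
    "1": ("Members who have chosen to work as volunteers:", lambda m: m[2] == "yes"),
    "2": ("Members who have not volunteered:", lambda m: m[2] != "yes"),
}


def show(category: str) -> list[str]:
    """Look the option up once instead of repeating the same CASE body per number."""
    if category not in OPTIONS:
        return ["Unknown option"]
    heading, wanted = OPTIONS[category]
    lines = [heading]
    for member in members:
        if wanted(member):
            lines.append(f"{member[0]} {member[1]}")
    return lines


for line in show("1"):
    print(line)
```

Adding an option then means adding one dictionary entry rather than another copy of the IF block.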
I'm dealing with a collection of VBScript code (Microsoft Deployment Toolkit) and I frequently see the following idiom when comparing a string to see if it has a given value:
If (oEnvironment.Item("IsOSUpgrade") <> "" and oEnvironment.Item("IsOSUpgrade") = "1") then
oEnvironment.Item is a property that I imagine could return null/nothing/empty (haven't wrapped my head fully around the subtle differences).
Does the first comparison serve any purpose? I'm guessing it does but don't understand what it would be. For surely if the equality comparison returns True then the inequality comparison would as well, right? What am I missing? Something with null/nothing/empty?
You're right: if the second condition is true, the first will always be true. I think the first condition is there just for completeness and makes sure that "IsOSUpgrade" isn't empty/null.
Don't worry, you're not missing anything ;-)
I have a VB6 function, which executes an SQL delete. The function returns a boolean depending on whether or not the deletion was successful:
Public Function Delete(ByVal RecordID As Integer) As Boolean
    On Error GoTo ErrorGenerated
    ' Execute SQL delete here
    Delete = True
    Exit Function
ErrorGenerated:
    Delete = False
End Function
I read somewhere that it is better to return an integer, which dictates whether or not the deletion was successful. However, there can only be two outcomes from running the function from what I can see i.e. deleted or not deleted (not deleted if an error is thrown). Is it better to return an integer?
I'd suggest your best bet is to return an enumerated type; each value for the enumeration can then explain to the caller what the problem is in a clear and unambiguous way, and new error reasons can be added later as required without breaking anything. Something like...
Public Enum DB_ERRS
Success
NoConnection
FailedForThisReason
FailedForThatReason
FailedForOtherReason
Failed
End Enum
Then all your database access functions could return a value of this type...
Public Function Delete(ByVal RecordID As Integer) As DB_ERRS
    On Error GoTo ErrorGenerated
    ' Execute SQL delete here
    Delete = Success
    Exit Function
ErrorGenerated:
    If Err.Number = this Then   ' "this" stands for a specific error number
        Delete = FailedForThisReason
    Else
        Delete = Failed
    End If
End Function
IntelliSense will even help you fill them in.
This is rather subjective.
One would say: return a boolean, because it's as simple as it gets.
Another would say: return an integer, because if you later want to add a third status, such as "archived," changing the return type away from a boolean would break existing code.
And someone else would say: ditch those C-style return codes. Create a Sub that doesn't return anything, and raise an exception when you need to indicate failure.
I personally prefer exceptions. But it's up to you to decide.
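The third option, a procedure that returns nothing and signals failure by raising, might look like this minimal Python sketch. DeleteError, delete_record and the in-memory records dict are hypothetical stand-ins for the real database call; in VB6 the rough equivalent would be raising an error with Err.Raise and letting the caller's error handler deal with it.

```python
class DeleteError(Exception):
    """Raised when a record cannot be deleted; the message carries the reason."""


# Hypothetical in-memory stand-in for the database table.
records = {1: "alpha", 2: "beta"}


def delete_record(record_id: int) -> None:
    """Returns nothing on success; failure is signalled by raising DeleteError."""
    if record_id not in records:
        raise DeleteError(f"record {record_id} does not exist")
    del records[record_id]


try:
    delete_record(1)
    delete_record(42)
except DeleteError as exc:
    print(f"Delete failed: {exc}")
```

The caller no longer has to remember to check a return value; an unhandled failure surfaces on its own.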
In terms of size, a VB6 Integer is a 16-bit signed integer and a VB6 Boolean is also stored in 16 bits, so storage is not a real differentiator. However, it also depends on the context in which you read about using integers over booleans.
For SOME, the difference is irrelevant when using it as a return value from functions.
However, it can be something of a preference in stored procedures if you're also considering the return value from the stored procedure. Booleans, when converted to numbers, may end up being treated like a bit (0 and 1). In any case, it's a fairly subjective choice: integers allow more flexibility, while booleans are more limited but simpler. Which is better? I think it's almost entirely up to you: your preference, your coding standards, your company's coding standards, and so on.
Here is a link on data types:
http://msdn.microsoft.com/en-us/library/aa383751(v=vs.85).aspx
I'll throw my opinion in. I personally think that returning a boolean value is the right thing to do. Do you really care why it failed to delete? Not normally, there are only a few reasons why a delete could fail in the first place (file locked or lack of permissions). If you need to return the reason for failure so it can be handled differently in some way, then yes, return an integer. Now personally, I don't like magic numbers, so I would never return an integer and would return an enum value instead.
Related Questions: Benefits of using short-circuit evaluation, Why would a language NOT use Short-circuit evaluation?, Can someone explain this line of code please? (Logic & Assignment operators)
There are questions about the benefits of a language using short-circuit code, but I'm wondering what are the benefits for a programmer? Is it just that it can make code a little more concise? Or are there performance reasons?
I'm not asking about situations where two entities need to be evaluated anyway, for example:
if($user->auth() AND $model->valid()){
$model->save();
}
To me the reasoning there is clear - since both need to be true, you can skip the more costly model validation if the user can't save the data.
This also has a (to me) obvious purpose:
if(is_string($userid) AND strlen($userid) > 10){
//do something
};
Because it wouldn't be wise to call strlen() with a non-string value.
What I'm wondering about is the use of short-circuit code when it doesn't affect any other statements. For example, from the Zend Application default index page:
defined('APPLICATION_PATH')
|| define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
This could have been:
if(!defined('APPLICATION_PATH')){
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
}
Or even as a single statement:
if(!defined('APPLICATION_PATH'))
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
So why use the short-circuit code? Just for the 'coolness' factor of using logic operators in place of control structures? To consolidate nested if statements? Because it's faster?
For programmers, the benefit of a less verbose syntax over another more verbose syntax can be:
less to type, therefore higher coding efficiency
less to read, therefore better maintainability.
Now I'm only talking about when the less verbose syntax is not tricky or clever in any way, just the same recognized way of doing, but in fewer characters.
It's often only when you see a specific construct in another language that you realize you wish your own language had it. Some examples off the top of my head:
anonymous inner classes in Java instead of passing a pointer to a function (way more lines of code).
in Ruby, the ||= operator, which assigns to a variable only if it is currently nil or false. Sure, you can achieve the same thing with 3 lines of code, but why?
and many more...
Use it to confuse people!
I don't know PHP and I've never seen short-circuiting used outside an if or while condition in the C family of languages, but in Perl it's very idiomatic to say:
open my $filehandle, '<', 'filename' or die "Couldn't open file: $!";
One advantage of having it all in one statement is the variable declaration. Otherwise you'd have to say:
my $filehandle;
unless (open $filehandle, '<', 'filename') {
die "Couldn't open file: $!";
}
Hard to claim the second one is cleaner in that case. And it'd be wordier still in a language that doesn't have unless.
I think your example is for the coolness factor. There's no reason to write code like that.
EDIT: I have no problem with doing it for idiomatic reasons. If everyone else who uses a language uses short-circuit evaluation to make statement-like entities that everyone understands, then you should too. However, my experience is that code of that sort is rarely written in C-family languages; proper form is just to use the "if" statement as normal, which separates the conditional (which presumably has no side effects) from the function call that the conditional controls (which presumably has many side effects).
Short circuit operators can be useful in two important circumstances which haven't yet been mentioned:
Case 1. Suppose you had a pointer which may or may not be NULL and you wanted to check that it wasn't NULL, and that the thing it pointed to wasn't 0. However, you must not dereference the pointer if it's NULL. Without short-circuit operators, you would have to do this:
if (a != NULL) {
if (*a != 0) {
⋮
}
}
However, short-circuit operators allow you to write this more compactly:
if (a != NULL && *a != 0) {
⋮
}
in the certain knowledge that *a will not be evaluated if a is NULL.
Case 2. If you want to set a variable to a non-false value returned from one of a series of functions, you can simply do:
my $file = $user_filename ||
find_file_in_user_path() ||
find_file_in_system_path() ||
$default_filename;
This sets the value of $file to $user_filename if it's present, otherwise to the result of find_file_in_user_path() if that is true, and so on. This is seen perhaps more often in Perl than in C, but I have seen it in C.
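Python's `or` behaves like Perl's `||` here: it returns the first true operand itself rather than a plain boolean, and stops evaluating once it has one. A minimal sketch of the same fallback chain, where the two lookup functions and the paths are hypothetical stubs:

```python
from typing import Optional


# Hypothetical lookups; each returns a path, or None when nothing is found.
def find_file_in_user_path() -> Optional[str]:
    return None


def find_file_in_system_path() -> Optional[str]:
    return "/etc/app.conf"


def choose_file(user_filename: Optional[str], default_filename: str) -> str:
    # `or` yields the first truthy operand and does not call the functions
    # to the right of it, so later lookups only run when earlier ones fail.
    return (user_filename
            or find_file_in_user_path()
            or find_file_in_system_path()
            or default_filename)


print(choose_file(None, "default.conf"))         # /etc/app.conf
print(choose_file("mine.conf", "default.conf"))  # mine.conf
```

As in Perl, an empty string counts as false, so `choose_file("", ...)` also falls through to the next candidate.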
There are other uses, including the rather contrived examples which you cite above. But they are a useful tool, and one which I have missed when programming in less complex languages.
Related to what Dan said, I'd think it all depends on the conventions of each programming language. I can't see any difference, so do whatever is idiomatic in each programming language. One thing that could make a difference that comes to mind is if you had to do a series of checks, in that case the short-circuiting style would be much clearer than the alternative if style.
What if you had a function on the right-hand side that is expensive to call (performance-wise) and returns a boolean, and you only wanted it called if another condition was true (or false)? In this case, short-circuiting saves you many CPU cycles. It also makes the code more concise because there are fewer nested if statements. So, for all the reasons you listed at the end of your question.
The truth is actually performance. Short-circuiting is used in compilers to eliminate dead code, saving on file size and execution speed. At run time, short-circuiting does not execute the remaining clauses of a logical expression if their outcome cannot affect the answer, which speeds up evaluation. For example:
a AND b AND c
There are three terms in this formula, evaluated left to right.
If a AND b evaluates to FALSE, then the remaining AND c is either FALSE AND TRUE or FALSE AND FALSE. Both evaluate to FALSE no matter what the value of c is, so c is never evaluated. When the compiler can already tell that a AND b is FALSE, it can leave AND c out of the compiled code altogether.
To answer the question: there are special cases when the compiler cannot determine whether the logical expression has a constant output, and in those cases it cannot remove the code at compile time.
Think of it this way, if you have a statement like
if( A AND B )
chances are that if A returns FALSE, you'll want B evaluated only in rare special cases. For this reason, NOT using short-circuit evaluation would be confusing.
Short-circuit evaluation also makes your code more readable by avoiding another level of bracketed indentation, and brackets have a tendency to add up.
Which (if any) of the following will give the smallest performance hit? Or is the difference so small that I should use the most readable?
In the same page I've noted 3 styles used by previous maintainers:
Method 1:
If (strRqMethod = "Forum" or strRqMethod = "URL" or strRqMethod = "EditURL" or strRqMethod = "EditForum") Then
...
End If
Method 2:
Select Case strRqMethod
Case "Reply", "ReplyQuote", "TopicQuote"
'This is the only case in this statement...'
...
End Select
Method 3:
If InArray("Edit,EditTopic,Reply,ReplyQuote,Topic,TopicQuote",strRqMethod) Then
...
End If
...
'Elsewhere in the code'
function InArray(strArray,strValue)
if strArray <> "" and strArray <> "0" then
if (instr("," & strArray & "," ,"," & strValue & ",") > 0) then
InArray = True
else
InArray = False
end if
else
InArray = False
end if
end function
Moving away from Classic ASP/VBScript is not an option, so those comments need not bother to post.
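To see exactly what Method 3 does on every call, here is a line-for-line Python port of the InArray helper (in_array is just a transliteration of its name). It shows both why the commas are there and where the cost comes from: two string concatenations plus a substring search each time, versus a handful of direct comparisons in Methods 1 and 2.

```python
def in_array(str_array: str, str_value: str) -> bool:
    """Port of the VBScript InArray helper.

    Wrapping both the list and the value in commas ensures only whole items
    match: ",Reply," is not found inside ",Edit,ReplyQuote,".
    """
    if str_array in ("", "0"):
        return False
    return ("," + str_value + ",") in ("," + str_array + ",")


METHODS = "Edit,EditTopic,Reply,ReplyQuote,Topic,TopicQuote"
print(in_array(METHODS, "Reply"))  # True
print(in_array(METHODS, "Quote"))  # False
print(in_array(METHODS, "Edit"))   # True
```

If the list were fixed and checked many times, splitting it once into a lookup structure would avoid rebuilding the delimited strings on each call.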
You can benchmark this yourself to get the best results, as some performance will differ depending on the size of the input string.
However, I will say from a maintenance perspective, the second one is a bit easier to read/understand.
Well Method 3 is clearly going to perform worse than the other two.
Between Method 1 and Method 2 the difference is going to be marginal. It's worth remembering that VBScript doesn't short-circuit boolean expressions, so in Method 1 strRqMethod will be compared with every string even if it matches the first one. The Case statement in Method 2 at least has the option not to do that, and will likely stop comparing when the first match is found in the set.
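The difference between an eager Or and a first-match check can be made visible by counting comparisons. This Python sketch is only a model: `|` on Python bools evaluates every operand, mimicking VBScript's Or, while any() stops at the first match, mimicking the early exit the answer expects from Select Case. The eq helper and the sample value are hypothetical.

```python
comparisons = 0


def eq(a: str, b: str) -> bool:
    """String equality that counts how many times it is called."""
    global comparisons
    comparisons += 1
    return a == b


method = "Forum"
targets = ["Forum", "URL", "EditURL", "EditForum"]

# Eager, like VBScript's Or: every comparison runs before the results combine.
comparisons = 0
eager = (eq(method, targets[0]) | eq(method, targets[1])
         | eq(method, targets[2]) | eq(method, targets[3]))
eager_count = comparisons

# First match wins: any() stops as soon as one comparison is True.
comparisons = 0
lazy = any(eq(method, t) for t in targets)
lazy_count = comparisons

print(eager, eager_count, lazy, lazy_count)  # True 4 True 1
```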
Ultimately I would choose Method 2, not because I think it might be faster but because it expresses the intent of the code in the clearest way.
Educated guess:
Performance-wise, first two approaches are roughly equivalent; third method is very likely slower, even if it gets inlined.
Furthermore, the difference between the first two is likely in the microsecond range, so you can safely consider this a bona fide case of premature optimization...
Since we're on the topic of OR-ed boolean evaluation, a few things to know:
Most compilers/interpreters will evaluate boolean expressions with "short-circuit optimization", which means that at the first true condition found, the subsequent OR-ed conditions are NOT evaluated (since they wouldn't change the outcome). It is therefore a good idea to list the conditions in [rough] decreasing order of probability, i.e. listing the common cases first. (Short-circuit evaluation is also used with AND-ed expressions, but in reverse: at the first false condition, evaluation stops, which suggests writing the expression with the conditions most likely to fail first.)
Comparing strings is such a common task that most languages have this done in a very optimized fashion, at a very low level of the language. Most any trick we can think to improve this particular task is typically less efficient than the native operator.
As long as this is not done 100,000 (in other words, a lot of) times in a loop, it makes no difference. Although it is parsed code, we may still assume that the parsing is quick enough not to make a difference.
I found severe performance problems only when concatenating a lot of strings. I once found this out when running a page where I added debug output to a global string so it could be displayed at the bottom of the page. The longer the page was, the more code it ran, the more debug output I added, and the longer the page took to display. Since this page did some database access, I presumed the delay occurred somewhere in that code, only to find out that it was just the debug statements (to be honest, I had a lot of debug strings concatenated).
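The usual remedy is to collect the pieces and join them once instead of growing one string step by step. Here is a minimal Python sketch of the two patterns (the function names are hypothetical); in VBScript the corresponding fix is to store the lines in an array and call Join once at the end. Note that CPython can sometimes optimize repeated concatenation in place, so timing this in Python may not reproduce the slowdown; the point is the pattern.

```python
def build_debug_log_concat(lines: list[str]) -> str:
    # Repeated concatenation: each step may copy everything built so far,
    # so the total work can grow roughly with the square of the output size.
    log = ""
    for line in lines:
        log = log + line + "\n"
    return log


def build_debug_log_join(lines: list[str]) -> str:
    # Collect the pieces and join once at the end: a single final copy.
    return "".join(line + "\n" for line in lines)


messages = [f"step {i} done" for i in range(5)]
print(build_debug_log_join(messages[:2]), end="")
```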