Why is short-circuiting not the default behavior in VB? - vb6

VB has the operators AndAlso and OrElse, which perform short-circuiting logical conjunction and disjunction.
Why is this not the default behavior of And and Or expressions, given that short-circuiting is useful in every case?
Strangely, this is contrary to most languages where && and || perform short-circuiting.

Because the VB team had to maintain backward compatibility with older code (and programmers!).
If short-circuiting were the default behavior, bitwise operations would be incorrectly interpreted by the compiler.
The Ballad of AndAlso and OrElse by Panopticon Central
Our first thought was that logical operations are much more common than bitwise operations, so we should make And and Or be logical operators and add new bitwise operators named BitAnd, BitOr, BitXor and BitNot (the last two being for completeness). However, during one of the betas it became obvious that this was a pretty bad idea. A VB user who forgets that the new operators exist and uses And when he means BitAnd and Or when he means BitOr would get code that compiles but produces "bad" results.
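To make the "compiles but produces bad results" problem in the quote concrete, here is a small sketch. It is written in Python rather than VB, on the assumption that Python's `&` can stand in for a bitwise And and Python's `and` for a purely logical one; the variable names are made up for illustration.

```python
# Illustrative analogy only: `&` plays the role of a bitwise And,
# `and` the role of a logical And.
flags = 4   # binary 100
mask = 2    # binary 010

bitwise_result = flags & mask            # no common bits are set
logical_result = bool(flags and mask)    # both values are non-zero

print(bitwise_result)   # 0
print(logical_result)   # True
```

The same two operands give a "false" answer under one reading and a "true" answer under the other, which is why silently changing the meaning of And would have broken existing code.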

I do not find short-circuiting to be useful in every case. I use it only when required. For instance, when checking two different and unconnected variables, it would not be required:
If x > y And y > z Then
End If
As the article by Paul Vick illustrates (see the link provided by Ken Browning above), the perfect scenario for short-circuiting is when an object has to be checked for existence first and then one of its properties is to be evaluated.
If x IsNot Nothing AndAlso x.Someproperty > 0 Then
End If
So, in my opinion both syntactical options are very much required.
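The difference between the two forms can be sketched outside VB as well. The example below uses Python, assuming that `and` behaves like AndAlso (it stops at the first false operand) and that `&` on booleans behaves like And (both operands are evaluated). The `Item` class and its `some_property` attribute are hypothetical.

```python
class Item:
    """Hypothetical object with one numeric property."""
    def __init__(self, some_property):
        self.some_property = some_property

def is_positive_short_circuit(x):
    # `and` stops at the first falsy operand, so x.some_property
    # is never touched when x is None (comparable to AndAlso).
    return x is not None and x.some_property > 0

def is_positive_eager(x):
    # `&` needs both operands evaluated first (comparable to And),
    # so passing None raises AttributeError.
    return (x is not None) & (x.some_property > 0)
```

Calling `is_positive_short_circuit(None)` returns False, while `is_positive_eager(None)` fails with an AttributeError before the comparison result is ever used.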

An explicit short-circuit guarantees that the left operand is evaluated first.
In some languages other than VB, logical operators may perform an implicit short-circuit but may evaluate the right operand first (depending, for instance, on the complexity of the expressions to the left and right of the logical operator).

Related

Ruby: target-less 'case', compared to 'if'

(I have asked this question already at Ruby Forum, but it didn't draw any answer, so I'm crossposting it now)
From my understanding, the following pieces of code are equivalent under
Ruby 1.9 and higher:
# (1)
case
when x < y
  foo
when a > b
  bar
else
  baz
end
# (2)
if x < y
  foo
elsif a > b
  bar
else
  baz
end
So far I have always used (2), out of habit. Can someone think
of a particular reason why either (1) or (2) is "better", or is it just
a matter of taste?
CLARIFICATION: Some users have objected that this question is just "opinion-based", and hence not suited to this forum. I therefore think that I did not make myself clear enough: I don't want to start a discussion on personal programming style. The reason why I brought up this topic is this:
I was surprised that Ruby offers two very different syntaxes (target-less case, and if-elsif) for, as it seems to me, exactly the same purpose, in particular since the if-elsif syntax is the one virtually every programmer is familiar with. I wouldn't even consider 'target-less case' as "syntactic sugar", because it doesn't allow me to express the programming logic more concisely than 'if-elsif'.
So I wonder in what situation I might want to use the 'target-less case' construct. Does it give a performance advantage? Is it different from if-elsif in some subtle way which I just don't notice?
ADDITIONAL FINDINGS regarding the implementation of target-less case:
Olivier Poulin has pointed out that a target-less case statement would explicitly use the === operator against the value "true", which would cause a (tiny) performance penalty for 'case' compared to 'if' (and is one more reason why I don't see why someone might want to use it).
However, when checking the documentation of the case statement for Ruby 1.9 and Ruby 2.0, I found that they describe it differently, but both at least suggest that === might NOT be used in this case. In the case of Ruby 1.9:
Case statements consist of an optional condition, which is in the position of an argument to case, and zero or more when clauses. The first when clause to match the condition (or to evaluate to Boolean truth, if the condition is null) “wins”
Here it says that if the condition (i.e. what comes after 'case') is null (i.e. does not exist), the first 'when' clause which evaluates to true is the one being executed. No reference to === here.
In Ruby 2.0, the wording is completely different:
The case expression can be used in two ways. The most common way is to compare an object against multiple patterns. The patterns are matched using the +===+ method [.....]. The other way to use a case expression is like an if-elsif expression: [example of target-less case is given here].
It hence says that === is used in the "first" way (case with target), while the target-less case "is like" if-elsif. No mention of === here.
Midwire ran a few benchmarks and concluded that if/elsif is faster
than case because case “implicitly compares using the more expensive
=== operator”.
Here is where I got this quote. It compares if/elsif statements to case.
It is very thorough and explores the differences in the instruction sequences, so it will definitely give you an idea of which is better.
The main thing I took from the post, though, is that there are no huge differences between if/elsif and case; the two can usually be used interchangeably.
Some major differences can present themselves depending on how many cases you have.
n = 1 (last clause matches)
if: 7.4821e-07
threequal_if: 1.6830500000000001e-06
case: 3.9176999999999997e-07
n = 15 (first clause matches)
if: 3.7357000000000003e-07
threequal_if: 5.0263e-07
case: 4.3348e-07
As you can see, if the last clause is matched, if/elsif runs much slower than case, while if the first clause is matched, it's the other way around.
This difference comes from the fact that if/elsif uses branchunless, while case uses branchif in their instruction sequences.
Here is a test I did on my own with a target-less case vs if/elsif statements (using "=="). The first time is case, while the second time is if/elsif.
First test, 5 when statements, 5 if/elsif, the first clause is true for both.
Time elapsed 0.052023 milliseconds
Time elapsed 0.031467999999999996 milliseconds
Second test, 5 when statements, 5 if/elsif, the last(5th) clause is true for both.
Time elapsed 0.001224 milliseconds
Time elapsed 0.028578 milliseconds
As you can see, just as we saw before, when the first clause is true, if/elsif performs better than case, while case has a massive performance advantage when the last clause is true.
CONCLUSION
Running more tests has shown that it probably comes down to probability. If you think the answer is going to come earlier in your list of clauses, use if/elsif, otherwise case seems to be faster.
The main thing that this has shown is that both case and if/elsif are equally efficient and that using one over the other comes down to probability and taste.

most readable way in XPath to write "is value X a member of sequence S"?

XPath 2.0 has some new functions and syntax, relative to 1.0, that work with sequences. Some of these don't really add to what the language could already do in 1.0 (with node sets), but they make it easier to express the desired logic in ways that are more readable. This increases the chances of the programmer getting the code correct -- and keeping it that way. For example,
empty(s) is equivalent to not(s), but its intent is much clearer when you want to test whether a sequence is empty.
Correction: the effective boolean value of a sequence is in general more complicated than that. E.g. empty((0)) != not((0)). This applies to exists(s) vs. s in a boolean context as well. However, there are domains of s where empty(s) is equivalent to not(s), so the two could be used interchangeably within those domains. But this goes to show that the use of empty() can make a non-trivial difference in making code easier to understand.
Similarly, exists(s) is equivalent to boolean(s) that already existed in XPath 1.0 (or just s in a boolean context), but again is much clearer about the intent.
Quantified expressions; e.g. "some $x in expression satisfies test($x)" would be equivalent to boolean(expression[test(.)]) (although the new syntax is more flexible, in that you don't need to worry about losing the context item because you have the variable to refer to it by).
Similarly, "every $x in expression satisfies test($x)" would be equivalent to not(expression[not(test(.))]) but is more readable.
These functions and syntax were evidently added at no small cost, solely to serve the goal of writing XPath that is easier to map to how humans think. This implies, as experienced developers know, that understandable code is significantly superior to code that is difficult to understand.
Given all that ... what would be a clear and readable way to write an XPath test expression that asks
Does value X occur in sequence S?
Some ways to do it: (Note: I used X and S notation here to indicate the value and the sequence, but I don't mean to imply that these subexpressions are element name tests, nor that they are simple expressions. They could be complicated.)
1. X = S: This would be one of the least readable, since it requires the reader to
   think about which of X and S are sequences vs. single values
   understand general comparisons, which are not obvious from the syntax
   However, one advantage of this form is that it allows us to put the topic (X) before the comment ("is a member of S"), which, I think, helps in readability.
   See also CMS's good point about readability, when the syntax or names make the "cardinality" of X and S obvious.
2. index-of(S, X): This one is clear about what's intended as a value and what as a sequence (if you remember the order of arguments to index-of()). But it expresses more than we need to: it asks for the index, when all we really want to know is whether X occurs in S. This is somewhat misleading to the reader. An experienced developer will figure out what's intended, with some effort and with understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! Also, since index-of() is designed to return a list of all the indexes of occurrences of X, it could be more expensive than necessary: a smart processor, in order to evaluate X = S, wouldn't necessarily have to find all the contents of S, nor enumerate them in order; but for index-of(S, X), correct order would have to be determined, and all contents of S must be compared to X. One other drawback of using index-of() is that it's limited to using eq for comparison; you can't, for example, use it to ask whether a node is identical to any node in a given sequence.
   Correction: This form, used as a conditional test, can result in a runtime error: Effective boolean value is not defined for a sequence of two or more items starting with a numeric value. (But at least we won't get wrong boolean values, since index-of() can't return a zero.) If S can have multiple instances of X, this is another good reason to prefer form 3 or 6.
3. exists(index-of(S, X)): makes the intent clearer, and would help the processor eliminate the performance penalty if the processor is smart enough.
4. some $m in S satisfies $m eq X: This one is very clear, and matches our intent exactly. It seems long-winded compared to 1, and that in itself can reduce readability. But maybe that's an acceptable price for clarity. Keep in mind that X and S could potentially be complex expressions themselves -- they're not necessarily just variable references. An advantage is that since the eq operator is explicit, you can replace it with is or any other comparison operator.
5. S[. eq X]: clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. Actually, this could return a false negative (incorrect effective boolean value), if X is falsy. E.g. (0, 1)[. eq 0] returns 0 which is falsy, even though 0 occurs in (0, 1).
6. exists(S[. eq X]): Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on the processor smarts).
I'm kind of leaning toward the last one, at this point: exists(S[. eq X])
What about you... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program is doing, which would you find easiest to read?
Apologies for the long question. Thanks for reading this far.
Edit: I changed = to eq wherever possible in the above discussion, to make it easier to see where a "value comparison" (as opposed to a general comparison) was intended.
For what it's worth, if names or context make clear that X is a singleton, I'm happy to use your first form, X = S -- for example when I want to check an attribute value against a set of possible values:
<xsl:when test="@type = ('A', 'A+', 'A-', 'B+')" />
or
<xsl:when test="@type = $magic-types"/>
If I think there is a risk of confusion, then I like your sixth formulation. The less frequently I have to remember the rules for calculating an effective boolean value, the less frequently I make a mistake with them.
I prefer this one:
count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))
When $x is itself a sequence, this expression implements the (value-based) subset-of relation between two sets of values that are represented as sequences. This implementation of subset-of has just linear time complexity, vs. many other ways of expressing this that have O(N^2) time complexity.
To summarize, the question whether a single value belongs to a set of values is a special case of the question whether one set of values is a subset of another. If we have a good implementation of the latter, we can simply use it for answering the former.
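The counting idea carries over to other languages. Below is a minimal sketch in Python (not XPath), assuming that a Python `set` plays the role of distinct-values() and that the values are hashable; the function names are made up for illustration.

```python
def is_subset(xs, seq):
    """True if every value in xs also occurs in seq (value-based)."""
    distinct_seq = set(seq)
    # Adding xs to seq must not introduce any new distinct value.
    return len(distinct_seq) == len(distinct_seq | set(xs))

def is_member(x, seq):
    # Membership is the special case of a one-element subset.
    return is_subset([x], seq)

print(is_member("bob", ["alice", "bob"]))    # True
print(is_member("carol", ["alice", "bob"]))  # False
```

Because the test compares counts rather than relying on the truthiness of a filtered result, a value such as 0 is handled like any other: `is_member(0, [0, 1])` is True.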
The functx library has a nice implementation of this function, so you can use
functx:is-node-in-sequence($X, $Y)
(this particular function can be found at http://www.xqueryfunctions.com/xq/functx_is-node-in-sequence.html)
The whole functx library is available for both XQuery (http://www.xqueryfunctions.com/) and XSLT (http://www.xsltfunctions.com/)
MarkLogic ships the functx library with their core product; other vendors may also.
Another possibility, when you want to know whether node X occurs in sequence S, is
exists((X) intersect S)
I think that's pretty readable, and concise. But it only works when X and the values in S are nodes; if you try to ask
exists(('bob') intersect ('alice', 'bob'))
you'll get a runtime error.
In the program I'm working on now, I need to compare strings, so this isn't an option.
As Dimitri notes, the occurrence of a node in a sequence is a question of identity, not of value comparison.

What languages don't define execution order for multi-statement for statements?

Some C-like languages allow multiple statements in the update part of a for statement, e.g.
for (int i = 0; i < limit; i++, butter--, syrup--, pancakesDevoured++)
Java explicitly defines the order for such statements in JLS 14.14.1.2, and 15.8.3 of ECMA-334 (C#) says "the expressions of the for-iterator, if any, are evaluated in sequence", which I'm reading as left-to-right.
What languages, if any, allow multiple statements in the update part of a for loop but either don't define an ordering for such statements or use an order other than left to right?
edit: removed the C tag since that started a sequence point discussion and there's plenty of that already.
Technically, that's a single expression, not one or more statements, though the distinction isn't super important.
Good old C makes only limited promises about what order expressions like this will be evaluated in. In your example, the order doesn't matter. But consider this expression:
a[i++] = i
If i was 1 before evaluating this expression, should a[1] now equal 1 or 2? As I understand the C specification, the behavior of this expression is undefined, meaning that what you get depends on which compiler you use.
Thanks to @mizo for a readable reference on sequence points, and to @ChrisDodd for pointing out that if you're making simple independent assignments that are separated by the comma operator, C and C++ do fully specify a left-to-right evaluation order.

Efficiently store and evaluate a large number of boolean expressions

I have a huge set (20000) of boolean expressions. They consist of AND, OR and NOT operators and a large number of boolean variables A1, A2, A3 ... (about 1000). Most expressions contain only 5 to maybe 20 of these variables.
Given an assignment of the variables (A1 = true, A2 = false, A3 = false ...) I have to find those expressions that evaluate to false.
The same set of expressions will be evaluated for multiple (10-100) assignments.
For this purpose:
How should I store the expressions on disk so I can load and parse them fast? (I currently have them either as some specialized DSL or as a more or less normalized (and dead slow) relational data structure, but I can change that.)
Is there a fast algorithm / data structure for evaluating such expressions that I can use?
Do implementations on the JVM exist?
You may want to look at converting your expressions into Conjunctive Normal Form and combining like terms. You then can have a two-way mapping of an expression to a set of terms, any of which evaluating to false implies that the whole expression evaluates false. For each assignment of variables, start with a set of expressions, evaluate CNF terms until one evaluates to false. If that term is false, then all expressions involving that term will also be false, so those expressions can also be removed from the set.
Whether such an approach fits your case can't be said without looking at the expressions - with 1000 variables and 20000 expressions, it might not be that they have many CNF terms in common.
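A rough sketch of the shared-term idea, in Python rather than Java and under some loud assumptions: each expression is already in CNF, a clause is modelled as a frozenset of hypothetical (variable, wanted_value) literals, and all names below are invented for illustration.

```python
from collections import defaultdict

def clause_is_true(clause, values):
    # A clause (an OR of literals) is true when at least one literal
    # matches the assignment.
    return any(values[var] == wanted for var, wanted in clause)

def find_false(expressions, values):
    """expressions: {expr_id: set of clauses}; returns ids that are false."""
    users = defaultdict(set)            # clause -> expressions containing it
    for expr_id, clauses in expressions.items():
        for clause in clauses:
            users[clause].add(expr_id)
    false_ids = set()
    for clause, expr_ids in users.items():
        if expr_ids <= false_ids:
            continue                    # every user is already known to be false
        if not clause_is_true(clause, values):
            false_ids |= expr_ids       # one false clause falsifies each user
    return false_ids

shared = frozenset({("A1", True), ("A2", True)})   # A1 OR A2
other = frozenset({("A3", False)})                 # NOT A3
exprs = {0: {shared, other}, 1: {shared}, 2: {other}}
print(find_false(exprs, {"A1": False, "A2": False, "A3": False}))  # {0, 1}
```

Each distinct clause is evaluated at most once per assignment, which only pays off if many expressions really do share clauses.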
Outside of Java, and for much larger numbers of expressions, DNF is possibly more useful, since its implementation on the GPU is obvious.
The SOP answer to this is to store the expressions as strings in RPN (Reverse Polish Notation) and then write a simple Stack Machine parser to evaluate them.
Generally, an RPN string can be evaluated almost as fast as an already in-memory AST (Abstract Syntax Tree). And the stack machine parser is dead easy to write.
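For illustration, a stack machine of this kind could look roughly like the following. This is a sketch in Python (the same structure ports directly to Java), assuming a made-up storage format: space-separated tokens, with AND, OR and NOT as the operator names and everything else treated as a variable name.

```python
def eval_rpn(rpn, values):
    """Evaluate a space-separated RPN boolean expression.

    Tokens are variable names or the operators AND, OR, NOT.
    `values` maps variable names to booleans.
    """
    stack = []
    for token in rpn.split():
        if token == "NOT":
            stack.append(not stack.pop())
        elif token == "AND":
            right, left = stack.pop(), stack.pop()
            stack.append(left and right)
        elif token == "OR":
            right, left = stack.pop(), stack.pop()
            stack.append(left or right)
        else:
            stack.append(values[token])   # operand: look up the variable
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

assignment = {"A1": True, "A2": False, "A3": False}
# Infix form: (A1 AND NOT A2) OR A3
print(eval_rpn("A1 A2 NOT AND A3 OR", assignment))  # True
```

To avoid re-splitting strings for every assignment, the token lists could be computed once at load time and reused.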
You seem attached to Java, but have you considered feeding these things to a language that has an eval() function? It would probably reduce the problem to saving an expression in a file and evaluating it. Note that if you don't trust the (source of the) expressions, this has security implications!
Jython comes to mind, but there are probably several that would make very short work of this.
If you're married to Java, you could probably implement a recursive descent parser for boolean algebra. But that's quite a bit more involved.
UPDATE: The following site has code that might help.
Convert your list of expressions into source code for a function that, when called with the values of the variables, evaluates all the expressions and returns an indication of which expressions evaluate to false. Compile the function, then call it for your different variable values.
I have done similar and used Python. The only parsing and interpretation I had to write was to translate the input boolean operators, '&', '|', '~' into their Python equivalents.
Your problem size seems quite OK for a Python solution.
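A minimal sketch of that approach, not the code I actually used: it assumes the input DSL uses only '&', '|', '~', parentheses and variable names that are valid Python identifiers, and the helper names are invented. It relies on eval(), so it is only suitable for expressions from a trusted source.

```python
def translate(text):
    # Map the DSL operators onto Python's boolean keywords.
    return text.replace("&", " and ").replace("|", " or ").replace("~", " not ")

def build_evaluator(expressions):
    # One list literal holding every translated expression, compiled once.
    source = "[" + ", ".join("(" + translate(e) + ")" for e in expressions) + "]"
    code = compile(source, "<expressions>", "eval")

    def find_false(values):
        env = {"__builtins__": {}}      # expose nothing but the variables
        env.update(values)
        results = eval(code, env)
        return [i for i, ok in enumerate(results) if not ok]

    return find_false

find_false = build_evaluator(["A1 & ~A2", "A2 | A3", "~(A1 & A3)"])
print(find_false({"A1": True, "A2": False, "A3": False}))  # [1]
```

The compile step happens once; each of the 10-100 assignments then costs a single eval() of the precompiled code.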
You could build an index where for each variable you record two sets of expressions: those where the variable occurs positively and those where it occurs negatively. Depending on the values of the variables, you collect those expressions which could become false due to this variable (positive occurrences if the variable is set to false and vice versa). Edit: These are just candidates; you still need to evaluate them to find out if they really become false.
Whether this helps compared to just evaluating all your expressions depends on the structure of your expressions and how many evaluate to false.
Try to convert them into CNF and use MiniSat to check whether the expression evaluates to true or false.

What is the operator precedence order in Visual Basic 6.0?

What is the operator precedence order in Visual Basic 6.0 (VB6)?
In particular, for the logical operators.
Arithmetic Operation Precedence Order
^
- (unary negation)
*, /
\
Mod
+, - (binary addition/subtraction)
&
Comparison Operation Precedence Order
=
<>
<
>
<=
>=
Like, Is
Logical Operation Precedence Order
Not
And
Or
Xor
Eqv
Imp
Source: Sams Teach Yourself Visual Basic 6 in 24 Hours — Appendix A: Operator Precedence
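The effect of the Not, And, Or ordering can be illustrated with a short sketch. It is in Python, not VB6, on the assumption that Python's `not`, `and` and `or` rank in the same relative order; Xor, Eqv and Imp have no direct Python keyword and are left out.

```python
a, b, c = True, True, True

implicit = not a and b or c       # relies on precedence: not, then and, then or
explicit = ((not a) and b) or c   # the same grouping spelled out
other = not (a and (b or c))      # a different grouping gives a different result

print(implicit, explicit, other)  # True True False
```

Writing the parentheses out, as in the second line, removes any need for the reader to remember the table.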
It depends on whether or not you're in the debugger. Really. Well, sort of.
Parentheses come first, of course. Then arithmetic (+, -, *, /, etc.). Then comparisons (>, <, =, etc.). Then the logical operators. The trick is that the order of execution within a given precedence level is not defined. Given the following expression:
If A < B And B < C Then
you are guaranteed the < inequality operators will both be evaluated before the logical And comparison. But you are not guaranteed which inequality comparison will be executed first.
IIRC, the debugger executes left to right, but the compiled application executes right to left. I could have them backwards (it's been a long time), but the important thing is they're different. The actual precedence doesn't change, but the order of execution might.
Use parentheses
EDIT: That's my advice for new code! But Oscar is reading someone else's code, so must figure it out somehow. I suggest the VBA manual topic Operator Precedence. VBA is 99% equivalent to VB6 - and expression evaluation is 100% equivalent. I have pasted the logical operator information here.
Logical operators are evaluated in the following order of precedence:
Not
And
Or
Xor
Eqv
Imp
The topic also explains precedence for comparison and arithmetic operators.
I would suggest that once you have figured out the precedence, you put in parentheses, unless there is some good reason not to edit the code.

Resources