Is 'X = someFunction() + 2` a statement or an expression? - expression

I read that an expression is anything that gives some value, like 2 + X, while a statement is any instruction to the computer to execute something, like print("hi").
What about the following line of code?
X = someFunction() + 2
someFunction() returns some numerical value (I think a lot of languages wouldn't compile this code if it didn't), and thus someFunction() + 2 is 'something that yields some value' - aka an expression.
But, someFunction() is code to be executed, thus a statement.
My question:
There are often lines of code that equal some value, but are also an instruction to be executed. What are these lines of code considered?

In certain computer languages called "functional languages", everything--including the code that prints "hi"--is an expression. At the other extreme, you can write code in machine language (so you, not a compiler, are deciding exactly what sequence of bytes should compose the executable program), and at that level practically everything (even adding 2 to something) is an "instruction to the computer to execute something".
I've used a lot of different computer languages, and a far as I can recall, in each case there was documentation somewhere defining what makes a statement in that particular language (if indeed the language even has a concept of "statement"). The definition is based on syntax, not so much on what the code does.
For example, in C or C++, if you write
{ x + 2; }
then technically the "x + 2;" is a statement. It is a useless statement that doesn't do anything, but syntactically, it is a statement nevertheless. In fact, one way to write a statement in C is to just append a semicolon to an expression (http://msdn.microsoft.com/en-us/library/1t054cy7.aspx). You don't even need the expression; a semicolon by itself can be a statement (http://msdn.microsoft.com/en-us/library/h7zyw61x.aspx).
By the way, in C++, the '+' in an expression such as (x + 2) may actually be a function call. So if you say anything that calls a function is a statement, then (x + 2) would be, or at least could be, a statement in C++. But I don't know any authority who defines it that way.

It varies by language, but ultimately: a statement is anything you can't embed inside another (simple, i.e. not a block) statement.
In C, your example is an expression, because you can do this:
while (X = someFunction() + 2) {
// ...
}
But in Python, the same thing is a syntax error, because = can only be a statement:
# nope!
while X = someFunction() + 2:
pass
In most languages, any expression can also be used as a statement by itself, though this may or may not be useful.
Calling a statement an "instruction to execute something" is a poor way to think about it, though. All code is an instruction to execute something.
A statement is more like a single complete thought. It's really just part of the syntax; depending on the language/compiler/runtime, a statement or expression may become very many machine instructions, or several statements might be reduced to just one instruction.

tldr; It is a statement when it is parsed as statement, and an expression when it is parsed as an expression. The rules of which depend upon the particular language in question.
Expressions and statements should not be confused with "what actually happens" underneath, but merely as describing the syntax constructs of a language's grammar.
Because the grammar and parsing rules [generally] depend on the program as a whole, taking part of an expression and using it as a statement, where such is allowed, does not indicate that it is a statement, much less when it appears in an expression context.
As for the particular example given, it depends on programming language and where the construct appears. Some languages support assignments as expressions, while others do not.
For instance, consider this JavaScript (see Appendix A of ES5 for the grammar rules).
{ x = y = f() + 2 }
In this case, the block is a statement (BlockStatement) and x = .. is also considered a "statement" (although it is really an Expression via Statement -> ExpressionStatement) while y = .. is an expression. Likewise, f() is an expression (technically, f is also an expression in JavaScript) and 2 is an expression and f() + 2 is an expression.
However, the following is invalid Pascal because Pascal's syntax does not support := (assignment) in an expression and an assignment is always a statement.
X := Y := F() + 2
Some languages also forbid general expressions as statements, which further throws off the notion that, in y = EXPR, it is correct to consider EXPR a valid statement. The following is invalid C#, but is dubiously valid in JavaScript and many other languages.
{ f() + 2; }

I would say that the "someFunction() + 2" is an expression being evaluated. Then I would say that "x = someFunction() + 2" is a statement, because most languages would generally evaluate the function's return plus two, and then assign that value to x.

Related

Ruby case/when vs if/elsif

The case/when statements remind me of try/catch statements in Python, which are fairly expensive operations. Is this similar with the Ruby case/when statements? What advantages do they have, other than perhaps being more concise, to if/elsif Ruby statements? When would I use one over the other?
The case expression is not at all like a try/catch block. The Ruby equivalents to try and catch are begin and rescue.
In general, the case expression is used when you want to test one value for several conditions. For example:
case x
when String
"You passed a string but X is supposed to be a number. What were you thinking?"
when 0
"X is zero"
when 1..5
"X is between 1 and 5"
else
"X isn't a number we're interested in"
end
The case expression is orthogonal to the switch statement that exists in many other languages (e.g. C, Java, JavaScript), though Python doesn't include any such thing. The main difference with case is that it is an expression rather than a statement (so it yields a value) and it uses the === operator for equality, which allows us to express interesting things like "Is this value a String? Is it 0? Is it in the range 1..5?"
Ruby's begin/rescue/end is more similar to Python's try/catch (assuming Python's try/catch is similar to Javascript, Java, etc.). In both of the above the code runs, catches errors and continues.
case/when is like C's switch and ignoring the === operator that bjhaid mentions operates very much like if/elseif/end. Which you use is up to you, but there are some advantages to using case when the number of conditionals gets long. No one likes /if/elsif/elsif/elsif/elsif/elsif/end :-)
Ruby has some other magical things involving that === operator that can make case nice, but I'll leave that to the documentation which explains it better than I can.

Pythonesque blocks and postfix expressions

In JavaScript,
f = function(x) {
return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda(x): x + 1; (5).
There are many other problems with multi-line lambdas in otherwise standard Python syntax though. It is completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions - it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas thread about multi-line lambdas. It's somewhere between very hard to impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. Coffeescript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lamda function.
The details are a bit messier than this; look at the Haskell 2010 report.

Multiple statements in mathematica function

I wanted to know, how to evaluate multiple statements in a function in Mathematica.
E.g.
f[x_]:=x=x+5 and then return x^2
I know this much can be modified as (x+5)^2 but originally I wanted to read data from the file in the function and print the result after doing some data manipulation.
If you want to group several commands and output the last use the semicolon (;) between them, like
f[y_]:=(x=y+5;x^2)
Just don't use a ; for the last statement.
If your set of commands grows bigger you might want to use scoping structures like Module or Block.
You are looking for CompoundExpression (short form ;):
f[x_]:= (thing = x+5 ; thing^2)
The parentheses are necessary due to the very low precedence of ;.
As Szabolcs called me on, you cannot write:
f[x_]:= (x = x+5 ; x^2)
See this answer for an explanation and alternatives.
Leonid, who you should listen to, says that thing should be localized. I didn't do this above because I wanted to emphasize CompoundExpression as a specific fit for your "and then" construct. As it is written, this will affect the global value of thing which may or may not be what you actually want to do. If it is not, see both the answer linked above, and also:
Mathematica Module versus With or Block - Guideline, rule of thumb for usage?
Several people have mentioned already that you can use CompoundExpression:
f[x_] := (y=x+5; y^2)
However, if you use the same variable x in the expression as in the argument,
f[x_] := (x=x+5; x^2)
then you'll get errors when evaluating the function with a number. This is because := essentially defines a replacement of the pattern variables from the lhs, i.e. f[1] evaluates to the (incorrect) (1 = 1+5; 1^2).
So, as Sjoerd said, use Module (or Block sometimes, but this one has caveats!) to localize a function-variable:
f[x_] := Module[{y}, y=x+5; y^2]
Finally, if you need a function that modified its arguments, then you can set the attribute HoldAll:
Clear[addFive]
SetAttributes[addFive, HoldAll]
addFive[x_] := (x=x+5)
Then use it as
a = 3;
addFive[a]
a

OCaml delimiters and scopes

I'm learning OCaml and although I have years of experience with imperative programming languages (C, C++, Java) I'm getting some problems with delimiters between declarations or expressions in OCaml syntax.
Basically I understood that I have to use ; to concatenate expressions and the value returned by the sequence will be the one of last expression used, so for example if I have
exp1; exp2; exp3
it will be considered as an expression that returns the value of exp3. Starting from this I could use
let t = something in exp1; exp2; exp3
and it should be ok, right?
When am I supposed to use the double semicol ;;? What does it exactly mean?
Are there other delimiters that I must use to avoid syntax errors?
I'll give you an example:
let rec satisfy dtmc state pformula =
match (state, pformula) with
(state, `Next sformula) ->
let s = satisfy_each dtmc sformula
and adder a state =
let p = 0.;
for i = 0 to dtmc.matrix.rows do
p <- p +. get dtmc.matrix i state.index
done;
a +. p
in
List.fold_left adder 0. s
| _ -> []
It gives me syntax error on | but I don't get why.. what am I missing? This is a problem that occurs often and I have to try many different solutions until it suddently works :/
A side question: declaring with let instead that let .. in will define a var binding that lasts whenever after it has been defined?
What I basically ask is: what are the delimiters I have to use and when I have to use them. In addition are there differences I should consider while using the interpreter ocaml instead that the compiler ocamlc?
Thanks in advance!
The ;; delimiter terminates a top-level entity. In the ocaml toplevel (interpreter), it signals to the interpreter that a particular piece of input is finished and should be evaluated.
In programs to be compiled with ocamlc or ocamlopt, you don't need it near as often, as consecutive top-level let (without in), module, type, exception, and similar statements automatically signal the beginning of a new "phrase". If you include a top-level expression in a module that is to be evaluated only for its side-effects (such as generating some output or registering a module), you'll need a ;; before it to tell the compiler to stop compiling the previous phrase and start compiling a new thing. Otherwise, if the previous thing is a let, it will assume that the new expression is part of the let. For example:
let msg = "Hello, world";; (* we need ;; here *)
print_endline msg;; (* ;; is optional here, unless we have another expression *)
When you do and don't need ;; is somewhat subtle, so I usually terminate all my module-level entities with it so I don't have to worry about when it is and isn't needed.
; is used to separate sequential "statements" within a single expression. So foo; bar is a single sequential expression composed of foo and bar, while foo;; bar is only valid at the top level of a module and signifies two expressions.
On let without in: that construct is only valid in a module definition and variables so bound will be bound through the end of the module. Often, this is just the end of the file; if you have nested modules, however, its scope can be more limited. It does not work inside another expression or definition such as a function definition, unless it is within a local module definition.
let p = 0.;
This is the error. The ; needs to be an in. You can't use let without in only to define global functions, you can't use it inside an expression.
A side question: declaring with let instead that let .. in will define a var binding that lasts whenever after it has been defined?
You can only ever use one or the other (except in the interactive interpreter where you are allowed to mix expressions and definitions). When defining a global function or value, you need let without in. Inside an expression you need let with in.
;; is used to terminate input and start interpreting in ocaml REPL, it has no special meaning when compiling with ocamlc or ocamlopt.
You cannot assign to arbitrary value with <- operator, you have to use ref type for mutable variables:
let p = ref 0. in
for i = 0 to dtmc.matrix.rows do
p := !p +. get dtmc.matrix i state.index
done;
a +. !p

Why use short-circuit code?

Related Questions: Benefits of using short-circuit evaluation, Why would a language NOT use Short-circuit evaluation?, Can someone explain this line of code please? (Logic & Assignment operators)
There are questions about the benefits of a language using short-circuit code, but I'm wondering what are the benefits for a programmer? Is it just that it can make code a little more concise? Or are there performance reasons?
I'm not asking about situations where two entities need to be evaluated anyway, for example:
if($user->auth() AND $model->valid()){
$model->save();
}
To me the reasoning there is clear - since both need to be true, you can skip the more costly model validation if the user can't save the data.
This also has a (to me) obvious purpose:
if(is_string($userid) AND strlen($userid) > 10){
//do something
};
Because it wouldn't be wise to call strlen() with a non-string value.
What I'm wondering about is the use of short-circuit code when it doesn't effect any other statements. For example, from the Zend Application default index page:
defined('APPLICATION_PATH')
|| define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
This could have been:
if(!defined('APPLICATION_PATH')){
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
}
Or even as a single statement:
if(!defined('APPLICATION_PATH'))
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
So why use the short-circuit code? Just for the 'coolness' factor of using logic operators in place of control structures? To consolidate nested if statements? Because it's faster?
For programmers, the benefit of a less verbose syntax over another more verbose syntax can be:
less to type, therefore higher coding efficiency
less to read, therefore better maintainability.
Now I'm only talking about when the less verbose syntax is not tricky or clever in any way, just the same recognized way of doing, but in fewer characters.
It's often when you see specific constructs in one language that you wish the language you use could have, but didn't even necessarily realize it before. Some examples off the top of my head:
anonymous inner classes in Java instead of passing a pointer to a function (way more lines of code).
in Ruby, the ||= operator, to evaluate an expression and assign to it if it evaluates to false or is null. Sure, you can achieve the same thing by 3 lines of code, but why?
and many more...
Use it to confuse people!
I don't know PHP and I've never seen short-circuiting used outside an if or while condition in the C family of languages, but in Perl it's very idiomatic to say:
open my $filehandle, '<', 'filename' or die "Couldn't open file: $!";
One advantage of having it all in one statement is the variable declaration. Otherwise you'd have to say:
my $filehandle;
unless (open $filehandle, '<', 'filename') {
die "Couldn't open file: $!";
}
Hard to claim the second one is cleaner in that case. And it'd be wordier still in a language that doesn't have unless
I think your example is for the coolness factor. There's no reason to write code like that.
EDIT: I have no problem with doing it for idiomatic reasons. If everyone else who uses a language uses short-circuit evaluation to make statement-like entities that everyone understands, then you should too. However, my experience is that code of that sort is rarely written in C-family languages; proper form is just to use the "if" statement as normal, which separates the conditional (which presumably has no side effects) from the function call that the conditional controls (which presumably has many side effects).
Short circuit operators can be useful in two important circumstances which haven't yet been mentioned:
Case 1. Suppose you had a pointer which may or may not be NULL and you wanted to check that it wasn't NULL, and that the thing it pointed to wasn't 0. However, you must not dereference the pointer if it's NULL. Without short-circuit operators, you would have to do this:
if (a != NULL) {
if (*a != 0) {
⋮
}
}
However, short-circuit operators allow you to write this more compactly:
if (a != NULL && *a != 0) {
⋮
}
in the certain knowledge that *a will not be evaluated if a is NULL.
Case 2. If you want to set a variable to a non-false value returned from one of a series of functions, you can simply do:
my $file = $user_filename ||
find_file_in_user_path() ||
find_file_in_system_path() ||
$default_filename;
This sets the value of $file to $user_filename if it's present, or the result of find_file_in_user_path(), if it's true, or … so on. This is seen perhaps more often in Perl than C, but I have seen it in C.
There are other uses, including the rather contrived examples which you cite above. But they are a useful tool, and one which I have missed when programming in less complex languages.
Related to what Dan said, I'd think it all depends on the conventions of each programming language. I can't see any difference, so do whatever is idiomatic in each programming language. One thing that could make a difference that comes to mind is if you had to do a series of checks, in that case the short-circuiting style would be much clearer than the alternative if style.
What if you had a expensive to call (performance wise) function that returned a boolean on the right hand side that you only wanted called if another condition was true (or false)? In this case Short circuiting saves you many CPU cycles. It does make the code more concise because of fewer nested if statements. So, for all the reasons you listed at the end of your question.
The truth is actually performance. Short circuiting is used in compilers to eliminate dead code saving on file size and execution speed. At run-time short-circuiting does not execute the remaining clause in the logical expression if their outcome does not affect the answer, speeding up the evaluation of the formula. I am struggling to remember an example. e.g
a AND b AND c
There are two terms in this formula evaluated left to right.
if a AND b evaluates to FALSE then the next expression AND c can either be FALSE AND TRUE or FALSE AND FALSE. Both evaluate to FALSE no matter what the value of c is. Therefore the compiler does not include AND c in the compiled format hence short-circuiting the code.
To answer the question there are special cases when the compiler cannot determine whether the logical expression has a constant output and hence would not short-circuit the code.
Think of it this way, if you have a statement like
if( A AND B )
chances are if A returns FALSE you'll only ever want to evaluate B in rare special cases. For this reason NOT using short ciruit evaluation is confusing.
Short circuit evaluation also makes your code more readable by preventing another bracketed indentation and brackets have a tendency to add up.

Resources