OCaml delimiters and scopes - syntax

I'm learning OCaml, and although I have years of experience with imperative programming languages (C, C++, Java), I'm having some problems with the delimiters between declarations and expressions in OCaml syntax.
As I understand it, I have to use ; to chain expressions in sequence, and the value of the sequence is the value of the last expression. So, for example, if I have
exp1; exp2; exp3
it will be treated as a single expression that returns the value of exp3. Following from this, I could write
let t = something in exp1; exp2; exp3
and it should be ok, right?
When am I supposed to use the double semicolon ;;? What exactly does it mean?
Are there other delimiters that I must use to avoid syntax errors?
I'll give you an example:
let rec satisfy dtmc state pformula =
  match (state, pformula) with
    (state, `Next sformula) ->
      let s = satisfy_each dtmc sformula
      and adder a state =
        let p = 0.;
        for i = 0 to dtmc.matrix.rows do
          p <- p +. get dtmc.matrix i state.index
        done;
        a +. p
      in
      List.fold_left adder 0. s
  | _ -> []
It gives me a syntax error on the |, but I don't see why. What am I missing? This is a problem that occurs often, and I have to try many different variants until it suddenly works :/
A side question: does declaring with let instead of let ... in define a binding that lasts for everything after its definition?
What I'm basically asking is: which delimiters do I have to use, and when? In addition, are there differences I should consider when using the interpreter ocaml instead of the compiler ocamlc?
Thanks in advance!

The ;; delimiter terminates a top-level entity. In the ocaml toplevel (interpreter), it signals to the interpreter that a particular piece of input is finished and should be evaluated.
In programs to be compiled with ocamlc or ocamlopt, you don't need it nearly as often, as consecutive top-level let (without in), module, type, exception, and similar statements automatically signal the beginning of a new "phrase". If you include a top-level expression in a module that is to be evaluated only for its side effects (such as generating some output or registering a module), you'll need a ;; before it to tell the compiler to stop compiling the previous phrase and start compiling a new one. Otherwise, if the previous phrase is a let, the compiler will assume that the new expression is part of that let. For example:
let msg = "Hello, world";; (* we need ;; here *)
print_endline msg;; (* ;; is optional here, unless we have another expression *)
When you do and don't need ;; is somewhat subtle, so I usually terminate all my module-level entities with it so I don't have to worry about when it is and isn't needed.
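In compiled code, a common way to sidestep the question is to wrap side-effecting top-level code in a let () = ... binding. That binding is itself a definition, so it never needs a preceding ;;. A minimal sketch; greet is an illustrative name, not something from the question:

```ocaml
(* Consecutive top-level definitions: no ;; is needed between them. *)
let msg = "Hello, world"

(* Illustrative helper: builds a greeting string from msg. *)
let greet name = msg ^ ", " ^ name ^ "!"

(* Side-effecting "main" code written as a definition, so no ;; is
   needed before it even though the previous phrase is a let. *)
let () = print_endline (greet "OCaml")
```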
; is used to separate sequential "statements" within a single expression. So foo; bar is a single sequential expression composed of foo and bar, while foo;; bar is only valid at the top level of a module and signifies two expressions.
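A small sketch of the sequencing rule from the question: exp1; exp2; exp3 is one expression whose value is exp3, and a let ... in scopes over the whole sequence that follows it. The trace buffer is only there so the side effects can be observed:

```ocaml
(* Records side effects so their order can be checked. *)
let trace = Buffer.create 16

(* One sequence expression: its value is the value of the last part. *)
let sequence () =
  Buffer.add_string trace "a";   (* exp1 : unit *)
  Buffer.add_string trace "b";   (* exp2 : unit *)
  42                             (* exp3 : value of the whole sequence *)

(* let t = ... in exp1; exp2 : the binding covers the entire sequence. *)
let with_binding () =
  let t = 10 in
  Buffer.add_string trace "c";
  t + 1
```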
On let without in: that construct is only valid in a module definition and variables so bound will be bound through the end of the module. Often, this is just the end of the file; if you have nested modules, however, its scope can be more limited. It does not work inside another expression or definition such as a function definition, unless it is within a local module definition.
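To illustrate the scoping described above, here is a sketch with a nested module and a local module. All names are illustrative:

```ocaml
(* A top-level let (no in) is visible until the end of the enclosing module. *)
let base = 100

module Inner = struct
  (* Visible inside Inner, and from outside only as Inner.offset. *)
  let offset = 5
  let total = base + offset   (* base is still in scope here *)
end

(* An unqualified offset would be unbound here. *)
let outer_total = Inner.total + Inner.offset

(* Inside an expression, let without in is only allowed in a local module. *)
let local_module_example () =
  let module M = struct
    let k = 3
    let double = k * 2
  end in
  M.double
```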

let p = 0.;
This is the error. The ; needs to be an in. You can use let without in only to define global functions and values; you can't use it inside an expression.
A side question: does declaring with let instead of let ... in define a binding that lasts for everything after its definition?
You can only ever use one or the other (except in the interactive interpreter where you are allowed to mix expressions and definitions). When defining a global function or value, you need let without in. Inside an expression you need let with in.
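A minimal sketch of the two forms side by side. scale and scaled_sum are illustrative names:

```ocaml
(* Global definition: let without in. *)
let scale = 2.

(* Inside a function body (an expression), every local binding needs in;
   writing ; instead of in after a +. b would not compile. *)
let scaled_sum a b =
  let s = a +. b in
  let r = s *. scale in
  r
```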

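;; is used to terminate input and start evaluation in the ocaml REPL. When compiling with ocamlc or ocamlopt it is rarely needed, although it is still accepted there as a separator between top-level phrases.
You cannot assign to an arbitrary value with the <- operator (it works on mutable record fields, array elements and object instance variables, not on let-bound names); you have to use the ref type for mutable variables: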
let p = ref 0. in
(* for bounds are inclusive; this assumes rows is the number of rows *)
for i = 0 to dtmc.matrix.rows - 1 do
  p := !p +. get dtmc.matrix i state.index
done;
a +. !p
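To check the fix end to end, here is a self-contained sketch of the question's adder. The question never shows its types or its get function, so the record types and get below are hypothetical stand-ins. adder is lifted to the top level and takes dtmc explicitly instead of closing over it. There is also a second problem: the question's final branch | _ -> [] returns a list while the first branch returns a float. Once the syntax error is fixed, the type checker rejects that, so the fallback has to be a float such as 0.

```ocaml
(* Hypothetical types standing in for the question's DTMC structures. *)
type matrix = { rows : int; data : float array array }
type state = { index : int }
type dtmc = { matrix : matrix }

(* Hypothetical accessor playing the role of the question's get. *)
let get m i j = m.data.(i).(j)

(* Adds the sum of column state.index to the accumulator a. *)
let adder dtmc a state =
  let p = ref 0. in                      (* let ... in, plus a ref for mutation *)
  for i = 0 to dtmc.matrix.rows - 1 do   (* inclusive upper bound *)
    p := !p +. get dtmc.matrix i state.index
  done;
  a +. !p

(* Same shape as the question's List.fold_left adder 0. s *)
let total dtmc states = List.fold_left (adder dtmc) 0. states
```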

Related

ocaml: Basic syntax for function of several arguments

I am learning OCaml and I'm a complete beginner at this point. I'm trying to get used to the syntax and I just spent 15 minutes debugging a stupid syntax error.
let foo a b = "bar";;
let biz = foo 2. -1.;;
I was getting the error "This expression has type 'a -> string but an expression was expected of type int". I resolved it, but it prompted me to ask what the best way is to handle this syntax peculiarity.
Basically, OCaml treats what I intended as the numeric constant -1. as two separate tokens, - and 1., so the expression is parsed as a subtraction and I end up passing just one argument to foo. In other languages I'm familiar with this doesn't happen, because arguments are separated with commas (or, in Scheme, delimited by parentheses).
What is the usual way to handle this in OCaml? Is it surrounding the number with parentheses (foo 2. (-1.)), or is there some other way?
There is a unary minus operator for floats, ~-., that avoids this issue: foo ~-.1. (its integer counterpart is ~-). But it is generally simpler to add parentheses around the problematic expression.
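A short sketch comparing the options. sum2 is an illustrative two-argument function, since the question's foo ignores its arguments:

```ocaml
(* Illustrative function taking two floats. *)
let sum2 a b = a +. b

(* sum2 2. -1. would parse as (sum2 2.) - 1., a subtraction. *)
let with_parens = sum2 2. (-1.)   (* parentheses make -1. one argument *)
let with_prefix = sum2 2. ~-.1.   (* ~-. is the float negation prefix operator *)
let with_ints = abs ~-3           (* ~- is the integer counterpart *)
```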

fat arrow in Idris

I hope this question is appropriate for this site; it's just about the choice of concrete syntax in Idris compared to Haskell, since both are very similar. I guess it's not that important, but I'm very curious about it. Idris uses => in some places where Haskell uses ->. So far I've seen that Idris only uses -> in function types, and => for other things like lambdas and case _ of. Did this choice come from realizing that, in practice, it's useful to have a clear syntactic distinction between these use cases? Or is it just an arbitrary cosmetic choice, and I'm overthinking it?
Well, in Haskell, types and values live in different namespaces, so something defined in one is at no risk of clashing with something in the other. In Idris, types and values occupy the same namespace. That is why you don't see e.g. data Foo = Foo as you would in Haskell, but rather data Foo = MkFoo: the type is called Foo and the constructor is called MkFoo, because the name Foo is already bound to a value (the type Foo itself). Compare data Pair a b = MkPair a b in the tutorial: http://docs.idris-lang.org/en/latest/tutorial/typesfuns.html#tuples
So it's probably for the best that Idris didn't reuse the arrow that builds function types for lambdas; those are rather different things. You can see both together in, e.g., the (Int -> Int) (\x => x).
I think it is because they interpret the -> symbol differently.
From Wikipedia:
A => B means if A is true then B is also true; if A is false then nothing is said about B
which seems right for case expressions, and
-> may mean the same as =>, or it may have the meaning for functions given below
which is
f: X -> Y means the function f maps the set X into the set Y
So my guess is that Idris just uses -> for the narrow second meaning, i.e. for mapping one type to another in type signatures, whereas Haskell uses the broader interpretation, where it means the same as =>.

Is 'X = someFunction() + 2' a statement or an expression?

I read that an expression is anything that gives some value, like 2 + X, while a statement is any instruction to the computer to execute something, like print("hi").
What about the following line of code?
X = someFunction() + 2
someFunction() returns some numerical value (I think a lot of languages wouldn't compile this code if it didn't), and thus someFunction() + 2 is 'something that yields some value' - aka an expression.
But someFunction() is also code to be executed, and thus a statement.
My question:
There are often lines of code that equal some value, but are also an instruction to be executed. What are these lines of code considered?
In certain computer languages called "functional languages", everything (including the code that prints "hi") is an expression. At the other extreme, you can write code in machine language (so you, not a compiler, are deciding exactly what sequence of bytes should compose the executable program), and at that level practically everything (even adding 2 to something) is an "instruction to the computer to execute something".
I've used a lot of different computer languages, and as far as I can recall, in each case there was documentation somewhere defining what makes a statement in that particular language (if indeed the language even has a concept of "statement"). The definition is based on syntax, not so much on what the code does.
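As a small illustration of the functional-language case, here is a sketch in OCaml where printing, conditionals, and even loops are all expressions. The names are illustrative:

```ocaml
(* Printing is an expression of type unit; a parenthesized sequence
   is an expression whose value is its last component. *)
let x = (print_endline "hi"; 3) + 2

(* if/then/else is an expression, so it can appear on the right of =. *)
let describe n = if n mod 2 = 0 then "even" else "odd"

(* Even a for loop is an expression; its value is () of type unit. *)
let unit_value : unit = for i = 1 to 3 do ignore i done
```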
For example, in C or C++, if you write
{ x + 2; }
then technically the "x + 2;" is a statement. It is a useless statement that doesn't do anything, but syntactically, it is a statement nevertheless. In fact, one way to write a statement in C is to just append a semicolon to an expression (http://msdn.microsoft.com/en-us/library/1t054cy7.aspx). You don't even need the expression; a semicolon by itself can be a statement (http://msdn.microsoft.com/en-us/library/h7zyw61x.aspx).
By the way, in C++, the '+' in an expression such as (x + 2) may actually be a function call. So if you say anything that calls a function is a statement, then (x + 2) would be, or at least could be, a statement in C++. But I don't know any authority who defines it that way.
It varies by language, but ultimately: a statement is anything you can't embed inside another (simple, i.e. not a block) statement.
In C, your example is an expression, because you can do this:
while (X = someFunction() + 2) {
    // ...
}
But in Python, the same thing is a syntax error, because assignment with = can only be a statement (Python 3.8 added := as a separate assignment expression):
# nope!
while X = someFunction() + 2:
    pass
In most languages, any expression can also be used as a statement by itself, though this may or may not be useful.
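For comparison, OCaml sits between these two. Assignment with := is an expression, but its value is () of type unit, so it can go into a loop condition only as part of a sequence whose last part is the actual test. A sketch; countdown is an illustrative name:

```ocaml
(* The while condition is a sequence expression: assign, then test. *)
let countdown () =
  let n = ref 3 in
  let seen = ref [] in
  while (n := !n - 1; !n >= 0) do
    seen := !n :: !seen
  done;
  List.rev !seen
```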
Calling a statement an "instruction to execute something" is a poor way to think about it, though. All code is an instruction to execute something.
A statement is more like a single complete thought. It's really just part of the syntax; depending on the language/compiler/runtime, a statement or expression may become very many machine instructions, or several statements might be reduced to just one instruction.
tl;dr: It is a statement when it is parsed as a statement, and an expression when it is parsed as an expression. The rules for this depend on the particular language in question.
Expressions and statements should not be confused with "what actually happens" underneath; they merely describe the syntactic constructs of a language's grammar.
Because the grammar and parsing rules [generally] depend on context, the fact that part of an expression could be used as a statement on its own (where the language allows this) does not make it a statement where it actually appears, much less when it appears in an expression context.
As for the particular example given, it depends on programming language and where the construct appears. Some languages support assignments as expressions, while others do not.
For instance, consider this JavaScript (see Appendix A of ES5 for the grammar rules).
{ x = y = f() + 2 }
In this case, the block is a statement (BlockStatement) and x = .. is also considered a "statement" (although it is really an Expression via Statement -> ExpressionStatement) while y = .. is an expression. Likewise, f() is an expression (technically, f is also an expression in JavaScript) and 2 is an expression and f() + 2 is an expression.
However, the following is invalid Pascal because Pascal's syntax does not support := (assignment) in an expression and an assignment is always a statement.
X := Y := F() + 2
Some languages also forbid general expressions as statements, which further undermines the notion that, in y = EXPR, it is correct to consider EXPR a valid statement. The following is invalid C#, but is dubiously valid in JavaScript and many other languages.
{ f() + 2; }
I would say that "someFunction() + 2" is an expression being evaluated, and that "x = someFunction() + 2" is a statement, because most languages evaluate the function's return value plus two and then assign that value to x.

Pythonesque blocks and postfix expressions

In JavaScript,
f = function(x) {
    return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
    return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
    return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda x: x + 1; (5). (In Python 3 a lambda's parameter cannot be parenthesized, so lambda(x): is a syntax error there.)
There are many other problems with multi-line lambdas in otherwise standard Python syntax, though. They are completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions: it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas threads about multi-line lambdas. It's somewhere between very hard and impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. CoffeeScript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lambda function.
The details are a bit messier than this; look at the Haskell 2010 report.
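The rule described above can be sketched for a single block as follows. This is a deliberately simplified model (no nested blocks, no explicit braces, no parse-error(t) rule), not the Haskell 2010 algorithm itself, and the line record is a hypothetical input format:

```ocaml
(* Hypothetical input: one source line with its indentation column. *)
type line = { indent : int; text : string }

(* Simplified layout rule for ONE block:
   - the first line's indentation is the block's reference column;
   - a line indented MORE continues the current block item;
   - a line indented the SAME starts a new item (an implicit ";");
   - the first line indented LESS closes the block (an implicit "}").
   Returns the block items and the lines left over after the block. *)
let layout_block (lines : line list) : string list list * line list =
  match lines with
  | [] -> ([], [])
  | first :: _ ->
    let col = first.indent in
    (* Push the item being built (if any) onto the finished items. *)
    let flush items current =
      if current = [] then items else List.rev current :: items
    in
    let rec go items current = function
      | l :: rest when l.indent > col -> go items (l.text :: current) rest
      | l :: rest when l.indent = col -> go (flush items current) [ l.text ] rest
      | remaining -> (List.rev (flush items current), remaining)
    in
    go [] [] lines
```

With f = at the block's reference column and (5) at the same column, the (5) becomes a separate block item rather than an argument, matching the first case described in the answer.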

Multiple statements in mathematica function

I want to know how to evaluate multiple statements in a function in Mathematica.
E.g.
f[x_]:=x=x+5 and then return x^2
I know this much can be rewritten as (x+5)^2, but originally I wanted to read data from a file in the function and print the result after doing some data manipulation.
If you want to group several commands and output the result of the last one, use the semicolon (;) between them, like
f[y_]:=(x=y+5;x^2)
Just don't use a ; for the last statement.
If your set of commands grows bigger you might want to use scoping structures like Module or Block.
You are looking for CompoundExpression (short form ;):
f[x_]:= (thing = x+5 ; thing^2)
The parentheses are necessary due to the very low precedence of ;.
As Szabolcs pointed out, you cannot write:
f[x_]:= (x = x+5 ; x^2)
See this answer for an explanation and alternatives.
Leonid, who you should listen to, says that thing should be localized. I didn't do this above because I wanted to emphasize CompoundExpression as a specific fit for your "and then" construct. As it is written, this will affect the global value of thing which may or may not be what you actually want to do. If it is not, see both the answer linked above, and also:
Mathematica Module versus With or Block - Guideline, rule of thumb for usage?
Several people have mentioned already that you can use CompoundExpression:
f[x_] := (y=x+5; y^2)
However, if you use the same variable x in the expression as in the argument,
f[x_] := (x=x+5; x^2)
then you'll get errors when evaluating the function with a number. This is because := essentially defines a replacement of the pattern variables from the lhs, i.e. f[1] evaluates to the (incorrect) (1 = 1+5; 1^2).
So, as Sjoerd said, use Module (or Block sometimes, but this one has caveats!) to localize a function-variable:
f[x_] := Module[{y}, y=x+5; y^2]
Finally, if you need a function that modifies its arguments, you can set the attribute HoldAll:
Clear[addFive]
SetAttributes[addFive, HoldAll]
addFive[x_] := (x=x+5)
Then use it as
a = 3;
addFive[a]
a
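For readers coming from the OCaml questions above, the two ideas map roughly onto OCaml as follows. This is only an analogy, not Mathematica semantics. A let ... in binding is local, like a Module variable. A function can change a caller's variable only if it receives a mutable reference, which loosely plays the role HoldAll plays here. f and add_five are illustrative names:

```ocaml
(* Like Module[{y}, y = x + 5; y^2]: y is local and cannot leak out. *)
let f x =
  let y = x + 5 in
  y * y

(* Like addFive with HoldAll: the caller passes a ref, so the update is visible. *)
let add_five r = r := !r + 5

let a = ref 3
let () = add_five a
```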
