Using 'define' in Scheme - scheme

I'm new to Scheme and was just curious about 'define'. I've seen things like:
(define (square x) (* x x))
which makes sense [Function name 'square' input parameter 'x']. However, I found some example code from the 90's and am trying to make sense of:
(define (play-loop-iter strat0 strat1 count history0 history1 limit) (~Code for function~)
Except for the function name, are all of those input parameters?

Short answer - yes, all the symbols after the first one are parameters for the procedure (the first one being the procedures's name). Also it's good to point out that this:
(define (f x y)
(+ x y))
Is just syntactic sugar for this, and both forms are equivalent:
(define f
(lambda (x y)
(+ x y)))
In general - you use the special form define for binding a name to a value, that value can be any data type available, including in particular functions (lambdas).
A bit more about parameters and procedure definitions - it's good to know that the . notation can be used for defining procedures with a variable number of arguments, for example:
(define (f . x) ; here `x` is a list with all the parameters
(apply + x))
(f 1 2 3 4 5) ; 0 or more parameters can be passed
=> 15
And one final trick with define (not available in all interpreters, but works in Racket). A quick shortcut for defining procedures that return procedures, like this one:
(define (f x)
(lambda (y)
(+ x y)))
... Which is equivalent to this, shorter syntax:
(define ((f x) y)
(+ x y))
((f 1) 2)
=> 3

Yes, strat0 through limit are the parameters of the play-loop-iter function.

The general form for define is:
(define (desired-name-of-procedure item-1 item-2 item-3 ... item-n)
(; what to do with the items))
Another way to explain the behaviour of define, is in terms of "means of combination", and "means of abstraction".
[A] The means of combination in simple terms:
The syntax (item-1 item-2 item-3 ... ... item-n) is the fundamental means of combination provided by Scheme (and Lisp in general.)
All code is a list represented using the above pattern
The very first (leftmost) item is always treated as an operator
Parentheses enforce the application of the operator... The leftmost item is required to accept all the items that follow, as arguments
[B] means of abstraction is simply; a way to name things.
An example will demonstrate how this all folds into the idea of the define primitive...
Example--Arriving at define in a bottom-up way
Consider this expression:
(lambda (x y) (* x y))
In plain English, the above expression translates to "Create a nameless procedure that accepts two arguments, and returns the value of the their product". Note that this generates a nameless procedure.
More accurately, in terms of means of combination, Scheme provides us the keyword lambda as a primitive operator that creates user-defined procedures.
The leftmost item--lambda--is passed items (x y) and (* x y) as arguments, and the operator-application rule forces lambda to do something with the items.
The way lambda is defined internally causes it to parse the list (x y), and treat x and y as arguments to pass to the list (* x y), which lambda assumes is the user's definition of what to do when arguments x and y are encountered. Any value assigned to x and y will be processed in accordance with the rule (* x y).
Enter, means of abstraction...
Suppose I wanted to refer to this type of multiplication at several places in my program, I might tweak the above lambda expression like this:
(define mul-two-things (lambda (x y) (* x y)))
define takes mul-two-things and the lambda expression as arguments, and "binds" them together. Now Scheme knows that mul-two-things should be associated with a procedure to take two arguments and return their product.
As it happens, the requirement of naming procedures is so very common and provides so much power of expression, that Scheme provides a cleaner-looking shortcut to do it.
Like #oscar-lopez says, define is the "special form" Scheme provides, to name things. And as far as Scheme's Interpreter is concerned, both the following definitions are identical:
(define (mul-two-things x y) (* x y))
(define mul-two-things (lambda (x y) (* x y))

Related

When does a program "intertwine definition and use"?

Footnote #28 of SICP says the following:
Embedded definitions must come first in a procedure body. The management is not responsible for the consequences of running programs that intertwine definition and use.
What exactly does this mean? I understand:
"definitions" to be referring to both procedure creations and value assignments.
"a procedure body" to be as defined in section 1.1.4. It's the code that comes after the list of parameters in a procedure's definition.
"Embedded definitions must come first in a procedure body" to mean 'any definitions that are made and used in a procedure's body must come before anything else'.
"intertwine definition and use" to mean 'calling a procedure or assigned value before it has been defined;'
However, this understanding seems to contradict the answers to this question, which has answers that I can summarise as 'the error that your quote is referring to is about using definitions at the start of a procedure's body that rely on definitions that are also at the start of that body'. This has me triply confused:
That interpretation clearly contradicts what I've said above, but seems to have strong evidence - compiler rules - behind it.
SICP seems happy to put definition in a body with other definitions that use them. Just look at the sqrt procedure just above the footnote!
At a glance, it looks to me that the linked question's author's real error was treating num-prod like a value in their definition of num rather than as a procedure. However, the author clearly got it working, so I'm probably wrong.
So what's really going on? Where is the misunderstanding?
In a given procedure's definition / code,
"internal definitions" are forms starting with define.
"a procedure body" is all the other forms after the forms which start with define.
"embedded definitions must come first in a procedure body" means all the internal define forms must go first, then all the other internal forms. Once a non-define internal form appears, there can not appear an internal define form after that.
"no intertwined use" means no use of any name before it's defined. Imagine all internal define forms are gathered into one equivalent letrec, and follow its rules.
Namely, we have,
(define (proc args ...)
;; internal, or "embedded", definitions
(define a1 ...init1...)
(define a2 ...init2...)
......
(define an ...initn...)
;; procedure body
exp1 exp2 .... )
Any ai can be used in any of the initj expressions, but only inside a lambda expression.(*) Otherwise it would refer to the value of ai while aj is being defined, and that is forbidden, because any ai names are considered not yet defined while any of the initj expressions are being evaluated.
(*) Remember that (define (foo x) ...x...) is the same as (define foo (lambda (x) ...x...)). That's why the definitions in that sqrt procedure in the book you mention are OK -- they are all lambda expressions, and any name's use inside a lambda expression will only actually refer to that name's value when the lambda expression's value -- a lambda function -- will be called, not when that lambda expression is being evaluated, producing that lambda function which is its value.
The book is a bit vague at first with the precise semantics of its language but in Scheme the above code is equivalent to
(define proc
(lambda (args ...)
;; internal, or "embedded", definitions
(letrec ( (a1 ...init1...)
(a2 ...init2...)
......
(an ...initn...) )
;; procedure body
exp1 exp2 ....
)))
as can be seen explained, for instance, here, here or here.
For example,
;; or, equivalently,
(define (my-proc x) (define my-proc
(lambda (x)
(define (foo) a) (letrec ( (foo (lambda () a))
(define a x) (a x) )
;; my-proc's body ;; letrec's body
(foo)) (foo))))
first evaluates the lambda expression, (lambda () a), and binds the name foo to the result, a function; that function will refer to the value of a when (foo) will be called, so it's OK that there's a reference to a inside that lambda expression -- because when that lambda expression is evaluated, no value of a is immediately needed, just the reference to its future value, under the name a, is present there; i.e. the value that a will have after all the names in that letrec are initialized, and the body of letrec is entered. Or in other words, when all the internal defines are completed and the body proper of the procedure my-proc is entered.
So we see that foo is defined, but is not used during the initializations; a is defined but is not used during the initializations; thus all is well. But if we had e.g.
(define (my-proc x)
(define (foo) x) ; `foo` is defined as a function
(define a (foo)) ; `foo` is used by being called as a function
a)
then here foo is both defined and used during the initializations of the internal, or "embedded", defines; this is forbidden in Scheme. This is what the book is warning about: the internal definitions are only allowed to define stuff, but its use should be delayed for later, when we're done with the internal defines and enter the full procedure's body.
This is subtle, and as the footnote and the question you reference imply, the subtlties can vary depending on a particular language's implementation.
These issues will be covered in far more detail later in the book (Chapters 3 and 4) and, generally, the text avoids using internal definitions so that these issues can be avoided until they are examined in detail.
A key difference between between the code above the footnote:
(define (sqrt x)
(define (good-enough? guess)
(< (abs (- (square guess) x)) 0.001))
(define (improve guess)
(average guess (/ x guess)))
(define (sqrt-iter guess)
(if (good-enough? guess)
guess
(sqrt-iter (improve guess))))
(sqrt-iter 1.0))
and the code in the other question:
(define (pi-approx n)
(define (square x) (* x x))
(define (num-prod ind) (* (* 2 ind) (* 2 (+ ind 1)))) ; calculates the product in the numerator for a certain term
(define (denom-prod ind) (square (+ (* ind 2 ) 1))) ;Denominator product at index ind
(define num (product num-prod 1 inc n))
(define denom (product denom-prod 1 inc n))
is that all the defintions in former are procedure definitions, whereas num and denom are value defintions. The body of a procedure is not evaluated until that procedures is called. But a value definition is evaluated when the value is assigned.
With a value definiton:
(define sum (add 2 2))
(add 2 2) will evaluated when the definition is evaluated, if so add must already be defined. But with a procedure definition:
(define (sum n m) (add n m))
a procedure object will be assigned to sum but the procedure body is not yet evaluated so add does not need be defined when sum is defined, but must be by the time sum is called:
(sum 2 2)
As I say there is a lot sublty and a lot of variation so I'm not sure if the following is always true for every variation of scheme, but within 'SICP scheme' you can say..
Valid (order of evaluation of defines not significant):
;procedure body
(define (sum m n) (add m n))
(define (add m n) (+ m n))
(sum 2 2)
Also valid:
;procedure body
(define (sum) (add 2 2))
(define (add m n) (+ m n))
(sum)
Usually invalid (order of evaluation of defines is significant):
;procedure body
(define sum (add 2 2))
(define (add m n) (+ m n))
Whether the following is valid depends on the implementation:
;procedure body
(define (add m n) (+ m n))
(define sum (add 2 2))
And finally an example of intertwining definition and use, whether this works also depends on the implementation. IIRC, this would work with Scheme described in Chapter 4 of the book if the scanning out has been implemented.
;procedure body
(sum)
(define (sum) (add 2 2))
(define (add m n) (+ m n))
It is complicated and subtle, but the key points are:
value definitions are evaluated differently from procedure definitions,
behaviour inside blocks may be different from outside blocks,
there is variation between scheme implementations,
the book is designed so you don't have to worry much about this until Chapter 3,
the book will cover this in detail in Chapter 4.
You discovered one of the difficulties of scheme. And of lisp. Due to this kind of chicken and egg issue there is no single lisp, but lots of lisps appeared.
If the binding forms are not present in code in a letrec-logic in R5RS and letrec*-logic in R6RS, the semantics is undefined. Which means, everything depend on the will of the implementor of scheme.
See the paper Fixing Letrec: A Faithful Yet Efficient Implementation of Scheme’s Recursive Binding Construct.
Also, you can read the discussions from the mailing list from 1986, when there was no general consensus among different implementors of scheme.
Also, at MIT there were developed 2 versions of scheme -- the student version and the researchers' development version, and they behaved differently concerning the order of define forms.

Function Definitions After Expressions in Scheme

I understand I can't define a function inside an expression, but I'm getting errors defining one after an expression within the same nested function. I'm using R5RS with DrScheme.
For example, this throws an error, 'define: not allowed in an expression context':
(define (cube x)
(if (= x 0) 0) ;just for example, no reason to do this
(define (square x) (* x x))
(* x (square x)))
In R5RS you can only use define in rather specific places: in particular you can use it at the beginning (only) of forms which have a <body>. This is described in 5.2.2. That means in particular that internal defines can never occur after any other expression, even in a <body>.
Native Racket (or whatever the right name is for what you get with#lang racket) is much more flexible in this regard: what you are trying to do would (apart from the single-armed if) be legal in Racket.
You can use define inside another definition. Some Schemes won't allow you to have an if without the else part, though. This works for me:
(define (cube x)
(if (= x 0) 0 1) ; just for example, no reason to do this
(define (square x) (* x x))
(* x (square x)))
Have you tried making the definition at the beginning? maybe that could be a problem with your interpreter.
define inside a procedure body (that includes all forms that are syntactic sugar for procedures, like let, let*, and letrec) are only legal before other expressions and never after the first expression and it can not be the last form.
Your example shows no real reason for why you would want to have an expression before definitions. There is no way you can get anything more than moving all define up to the beginning. eg.
(define (cube x)
;; define moved to top
(define (square x) (* x x))
;; dead code moved under define and added missing alternative
(if (= x 0) 0 'undefined)
(* x (square x)))
If the code isn't dead. eg. it's the tail expression you can use let to make a new body:
(define (cube x)
(if (= x 0)
0
(let ()
;; this is top level in the let
(define (square x) (* x x))
;; required expression
(* x (square x)))))
I think perhaps we would need an example where you think it would be warranted to have the define after the expression and we'll be happy to show how to scheme it up.

what is difference between (define (add x y) (+ x y)) and (define add (lambda (x y) (+ x y)))?

I am now writing a scheme's interpreter by using c++. I got a question about define and lambda.
(define (add x y) (+ x y))
is expanded as
(define add (lambda (x y) (+ x y)))
by lispy.py
what's the difference between this two expression?
why need expand the expression? Just because it is easy to evaluate the expression?
They are the same, since one gets expanded into the other. The first expression is easier to both write & read; having it expand into the second simplifies the interpreter (something you should appreciate).
They are equivalent in R5RS:
5.2 Definitions
(define (<variable> <formals>) <body>)
<Formals> should be either a sequence of zero or more variables, or a
sequence of one or more variables followed by a space-delimited period
and another variable (as in a lambda expression). This form is
equivalent to
(define <variable>
(lambda (<formals>) <body>)).

Why does Scheme allow mutation to closed environment in a closure?

The following Scheme code
(let ((x 1))
(define (f y) (+ x y))
(set! x 2)
(f 3) )
which evaluates to 5 instead of 4. It is surprising considering Scheme promotes static scoping. Allowing subsequent mutation to affect bindings in the closed environment in a closure seems to revert to kinda dynamic scoping. Any specific reason that it is allowed?
EDIT:
I realized the code above is less obvious to reveal the problem I am concerned. I put another code fragment below:
(define x 1)
(define (f y) (+ x y))
(set! x 2)
(f 3) ; evaluates to 5 instead of 4
There are two ideas you are confusing here: scoping and indirection through memory. Lexical scope guarantees you that the reference to x always points to the binding of x in the let binding.
This is not violated in your example. Conceptually, the let binding is actually creating a new location in memory (containing 1) and that location is the value bound to x. When the location is dereferenced, the program looks up the current value at that memory location. When you use set!, it sets the value in memory. Only parties that have access to the location bound to x (via lexical scope) can access or mutate the contents in memory.
In contrast, dynamic scope allows any code to change the value you're referring to in f, regardless of whether you gave access to the location bound to x. For example,
(define f
(let ([x 1])
(define (f y) (+ x y))
(set! x 2)
f))
(let ([x 3]) (f 3))
would return 6 in an imaginary Scheme with dynamic scope.
Allowing such mutation is excellent. It allows you to define objects with internal state, accessible only through pre-arranged means:
(define (adder n)
(let ((x n))
(lambda (y)
(cond ((pair? y) (set! x (car y)))
(else (+ x y))))))
(define f (adder 1))
(f 5) ; 6
(f (list 10))
(f 5) ; 15
There is no way to change that x except through the f function and its established protocols - precisely because of lexical scoping in Scheme.
The x variable refers to a memory cell in the internal environment frame belonging to that let in which the internal lambda is defined - thus returning the combination of lambda and its defining environment, otherwise known as "closure".
And if you do not provide the protocols for mutating this internal variable, nothing can change it, as it is internal and we've long left the defining scope:
(set! x 5) ; WRONG: "x", what "x"? it's inaccessible!
EDIT: your new code, which changes the meaning of your question completely, there's no problem there as well. It is like we are still inside that defining environment, so naturally the internal variable is still accessible.
More problematic is the following
(define x 1)
(define (f y) (+ x y))
(define x 4)
(f 5) ;?? it's 9.
I would expect the second define to not interfere with the first, but R5RS says define is like set! in the top-level.
Closures package their defining environments with them. Top-level environment is always accessible.
The variable x that f refers to, lives in the top-level environment, and hence is accessible from any code in the same scope. That is to say, any code.
No, it is not dynamic scoping. Note that your define here is an internal definition, accessible only to the code inside the let. In specific, f is not defined at the module level. So nothing has leaked out.
Internal definitions are internally implemented as letrec (R5RS) or letrec* (R6RS). So, it's treated the same (if using R6RS semantics) as:
(let ((x 1))
(letrec* ((f (lambda (y) (+ x y))))
(set! x 2)
(f 3)))
My answer is obvious, but I don't think that anyone else has touched upon it, so let me say it: yes, it's scary. What you're really observing here is that mutation makes it very hard to reason about what your program is going to do. Purely functional code--code with no mutation--always produces the same result when called with the same inputs. Code with state and mutation does not have this property. It may be that calling a function twice with the same inputs will produce separate results.

Letrec and reentrant continuations

I have been told that the following expression is intended to evaluate to 0, but that many implementations of Scheme evaluate it as 1:
(let ((cont #f))
(letrec ((x (call-with-current-continuation (lambda (c) (set! cont c) 0)))
(y (call-with-current-continuation (lambda (c) (set! cont c) 0))))
(if cont
(let ((c cont))
(set! cont #f)
(set! x 1)
(set! y 1)
(c 0))
(+ x y))))
I must admit that I cannot tell where even to start with this. I understand the basics of continuations and call/cc, but can I get a detailed explanation of this expression?
This is an interesting fragment. I came across this question because I was searching for discussions of the exact differences between letrec and letrec*, and how these varied between different versions of the Scheme reports, and different Scheme implementations. While experimenting with this fragment, I did some research and will report the results here.
If you mentally walk through the execution of this fragment, two questions should be salient to you:
Q1. In what order are the initialization clauses for x and y evaluated?
Q2. Are all the initialization clauses evaluated first, and their results cached, and then all the assignments to x and y performed afterwards? Or are some of the assignments made before some of the initialization clauses have been evaluated?
For letrec, the Scheme reports say that the answer to Q1 is "unspecified." Most implementations will in fact evaluate the clauses in left-to-right order; but you shouldn't rely on that behavior.
Scheme R6RS and R7RS introduce a new binding construction letrec* that does specify left-to-right evaluation order. It also differs in some other ways from letrec, as we'll see below.
Returning to letrec, the Scheme reports going back at least as far as R5RS seem to specify that the answer to Q2 is "evaluate all the initialization clauses before making any of the assignments." I say "seem to specify" because the language isn't as explicit about this being required as it might be. As a matter of fact, many Scheme implementations don't conform to this requirement. And this is what's responsible for the difference between the "intended" and "observed" behavior wrt your fragment.
Let's walk through your fragment, with Q2 in mind. First we set aside two "locations" (reference cells) for x and y to be bound to. Then we evaluate one of the initialization clauses. Let's say it's the clause for x, though as I said, with letrec it could be either one. We save the continuation of this evaluation into cont. The result of this evaluation is 0. Now, depending on the answer to Q2, we either assign that result immediately to x or we cache it to make the assignment later. Next we evaluate the other initialization clause. We save its continuation into cont, overwriting the previous one. The result of this evaluation is 0. Now all of the initialization clauses have been evaluated. Depending on the answer to Q2, we might at this point assign the cached result 0 to x; or the assignment to x may have already occurred. In either case, the assignment to y takes place now.
Then we begin evaluating the main body of the (letrec (...) ...) expression (for the first time). There is a continuation stored in cont, so we retrieve it into c, then clear cont and set! each of x and y to 1. Then we invoke the retrieved continuation with the value 0. This goes back to the last-evaluated initialization clause---which we've assumed to be y's. The argument we supply to the continuation is then used in place of the (call-with-current-continuation (lambda (c) (set! cont c) 0)), and will be assigned to y. Depending on the answer to Q2, the assignment of 0 to x may or may not take place (again) at this point.
Then we begin evaluating the main body of the (letrec (...) ...) expression (for the second time). Now cont is #f, so we get (+ x y). Which will be either (+ 1 0) or (+ 0 0), depending on whether 0 was re-assigned to x when we invoked the saved continuation.
You can trace this behavior by decorating your fragment with some display calls, for example like this:
(let ((cont #f))
(letrec ((x (begin (display (list 'xinit x y cont)) (call-with-current-continuation (lambda (c) (set! cont c) 0))))
(y (begin (display (list 'yinit x y cont)) (call-with-current-continuation (lambda (c) (set! cont c) 0)))))
(display (list 'body x y cont))
(if cont
(let ((c cont))
(set! cont #f)
(set! x 1)
(set! y 1)
(c 'new))
(cons x y))))
I also replaced (+ x y) with (cons x y), and invoked the continuation with the argument 'new instead of 0.
I ran that fragment in Racket 5.2 using a couple of different "language modes", and also in Chicken 4.7. Here are the results. Both implementations evaluated the x init clause first and the y clause second, though as I said this behavior is unspecified.
Racket with #lang r5rs and #lang r6rs conforms to the spec for Q2, and so we get the "intended" result of re-assigning 0 to the other variable when the continuation is invoked. (When experimenting with r6rs, I needed to wrap the final result in a display to see what it would be.)
Here is the trace output:
(xinit #<undefined> #<undefined> #f)
(yinit #<undefined> #<undefined> #<continuation>)
(body 0 0 #<continuation>)
(body 0 new #f)
(0 . new)
Racket with #lang racket and Chicken don't conform to that. Instead, after each initialization clause is evaluated, it gets assigned to the corresponding variable. So when the continuation is invoked, it only ends up re-assigning a value to the final value.
Here is the trace output, with some added comments:
(xinit #<undefined> #<undefined> #f)
(yinit 0 #<undefined> #<continuation>) ; note that x has already been assigned
(body 0 0 #<continuation>)
(body 1 new #f) ; so now x is not re-assigned
(1 . new)
Now, as to what the Scheme reports really do require. Here is the relevant section from R5RS:
library syntax: (letrec <bindings> <body>)
Syntax: <Bindings> should have the form
((<variable1> <init1>) ...),
and <body> should be a sequence of one or more expressions. It is an error
for a <variable> to appear more than once in the list of variables being bound.
Semantics: The <variable>s are bound to fresh locations holding undefined
values, the <init>s are evaluated in the resulting environment (in some
unspecified order), each <variable> is assigned to the result of the
corresponding <init>, the <body> is evaluated in the resulting environment, and
the value(s) of the last expression in <body> is(are) returned. Each binding of
a <variable> has the entire letrec expression as its region, making it possible
to define mutually recursive procedures.
(letrec ((even?
(lambda (n)
(if (zero? n)
#t
(odd? (- n 1)))))
(odd?
(lambda (n)
(if (zero? n)
#f
(even? (- n 1))))))
(even? 88))
===> #t
One restriction on letrec is very important: it must be possible to evaluate
each <init> without assigning or referring to the value of any <variable>. If
this restriction is violated, then it is an error. The restriction is necessary
because Scheme passes arguments by value rather than by name. In the most
common uses of letrec, all the <init>s are lambda expressions and the
restriction is satisfied automatically.
The first sentence of the "Semantics" sections sounds like it requires all the assignments to happen after all the initialization clauses have been evaluated; though, as I said earlier, this isn't as explicit as it might be.
In R6RS and R7RS, the only substantial changes to this part of the specification is the addition of a requirement that:
the continuation of each <init> should not be invoked more than once.
R6RS and R7RS also add another binding construction, though: letrec*. This differs from letrec in two ways. First, it does specify a left-to-right evaluation order. Correlatively, the "restriction" noted above can be relaxed somewhat. It's now okay to reference the values of variables that have already been assigned their initial values:
It must be possible to evaluate each <init> without assigning or
referring to the value of the corresponding <variable> or the
<variable> of any of the bindings that follow it in <bindings>.
The second difference is with respect to our Q2. With letrec*, the specification now requires that the assignments take place after each initialization clause is evaluated. Here is the first paragraph of the "Semantics" from R7RS (draft 6):
Semantics: The <variable>s are bound to fresh locations, each
<variable> is assigned in left-to-right order to the result of evaluating
the corresponding <init>, the <body> is evaluated in the resulting
environment, and the values of the last expression in <body> are
returned. Despite the left-to-right evaluation and assignment order, each
binding of a <variable> has the entire letrec* expression as its region,
making it possible to define mutually recursive procedures.
So Chicken, and Racket using #lang racket---and many other implementations---seem in fact to implement their letrecs as letrec*s.
The reason for this being evaluated to 1 is because of (set! x 1). If instead of 1 you set x to 0 then it will result in zero. This is because the continuation variable cont which is storing the continuation is actually storing the continuation for y and not for x as it is being set to y's continuation after x's.

Resources