Scheme's block structure efficiency - scheme

The book defines block structure in Chapter 1, allowing you to 'package' defines inside a procedure definition.
Consider this mean-square definition for example:
(define (mean-square x y)
  (define (square x) (* x x))
  (define (average x y) (/ (+ x y) 2))
  (average (square x) (square y)))
When I run (mean-square 2 4) I correctly get 10.
My question is: are the internal definitions (square and average in this toy case) run each time I invoke the mean-square procedure via the interpreter? If so, isn't that inefficient? And if not, why not?

If the code is somewhat naively compiled, there could be some overhead. The reason is that the inner functions are defined in a brand new lexical environment that is freshly instantiated on each entry into the function. In the abstract semantics, each time the function is called, new lexical closures have to be captured and wired into the correct spots in that environment frame.
Thus it boils down to how much of this the compiler can optimize away. For instance, it can notice that neither of the inner functions actually references the surrounding lexical environment (the x and y references in these functions are to their own parameters, not to those of the surrounding mean-square), which means they can both be moved to the top level without changing the semantics:
(define (__anon1 x) (* x x))
(define (__anon2 x y) (/ (+ x y) 2))

(define (mean-square x y)
  (define square __anon1)
  (define average __anon2)
  (average (square x) (square y)))
And since now square and average are effectively simple aliases (aliases for global entities that are generated by the compiler, which the compiler knows aren't being manipulated by anything outside of its control), the values they denote can be propagated through:
(define (mean-square x y)
  (__anon2 (__anon1 x) (__anon1 y)))

It's not a problem. When the mean-square procedure is compiled, all the nested procedures are also compiled. It doesn't need to re-compile them every time you invoke the mean-square procedure.

I think the other answers have probably convinced you that the case you give really doesn't need to have any overhead: the local definitions can be just compiled away. But it's worth thinking about how a system might approach cases where this can't be done.
Consider a definition like this:
(define (make-searcher thing)
  (define (search in)
    (cond [(null? in)
           #f]
          [(eqv? (first in) thing)
           in]
          [else (search (rest in))]))
  search)
Well, the local search procedure definitely can't be compiled away here, because it's returned from make-searcher. And it's even worse than that: (make-searcher 1) and (make-searcher 2) need to return different procedures, because ((make-searcher 1) '(1 2 3)) is (1 2 3) while ((make-searcher 2) '(1 2 3)) is (2 3).
So this sounds completely hopeless: the local search procedure not only has to be a procedure (it can't be compiled away), it has to be remade each time.
But in fact things are not nearly so bad. Lexical scope means that the system can know exactly what bindings are visible to search (in this case, a binding for thing as well as its own argument). So what you can do, for instance, is compile a bit of code which looks up the values of these bindings in a vector. Then, the thing that is returned from make-searcher packs together the compiled code of search with a vector of bindings. The compiled code is always the same; only the vector needs to be created and initialised each time.
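Here is a toy model of that idea in plain Scheme, just to make the mechanism concrete. This is only a sketch of the technique, not how any particular implementation actually lays things out; the names search-code and closure-apply are made up for the illustration:

(define (search-code env in)                ; "compiled" once, shared by every closure
  (let ((thing (vector-ref env 0)))         ; captured bindings are looked up in the vector
    (let loop ((in in))
      (cond ((null? in) #f)
            ((eqv? (car in) thing) in)
            (else (loop (cdr in)))))))

(define (make-searcher thing)
  (vector search-code (vector thing)))      ; per call: just pack the code with a fresh binding vector

(define (closure-apply c arg)
  ((vector-ref c 0) (vector-ref c 1) arg))

(closure-apply (make-searcher 2) '(1 2 3))  ; ==> (2 3)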

Imagine this code:
(let ((a expr))
  (do-something-with a))
It is the same as:
((lambda (a)
   (do-something-with a))
 expr)
An interpreter might create the lambda each time before calling it, while other implementations might turn the whole thing into (do-something-with expr). The report doesn't want to touch non-functional requirements other than guaranteed tail recursion. In all serious implementations, lambdas are cheap.
Since you mention Racket:
File test_com.rkt
#lang racket
(define (mean-square x y)
  (define (square x) (* x x))
  (define (average x y) (/ (+ x y) 2))
  (average (square x) (square y)))
(display (mean-square 2 4))
Terminal commands:
raco make test_com.rkt
raco decompile compiled/test_com_rkt.zo
Resulting output:
(module test_com ....
(require (lib "racket/main.rkt"))
(provide)
(define-values
(mean-square)
(#%closed
mean-square49
(lambda (arg0-50 arg1-51)
'#(mean-square #<path:/home/westerp/compiled/test_com.rkt> 2 0 14 136 #f)
'(flags: preserves-marks single-result)
(/ (+ (* arg0-50 arg0-50) (* arg1-51 arg1-51)) '2))))
(#%apply-values print-values (display '10)) ; the only code that matters!
(void)
(module (test_com configure-runtime) ....
(require '#%kernel (lib "racket/runtime-config.rkt"))
(provide)
(print-as-expression '#t)
(void)))
While mean-square has had its local procedures inlined, because I called it with literal arguments the call itself has been constant-folded away, so all the module really does is (display '10) and then exit.
This is of course what you get with raco make or raco exe. In DrRacket, the language options that enable debugging and better traces and error messages will make the code run slower.

Related

Function Arguments and Continuations

My question is about function arguments in conjunction with continuations.
Specifically, what behavior is required, and what is allowed.
Suppose you have a function call (f arg1 arg2 arg3). I realize that
a compliant Scheme implementation is allowed to evaluate the arguments
arg1, arg2, and arg3 in any order. That's fine. But now suppose
that, say, arg2 creates a continuation. In general, some of the other
arguments may be evaluated before arg2 is evaluated, and some may be
evaluated after arg2 is evaluated.
Suppose that, in the Scheme implementation we're using, arg1 is
evaluated before arg2. Further, suppose that f modifies its local
copy of the first argument. Later, when the continuation created
during the evaluation of arg2 is called, arg3 will be evaluated
again and f will be called.
The question is this: When f is called a second time, via the
continuation, what value must/may its first argument have? Does it
need to be the same value that arg1 evaluated to? Or may it be the
modified value from the previous call to f? (Again, this example
assumes that arg1 is evaluated before arg2, but the same issue
applies with different argument evaluation orders. I.e., if arg3 is
evaluated before arg2, then the question applies to arg3.)
I have tried this in a couple of Scheme implementations, and have obtained
differing results. I took into account different orders of evaluation of
the arguments (it's easy to track it by having the argument expressions
log when they're being evaluated). Ignoring that difference, one
implementation always used the original argument values, and another
sometimes used the original argument values, and sometimes used the
modified argument values, depending on whether f was an inline
lambda vs. a global function. Presumably the difference is due to
whether the actual arguments end up being copied into the function's
local variables, or whether they are used in-place.
Here is a version that uses a global function:
(define (bar x cc y)
  (set! x (* x 2))
  (set! y (* y 3))
  (format #t "~a ~a\n" x y)
  cc)
(define (foo a b)
  (let* ((first #t)
         (cb (bar
              (+ a 10)
              (call/cc (lambda (x) x))
              (+ b 100))))
    (if first
        (begin
          (set! first #f)
          cb)
        (cb '()))))
(define cc (foo 1 2))
(call/cc cc)
(call/cc cc)
The above version uses the original argument values when calling
the function bar in both of the Scheme implementations that I tested.
The function bar sees 11 for the first argument and 102 for the
third argument each time it is called. The output is:
22 306
22 306
22 306
Now, here is a version that replaces the global function with an inline
lambda:
(define (foo a b)
  (let* ((first #t)
         (cb ((lambda (x cc y)
                (set! x (* x 2))
                (set! y (* y 3))
                (format #t "~a ~a\n" x y)
                cc)
              (+ a 10)
              (call/cc (lambda (x) x))
              (+ b 100))))
    (if first
        (begin
          (set! first #f)
          cb)
        (cb '()))))
(define cc (foo 1 2))
(call/cc cc)
(call/cc cc)
In one of the Scheme implementations I tested (BiwaScheme), this
behaves the same as the previous version. I.e., the called function
always sees the original argument values.
In another Scheme implementation (Gosh/Gauche), this behaves
differently from the previous version. In this case, the called
function uses the modified value of the first argument. In other
words, it handles the inline lambda differently, taking advantage of
the fact that it can see the function definition, and is presumably
using a more direct argument passing mechanism that avoids having to
copy them. Since it isn't copying the arguments, the ones that were
evaluated before the continuation point retain their modified values.
The lambda sees 11 and 102 for the first and third arguments the
first time, then it sees 22 and 102 the second time, and 44 and
102 the third time. So the continuation is picking up the modified
argument values. The output is:
22 306
44 306
88 306
So again, my question is this: Are both behaviors allowed by the
Scheme standard (R6RS and/or R7RS)? Or does Scheme in fact require
that the original argument values be used when the continuation is
invoked?
Update: I originally reported that the Gauche Scheme implementation
gave the three different sets of values shown above. That was true,
but only for certain versions of Gauche. The version I originally
tested was Gauche 0.9.3.3, which shows the three different sets of
values. I later found a site that has three different versions of
Gauche. The oldest, Gauche 0.9.4, also shows the three different
sets of values. But the two newer versions, Gauche 0.9.5 and Gauche
0.9.8, both show the repeated values:
22 306
22 306
22 306
This argues pretty strongly that this was considered a bug which
has since been fixed (just as everyone has been saying).
Capturing a continuation literally creates a copy of the stack at the moment call/cc is called; that copy is also called a control point. The continuation also stores inside it a copy of the current dynamic environment (more precisely, of the state space from the dynamic-wind machinery) and a copy of the thread-local state.
So, when you reactivate the continuation, everything continues from the moment when it was saved. If some arguments were previously evaluated, their values are saved on the stack, and the rest of the arguments will be re-evaluated a second time. (As a remark, the dynamic state in Scheme is implemented on top of dynamic-wind, so saving the dynamic state involves saving the state of dynamic-wind, which is a combination of the stack and the state space, a tree keeping the before/after thunks of the dynamic-wind calls.)
The stack starts from the top level (actually there are other stacklets that represent continuations of the shutdown procedures, but those are only touched when your code finishes, and they are not memorized when you call call/cc). So, if in a file or at the REPL you gave two expressions, such as
(+ (f 1) 2)
(display "ok")
each of these expressions will have its own stacklet, so reinvoking a continuation saved within f won't re-evaluate the display.
I think this should be enough to analyse your problem. The arguments are evaluated in unspecified order.
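If you want to see this with your own eyes, here is a small experiment along those lines. The logging helper and the guard around the re-entry are just for illustration; which arguments get logged twice depends on your implementation's evaluation order:

(define k #f)

(define (arg name value)                 ; log when an argument expression is evaluated
  (display "evaluating ") (display name) (newline)
  value)

(define (run)
  (display (list (arg 'a 1)
                 (call/cc (lambda (c) (set! k c) (arg 'b 2)))
                 (arg 'c 3)))
  (newline))

(run)                                    ; logs a, b and c once, in some order
(when k
  (let ((k2 k))
    (set! k #f)                          ; guard so we only re-enter once
    (k2 99)))                            ; arguments already saved on the captured stack are not
                                         ; recomputed; the remaining ones are logged and evaluated again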
EDIT:
Concerning the output of foo: 22 306 / 44 306 / 88 306 is certainly not correct; the correct output is 22 306 three times.
I have never used either of these two implementations. It is a bug for an implementation not to rebind x on each invocation of the (lambda (x cc y) ...), since the continuation is captured outside the lambda.
The implementation bug seems obvious; it's in their code generation -- they keep x on the stack despite the fact that a set! on x is present, which should be an indicator to allocate x as a box on the heap.
While the evaluation order is unspecified in the report, it is not undefined in an implementation's CPS code. E.g.
(f (+ x 4) (call/cc cont-fun)), where x is a free variable, becomes either:
(call/cc&
 cont-fun&
 (lambda (v2)
   (+& x
       4
       (lambda (v1)
         (f& v1 v2 halt&)))))
Or:
(+& x
    4
    (lambda (v1)
      (call/cc&
       cont-fun&
       (lambda (v2)
         (f& v1 v2 halt&)))))
So if the continuation function cont-fun& mutates x, this will have an impact on the result in the version that evaluates the arguments right to left, since the addition is done in the continuation of the call/cc. In the second version, mutating x will not affect the addition, since its value has already been computed and passed as v1, and in the event the continuation is captured and rerun this value will never be recomputed. In the first version, though, v1 is always computed anew, so there mutating the free variable x will affect the result.
If you as a developer want to avoid this, you let* the damn thing:
(let* ((a2 (call/cc cont-fun))
       (a1 (+ x 4)))
  (f a1 a2))
This code will force the behavior of the addition always being in the continuation of determining a2.
Now, I avoided using your mutating examples, but in reality those are just bindings being rerouted. You have overcomplicated bar, since the set! does not have any lasting effect; it is always the same as:
(define (bar x cc y)
  (format #t "~a ~a\n" (* x 2) (* y 3))
  cc)
The continuation caught in:
(bar (+ a 10)
     (call/cc (lambda (x) x))
     (+ b 100))
Regardless of the order, we know the call to bar is the final step after evaluating all 3 argument expressions, followed by the body of the let*, both the first time and the two consecutive times.
Your second version shouldn't change anything, since the function doesn't rely on free variables. That the consecutive calls to the continuation gave you 44 and 88 is most definitely a compiler optimization that fails. It shouldn't do that; I would have reported it as a bug.

Main difference between using define and let in scheme [duplicate]

Ok, this is a fairly basic question: I am following the SICP videos, and I am a bit confused about the differences between define, let and set!.
1) According to Sussman in the video, define is allowed to attach a value to a variable only once (except when in the REPL); in particular, two defines in a row are not allowed. Yet Guile happily runs this code
(define a 1)
(define a 2)
(write a)
and outputs 2, as expected. Things are a little bit more complicated because if I try to do this (EDIT: after the above definitions)
(define a (1+ a))
I get an error, while
(set! a (1+ a))
is allowed. Still, I don't think this is the only difference between set! and define: what is it that I am missing?
2) The difference between define and let puzzles me even more. I know in theory let is used to bind variables in local scope. Still, it seems to me that this works the same with define, for instance I can replace
(define (f x)
  (let ((a 1))
    (+ a x)))
with
(define (g x)
  (define a 1)
  (+ a x))
and f and g work the same: in particular the variable a is unbound outside g as well.
The only way I can see this being useful is that let may have a shorter scope than the whole function definition. Still, it seems to me that one can always add an anonymous function to create the necessary scope and invoke it right away, much like one does in JavaScript. So, what is the real advantage of let?
Your confusion is reasonable: 'let' and 'define' both create new bindings. One advantage to 'let' is that its meaning is extraordinarily well-defined; there's absolutely no disagreement between various Scheme systems (incl. Racket) about what plain-old 'let' means.
The 'define' form is a different kettle of fish. Unlike 'let', it doesn't surround the body (region where the binding is valid) with parentheses. Also, it can mean different things at the top level and internally. Different Scheme systems have dramatically different meanings for 'define'. In fact, Racket has recently changed the meaning of 'define' by adding new contexts in which it can occur.
On the other hand, people like 'define'; it has less indentation, and it usually has a "do-what-I-mean" level of scoping allowing natural definitions of recursive and mutually recursive procedures. In fact, I got bitten by this just the other day :).
Finally, 'set!'; like 'let', 'set!' is pretty straightforward: it mutates an existing binding.
FWIW, one way to understand these scopes in DrRacket (if you're using it) is to use the "Check Syntax" button, and then hover over various identifiers to see where they're bound.
Do you mean (+ 1 a) instead of (1+ a) ? The latter is not syntactically valid.
Variables bound by let are visible only within the let form's body, thus
(define (f x)
  (let ((a 1))
    (+ a x)))
is syntactically possible, while
(define (f x)
  (let ((a 1)))
  (+ a x))
is not.
All internal definitions have to appear at the beginning of the body, thus the following code is possible:
(define (g x)
  (define a 1)
  (+ a x))
while this code will generate an error:
(define (g x)
  (define a 1)
  (display (+ a x))
  (define b 2)
  (+ a x))
because the first non-definition expression implies that there are no further definitions.
set! doesn't define the variable, rather it is used to assign the variable a new value. Therefore these definitions are meaningless:
(define (f x)
  (set! ((a 1))
        (+ a x)))

(define (g x)
  (set! a 1)
  (+ a x))
Valid use for set! is as follows:
> (define x 12)
> (set! x (add1 x))
> x
13
Though it's discouraged, as Scheme is a functional language.
John Clements answer is good. In some cases, you can see what the defines become in each version of Scheme, which might help you understand what's going on.
For example, in Chez Scheme 8.0 (which has its own define quirks, esp. wrt R6RS!):
> (expand '(define (g x)
             (define a 1)
             (+ a x)))
(begin
  (set! g (lambda (x) (letrec* ([a 1]) (#2%+ a x))))
  (#2%void))
You see that the "top-level" define becomes a set! (although just expanding define in some cases will change things!), but the internal define (that is, a define inside another block) becomes a letrec*. Different Schemes will expand that expression into different things.
MzScheme v4.2.4:
> (expand '(define (g x)
             (define a 1)
             (+ a x)))
(define-values
  (g)
  (lambda (x)
    (letrec-values (((a) '1)) (#%app + a x))))
You may be able to use define more than once but it's not
idiomatic: define implies that you are adding a definition to the
environment and set! implies you are mutating some variable.
I'm not sure about Guile and why it would allow (set! a (1+ a)), but
if a isn't defined yet that shouldn't work. Usually one would use
define to introduce a new variable and only mutate it with set!
later.
You can use an anonymous function application instead of let, in
fact that's usually exactly what let expands into, it's almost
always a macro. These are equivalent:
(let ((a 1) (b 2))
  (+ a b))

((lambda (a b)
   (+ a b))
 1 2)
The reason you'd use let is that it's clearer: the variable names are right next to the values.
In the case of internal defines, I'm not sure that Yasir is
correct. At least on my machine, running Racket in R5RS-mode and in
regular mode allowed internal defines to appear in the middle of the
function definition, but I'm not sure what the standard says. In any
case, much later in SICP, the trickiness that internal defines pose is
discussed in depth. In Chapter 4, how to implement mutually recursive
internal defines is explored and what it means for the implementation
of the metacircular interpreter.
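As a small taste of what that buys you, internal defines are mutually recursive, which a simple left-to-right let-style rewrite could not express. A minimal example (with made-up names, to avoid shadowing the built-in even? and odd?):

(define (parity n)
  (define (my-even? n) (if (zero? n) #t (my-odd? (- n 1))))
  (define (my-odd? n)  (if (zero? n) #f (my-even? (- n 1))))
  (if (my-even? n) 'even 'odd))

(parity 7)  ; ==> odd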
So stick with it! SICP is a brilliant book and the video lectures are wonderful.

Data Structures in Scheme

I am learning Scheme, coming from a background in Haskell, and I've run into a pretty surprising issue: Scheme doesn't seem to have custom data types (i.e. objects, structs, etc.). I know some implementations have their own custom macros implementing structs, but R6RS itself doesn't seem to provide any such feature.
Given this, I have two questions:
Is this correct? Am I missing a feature that allows creation of custom data types?
If not, how do scheme programmers structure a program?
For example, any function trying to return multiple items of data needs some way of encapsulating the data. Is the best practice to use a hash map?
(define (read-user-input)
  (display "1. Add todo\n2. Delete todo\n3. Modify todo\n")
  (let ((cmd-num (read)))
    (cond ((equal? cmd-num 1) `(("command-number" . ,cmd-num) ("todo-text" . ,(read-todo))))
          ((equal? cmd-num 2) `(("command-number" . ,cmd-num) ("todo-id" . ,(read-todo-id))))
          (else `(("command-number" . ,cmd-num) ("todo-id" . ,(read-todo-id)))))))
In order to answer your question, I think it might help to give you a slightly bigger-picture comment.
Scheme has often been described as not so much a single language as a family of languages. This is particularly true of R5RS, which is still what many people mean when they say "Scheme."
Nearly every one of the languages in the Scheme family has structures. I'm personally most familiar with Racket, where you can define structures with
struct or define-struct.
"But", you might say, "I want to write my program so that it runs in all versions of Scheme." Several very smart people have succeeded in doing this: Dorai Sitaram and Oleg Kiselyov both come to mind. However, my observation about their work is that generally, maintaining compatibility with many versions of scheme without sacrificing performance usually requires a high level of macro expertise and a good deal of Serious Thinking.
It's true that several of the SRFIs describe structure facilities. My own personal advice to you is to pick a Scheme implementation and allow yourself to feel good about using whatever structure facilities it provides. In some ways, this is not unlike Haskell; there are features that are specific to ghc, and generally, I claim that most Haskell programmers are happy to use these features without worrying that they don't work in all versions of Haskell.
Absolutely not. Scheme has several SRFIs for custom types, a.k.a. record types; with the R7RS Red edition it will be SRFI-136, but since you mention R6RS, it has records defined in the standard too.
Example using R6RS:
#!r6rs
(import (rnrs))
(define-record-type (point make-point point?)
  (fields (immutable x point-x)
          (immutable y point-y)))
(define test (make-point 3 7))
(point-x test) ; ==> 3
(point-y test) ; ==> 7
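For comparison, R7RS-small (which uses the SRFI 9 syntax) spells the same record like this:

(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y))

(point-x (make-point 3 7)) ; ==> 3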
Early Scheme (and lisp) didn't have record types and you usually made constructors and accessors:
Example:
(define (make-point x y)
...)
(define (point-x p)
...)
(define (point-y p)
...)
This is the same contract the record types actually create. How it is implemented is really not important. Here are some ideas:
(define make-point cons)
(define point-x car)
(define point-y cdr)
This works most of the time, but is not really very safe. Perhaps this is better:
(define tag (list 'point))
(define idx-tag 0)
(define idx-x 1)
(define idx-y 2)

(define (point? p)
  (and (vector? p)
       (positive? (vector-length p))
       (eq? tag (vector-ref p idx-tag))))

(define (make-point x y)
  (vector tag x y))

;; just an abstraction. Might not be exported
(define (point-acc p idx)
  (if (point? p)
      (vector-ref p idx)
      (raise "not a point")))

(define (point-x p)
  (point-acc p idx-x))

(define (point-y p)
  (point-acc p idx-y))
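Using the hand-rolled version looks exactly like using the record version above:

(define p (make-point 3 7))
(point-x p)               ; ==> 3
(point? p)                ; ==> #t
(point? '(not a point))   ; ==> #f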
Now if you look at the reference implementation for record types you'll find they use vectors, so the vector version and R6RS's aren't that different.
Lookup? You can use a vector, a list, or a case:
Example:
;; list is good for a few elements
(define ops `((+ . ,+) (- . ,-)))

(let ((found (assq '+ ops)))
  (if found
      ((cdr found) 1 2)
      (raise "not found")))
; ==> 3
;; case (switch)
((case '+
   ((+) +)
   ((-) -)
   (else (raise "not found")))
 1 2) ; ==> 3
Of course you have hash tables in SRFI-125, so for a large number of elements that's probably wise. Know that it probably uses a vector to store the elements :-)
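For completeness, here is roughly what the same lookup looks like with the standard R6RS hashtables (SRFI-125 hash tables offer a very similar interface); a minimal sketch:

#!r6rs
(import (rnrs))

(define ops (make-eqv-hashtable))
(hashtable-set! ops '+ +)
(hashtable-set! ops '- -)

((hashtable-ref ops '+ #f) 1 2) ; ==> 3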

Why is a Lisp file not a list of statements?

I've been learning Scheme through the Little Schemer book and it strikes me as odd that a file in Scheme / Lisp isn't a list of statements. I mean, everything is supposed to be a list in Lisp, but a file full of statements doesn't look like a list to me. Is it represented as a list underneath? Seems odd that it isn't a list in the file.
For instance...
#lang scheme
(define atom?
  (lambda (x)
    (and (not (pair? x)) (not (null? x)))))

(define sub1
  (lambda (x y)
    (- x y)))

(define add1
  (lambda (x y)
    (+ x y)))

(define zero?
  (lambda (x)
    (= x 0)))
Each define statement is a list, but there is no list of define statements.
It is not, because there is no practical reason for it. In fact, a series of define statements changes the internal state of the language, and information about that state is accessible via functions. For example, you can ask Lisp whether some symbol is bound to a function.
There is no practical benefit in traversing all entered forms (for example, define forms). I suppose that this approach (all statements being elements of one list) would lead to code that is hard to read.
Also, I think it is not quite correct to think that "everything is supposed to be a list in Lisp", since there are also some atomic types, which are quite self-sufficient.
When you evaluate a form, if the form defines something, that definition is added to the environment, and that environment is (or can be) a single list. You can build a program without using files, by just typing definitions into the REPL. In Lisp as in any language, the program “lives” in the run-time environment, not the source files.
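And nothing stops you from treating a source file as data yourself: read pulls one datum (one list) at a time, so collecting them literally gives you the file as a list of forms. A minimal sketch (the file name is just a placeholder):

(define (read-all port)
  (let loop ((forms '()))
    (let ((form (read port)))
      (if (eof-object? form)
          (reverse forms)                       ; the whole file, as a list of forms
          (loop (cons form forms))))))

(call-with-input-file "program.scm"
  (lambda (port) (length (read-all port))))     ; how many top-level forms it contains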

Scheme early "short circuit return"?

I'm trying to find out how I can do an "early return" in a scheme procedure without using a top-level if or cond like construct.
(define (win b)
  (let* ((test (first (first b)))
         (result (every (lambda (i) (= (list-ref (list-ref b i) i) test))
                        (enumerate (length b)))))
    (when (and (not (= test 0)) result) test))
  0)
For example, in the code above, I want win to return test if the when condition is met, otherwise return 0. However, what happens is that the procedure will always return 0, regardless of the result of the when condition.
The reason I am structuring my code this way is because in this procedure I need to do numerous complex checks (multiple blocks similar to the let* in the example) and putting everything in a big cond would be very unwieldy.
Here is how to use call/cc to build return yourself.
(define (example x)
  (call/cc (lambda (return)
             (when (< x 0) (return #f))
             ; more code, including possibly more calls to return
             0)))
Some Schemes define a macro called let/cc that lets you drop some of the noise of the lambda:
(define (example x)
  (let/cc return
    (when (< x 0) (return #f))
    0))
Of course if your Scheme doesn't, let/cc is trivial to write.
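For instance, a minimal syntax-rules version could look like this:

(define-syntax let/cc
  (syntax-rules ()
    ((_ k body ...)
     (call-with-current-continuation
      (lambda (k) body ...)))))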
This works because call/cc saves the point at which it was called as a continuation. It passes that continuation to its function argument. When the function calls that continuation, Scheme abandons whatever call stack it had built up so far and continues from the end of the call/cc call. Of course if the function never calls the continuation, then it just returns normally.
Continuations don't get truly mind-bending until you start returning them from that function, or maybe storing them in a global data structure and calling them later. Otherwise, they're just like any other language's structured-goto statements (while/for/break/return/continue/exceptions/conditions).
I don't know what your complete code looks like, but it might be better to go with the cond and to factor out the complex checks into separate functions. Needing return and let* is usually a symptom of overly imperative code. However, the call/cc method should get your code working for now.
One way would be to use recursion instead of looping, then an early exit is achieved by not recursing further.
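For example, a search written as plain recursion gets its "early return" simply by not making the recursive call (the names here are made up for the sketch):

(define (first-negative lst)
  (cond ((null? lst) #f)
        ((negative? (car lst)) (car lst))   ; found it: stop recursing, i.e. "return early"
        (else (first-negative (cdr lst)))))

(first-negative '(3 1 -4 1 -5))  ; ==> -4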
You can use the "call with current continuation" support to simulate a return. There's an example on wikipedia. The function is called call-with-current-continuation, although there's often an alias called call/cc which is exactly the same thing. There's also a slightly cleaner example here
Note: This is quite an advanced Scheme programming technique and can be a bit mind bending at first...!!!!
In this case you don't want a when, you want an if, albeit not top-level.
(define (win b)
  (let* ((test (first (first b)))
         (result (every (lambda (i) (= (list-ref (list-ref b i) i) test))
                        (enumerate (length b)))))
    (if (and (not (= test 0)) result)
        test
        0)))
The reason it was always returning zero is that whether or not the body of the when got executed, its result would be dropped on the floor. You see, the lambda implicit in the function define form creates an implicit begin block too, so
(define foo
  (lambda (b)
    (begin
      (let ...)
      0)))
and the way begin works is that it returns the result of the last form inside, while dropping all the intermediate results on the floor. Those intermediate results are intended to have side effects. You're not using any of that, which is great(!), but you have to be careful to only have one form (whose result you really want) inside the function definition.
