Functional programming side effects clarification - scheme

I'm trying to understand side effects in functional programming, in Racket to be exact. My understanding is that a side effect involves changing the state of some variable, such as a global one.
Here's some code that I've written:
; Define a variable with the value of 5
(define x 5)
; Define a function to add 1 to x
(define addX
(+ 1 x))
; Test out values
x
addX
x
Which outputs 5 6 5.
Shouldn't the last value be 6? Or is the fundamental principle I'm missing that values are stateless in functional programming?

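The way your code is written, you can think of x as a constant, i.e. addX does not mutate the x binding. In fact addX is not a function at all: (define addX (+ 1 x)) evaluates (+ 1 x) once and binds the result, 6, to addX.
It's the same as this pseudocode: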
constant X = 5
constant addX = X + 1
print(X) ; 5
print(addX) ; 6
print(X) ; 5
Functional programming favors immutable data structures. If you approach Scheme/Racket with notions carried over from other (imperative-style) languages, you'll struggle, and the code you produce will be very bad.

(+ 1 x) is an expression. Its result, given that we have already evaluated (define x 5), is 6. That value just... percolates up to whatever asked for the expression to be evaluated. Whether we ask DrRacket to evaluate it or we bind it to a name, as in (define addX (+ 1 x)), the expression is replaced by its value; nothing about x changes.
So, if you want to assign a new value to an identifier that has already been introduced, you need to tell the interpreter to perform the assignment. The form for that is set!, as in:
(define addX #f)
addX ; => #f
(set! addX (+ 1 x))
addX ; => 6
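To see the difference between a pure procedure and one with a side effect, here is a minimal sketch in Racket. The names add-x and bump-x! are made up for illustration and do not come from the question.

```racket
#lang racket
(define x 5)

;; add-x is a procedure: its body runs on every call and only reads x.
(define (add-x)
  (+ 1 x))

;; bump-x! has a side effect: set! gives the existing binding of x a new value.
(define (bump-x!)
  (set! x (+ 1 x))
  x)

(add-x)   ; => 6
x         ; => 5, add-x did not change x
(bump-x!) ; => 6
x         ; => 6, bump-x! did
```

Calling add-x any number of times leaves x alone, which is what makes it free of side effects; every call to bump-x! changes what later reads of x will see.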

Related

binding values to frames in the environment model

I am a little confused on how the environment model of evaluation works, and hoping someone could explain.
SICP says:
The environment model specifies: To apply a procedure to arguments,
create a new environment containing a frame that binds the parameters
to the values of the arguments. The enclosing environment of this
frame is the environment specified by the procedure. Now, within this
new environment, evaluate the procedure body.
First example:
If I:
(define y 5)
in the global environment, then call
(f y)
where
(define (f x) (set! x 1))
We construct a new environment (e1). Within e1, x would be bound to the value of y (5). In the body, the value of x would now be 1. I found that y is still 5. I believe the reason for this is because x and y are located in different frames. That is, I completely replaced the value of x. I modified the frame where x is bound, not just its value. Is that correct?
Second example:
If we have in the global environment:
(define (cons x y)
  (define (set-x! v) (set! x v))
  (define (set-y! v) (set! y v))
  (define (dispatch m)
    (cond ((eq? m 'car) x)
          ((eq? m 'cdr) y)
          ((eq? m 'set-car!) set-x!)
          ((eq? m 'set-cdr!) set-y!)
          (else (error "Undefined operation: CONS" m))))
  dispatch)
(define (set-car! z new-value)
  ((z 'set-car!) new-value)
  z)
Now I say:
(define z2 (cons 1 2))
Suppose the value of z2 is the dispatch procedure in an environment called e2, and I call:
(set-car! z2 3)
Set-car! creates a new environment e3. Within e3, the parameter z is bound to the value of z2 (the dispatch procedure in e2) just like in my first example. After the body is executed, z2 is now '(3 2). I think set-car! works the way it does because I am changing the state of the object held by z (which is also referenced by z2 in global), but not replacing it. That is, I did not modify the frame where z is bound.
In this second example it appears that z2 in global and z in e3 are shared. I am not sure about my first example though. Based on the rules for applying procedures in the environment model, it appears x and y are shared, although this is completely undetectable because 5 does not have local state.
Is everything I said correct? Did I misunderstand the quote?
To answer your first question: assuming that you meant to write (f y) rather than (f 5), the reason that y is not modified is that Racket (like most languages) is a "call by value" language. That is, values are passed to procedure calls. In this case, the argument y is evaluated to 5 before the call to f is made. Mutating the x binding does not affect the y binding.
To answer your second question: in your second example, there are shared environments. That is, z is a function that is closed over an environment (you called it e2). Each call to z creates a new environment that is linked to the existing e2 environment. Performing mutation on either x or y in this environment affects all future references to the e2 environment.
Summary: passing the value of a variable is different from passing a closure that contains that variable. If I say
(f y)
... then after the call is done, 'y' will still refer to the same value[*]. If I write
(f (lambda (...) ... y ...))
(that is, passing a closure that has a reference to y), then y might be bound to a different value after the call to f.
If you find this confusing, you're not alone. The key is this: don't stop using closures. Instead, stop using mutation.
[*] if y is a mutable value, it may be mutated, but it will still be the "same" value. see note above about confusion.
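The contrast between passing a value and passing a closure can be sketched as follows. This is a minimal illustration; call-thunk is a hypothetical helper that is not part of the question.

```racket
#lang racket
(define y 5)

;; f receives the value 5 in a fresh binding x; set! only changes that binding.
(define (f x)
  (set! x 1)
  x)

;; call-thunk (hypothetical) simply invokes whatever closure it is given.
(define (call-thunk thunk)
  (thunk))

(f y)                               ; => 1
y                                   ; => 5, the binding of y was never touched
(call-thunk (lambda () (set! y 1))) ; the closure refers to the binding of y itself
y                                   ; => 1
```

In the first call only a value crosses the procedure boundary; in the second, the closure carries access to the binding of y, so the callee can change it indirectly.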
TL;DR: simple values in Scheme are immutable and are copied in full when passed as arguments to functions. Compound values are mutable and are passed as a copy of a pointer, where the copied pointer points to the same memory location as the original pointer does.
What you're grappling with is known as "mutation". Simple values like 5 are immutable. There's no "set-int!" to change 5 to henceforth hold the value 42 in our program. And it is good that there isn't.
But a variable's value is mutable. A variable is a binding in a function invocation's frame, and it can be changed with set!. If we have
(define y 5)
(define (foo x) (set! x 42) (display (list x x)))
(foo y)
--> foo is entered
foo invocation environment frame is created as { x : {int 5} }
x's binding's value is changed: the frame is now { x : {int 42} }
(42 42) is displayed
y still refers to 5 in the global environment
But if foo receives a value that is itself holding mutable references, which can be mutated, i.e. changed "in place", then though foo's frame itself doesn't change, the value to which a binding in it is referring can be.
(define y (cons 5 6)) ; Scheme's standard cons
--> a cons cell is created in memory, at {memory-address : 123}, as
{cons-cell {car : 5} {cdr : 6} }
(define (foo x) (set-car! x 42) (display (list x x)))
(foo y)
--> foo is entered
foo invocation environment frame is created as
{ x : {cons-cell-reference {memory-address : 123}} }
x's binding's value is *mutated*: the frame is still
{ x : {cons-cell-reference {memory-address : 123}} }
but the cons cell at {memory-address : 123} is now
{cons-cell {car : 42} {cdr : 6} }
((42 . 6) (42 . 6)) is displayed
y still refers to the same binding in the global environment
which still refers to the same memory location, which has now
been altered in-place: at {memory-address : 123} is now
{cons-cell {car : 42} {cdr : 6} }
In Scheme, cons is a primitive which creates mutable cons cells which can be altered in-place with set-car! and set-cdr!.
What these SICP exercises intend to show is that it is not necessary to have it as a primitive built-in procedure; that it could be implemented by a user, even if it weren't built-in in Scheme. Having set! is enough for that.
Another jargon for it is to speak of "boxed" values. If I pass 5 into some function, when that function returns I'm guaranteed to still have my 5, because it was passed by copying its value, setting the function invocation frame's binding to reference the copy of the value 5 (which is also just an integer 5 of course). This is what is referred to as "pass-by-value".
But if I "box" it and pass (list 5) in to some function, the value that is copied -- in Lisp -- is a pointer to this "box". This is referred to as "pass-by-pointer-value" or something.
If the function mutates that box with (set-car! ... 42), it is changed in-place and I will henceforth have 42 in that box, (list 42) -- under the same memory location as before. My environment frame's binding will be unaltered -- it will still reference the same object in memory -- but the value itself will have been changed, altered in place, mutated.
This works because a box is a compound datum. Whether I put a simple or compound value in it, the box itself (i.e. the mutable cons cell) is not simple, so will be passed by pointer value -- only the pointer will be copied, not what it points to.
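The two cases can be put side by side in a short sketch. One assumption to flag: in Racket (unlike standard Scheme) pairs made with cons are immutable, so this version uses Racket's mutable pairs (mcons, mcar, set-mcar!) in place of cons and set-car!. The names rebind! and mutate! are made up for illustration.

```racket
#lang racket
;; Rebinding the parameter: only this call's frame changes.
(define (rebind! x)
  (set! x 42)
  x)

;; Mutating the object the parameter refers to: the caller sees the change.
(define (mutate! p)
  (set-mcar! p 42)
  p)

(define n 5)
(define cell (mcons 5 6))

(rebind! n)    ; => 42
n              ; => 5
(mutate! cell) ; returns the same cell, now holding 42 and 6
(mcar cell)    ; => 42
```

In both calls the argument is passed by value; the difference is that the value passed for cell is a reference to a mutable object, which the caller and callee share.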
x bound to the value of y means that x is a new binding which receives a copy of the same value that y contains. x and y are not aliases to a shared memory location.
Though due to issues of optimization, bindings are not exactly memory locations, you can model their behavior that way. That is to say, you can regard an environment to be a bag of storage locations named by symbols.
Educational Scheme-in-Scheme evaluators, in fact, use association lists for representing environments. Thus (let ((x 1) (y 2)) ...) creates an environment which simply looks like ((y . 2) (x . 1)). The storage locations are the cdr fields of the cons pairs in this list, and their labels are the symbols in the car fields. The cell itself is the binding; the symbol and location are bound together by virtue of being in the same cons structure.
If there is an outer environment surrounding this let, then these association pairs can just be pushed onto it with cons:
(let ((z 3))
  ;; env is now ((z . 3))
  (let ((x 1) (y 2))
    ;; env is now ((y . 2) (x . 1) (z . 3))
    ...))
The environment is just a stack of bindings that we push onto. When we capture a lexical closure, we just take the current pointer and stash it into the closure object.
(let ((z 3))
  ;; env is now ((z . 3))
  (let ((x 1) (y 2))
    ;; env is now ((y . 2) (x . 1) (z . 3))
    (lambda (a) (+ x y z a))
    ;; lambda is an object with these three pieces:
    ;; - the environment ((y . 2) (x . 1) (z . 3))
    ;; - the code (+ x y z a)
    ;; - the parameter list (a)
    )
  ;; after this let is done, the environment is again ((z . 3))
  ;; but the above closure maintains the captured one
  )
So suppose we call that lambda with an argument 10. The lambda takes the parameter list (a) and binds it to the argument list to create a new environment:
((a . 10))
This new environment is not made in a vacuum; it is created as an extension to the captured environment. So, really:
((a . 10) (y . 2) (x . 1) (z . 3))
Now, in this effective environment, the body (+ x y z a) is executed.
Everything you need to understand about environments can be understood in reference to this cons pair model of bindings.
Assignment to a variable? That's just set-cdr! on a cons-based binding.
What is "extending an environment"? It's just pushing a cons-based binding onto the front.
What is "fresh binding" of a variable? That's just the allocation of a new cell with (cons variable-symbol value) and extending the environment with it by pushing it on.
What is "shadowing" of a variable? If an environment contains (... (a . 2) ...) and we push a new binding (a . 3) onto this environment, then this new a is now visible, and (a . 2) is hidden, simply because the assoc function searches linearly and finds (a . 3) first! The inner-to-outer environment lookup is perfectly modeled by assoc. Inner bindings appear to the left of outer bindings, closer to the head of the list, and are found first.
The semantics of sharing all follow from the semantics of these lists of cells. In the assoc list model, environment sharing occurs when two environment assoc lists share the same tail. For instance, each time we call our lambda above, a new (a . whatever) argument environment is created, but it extends the same captured environment tail. If the lambda changes a, that is not seen by the other invocations, but if it changes x, then the other invocations will see it. a is private to the lambda invocation, but x, y and z are external to the lambda, in its captured environment.
If you fall back on this assoc list model mentally, you will not go wrong as far as working out the behavior of environments, including arbitrarily complex situations.
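The assoc list model described above can be sketched in a few lines of Racket. This is an illustration under stated assumptions, not how any real implementation works: the helper names (extend, lookup-cell, lookup, assign!) are invented here, and because Racket's ordinary pairs are immutable, each binding cell is a mutable pair built with mcons.

```racket
#lang racket
;; An environment is a list of binding cells; each cell is (mcons name value).

;; Extending an environment: push a fresh cell onto the front.
(define (extend env name value)
  (cons (mcons name value) env))

;; Lookup: linear search from the innermost binding outward.
(define (lookup-cell env name)
  (or (findf (lambda (cell) (eq? (mcar cell) name)) env)
      (error 'lookup "unbound variable: ~a" name)))

(define (lookup env name)
  (mcdr (lookup-cell env name)))

;; Assignment: mutate the value slot of the existing cell.
(define (assign! env name value)
  (set-mcdr! (lookup-cell env name) value))

;; The captured environment ((y . 2) (x . 1) (z . 3)) from the text.
(define captured (extend (extend (extend '() 'z 3) 'x 1) 'y 2))

;; Two invocations of the lambda share the captured tail.
(define call-1 (extend captured 'a 10))
(define call-2 (extend captured 'a 20))

(assign! call-1 'a 11)   ; private to call-1
(assign! call-1 'x 100)  ; lands in the shared tail

(lookup call-2 'a)       ; => 20
(lookup call-2 'x)       ; => 100

;; Shadowing: a new x in front hides the captured one without changing it.
(define shadowed (extend captured 'x 7))
(lookup shadowed 'x)     ; => 7
(lookup captured 'x)     ; => 100
```

Sharing falls out of list structure alone: call-1 and call-2 have different first cells but the same tail, so assignments to x, y, or z are visible through both.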
Real implementations basically just optimize around this. For instance, a variable that is initialized from a constant like 42 and never assigned does not have to exist as an actual environment entry at all; the optimization called "constant propagation" can just replace occurrences of that variable with 42, as if it were a macro. Real implementations may use hash tables or other structures for the environment levels, not assoc lists. Real implementations may be compiled: lexical environments can be compiled according to various strategies such as "closure conversion". Basically, an entire lexical scope can be flattened into a single vector-like object. When a closure is made at run time, the entire vector is duplicated and initialized. Compiled code doesn't refer to variable symbols, but to offsets in the closure vector, which is substantially faster: no linear search through an assoc list is required.

Scheme procedure with 2 arguments

Learned to code C, long ago; wanted to try something new and different with Scheme. I am trying to make a procedure that accepts two arguments and returns the greater of the two, e.g.
(define (larger x y)
  (if (> x y)
      x
      (y)))
(larger 1 2)
or,
(define larger
  (lambda (x y)
    (if (> x y)
        x
        (y))))
(larger 1 2)
I believe both of these are equivalent i.e. if x > y, return x; else, return y.
When I try either of these, I get errors e.g. 2 is not a function or error: cannot call: 2
I've spent a few hours reading over SICP and TSPL, but nothing is jumping out (perhaps I need to use a "list" and reference the two elements via car and cdr?)
Any help appreciated. If I am mis-posting, missed a previous answer to the same question, or am otherwise inappropriate, my apologies.
The reason is that, unlike in C and many other languages, parentheses in Scheme and all Lisp languages are an important part of the syntax and are never optional grouping.
For instance, they are used for function calls: (f a b c) means apply (call) the function f to the arguments a, b, and c, while (f) means apply (call) the function f with no arguments.
So in your code, (y) means "call the number 2" (the current value of y), but 2 is a number, not a function, hence the error message.
Simply change the code to:
(define (larger x y)
  (if (> x y)
      x
      y))
(larger 1 2)
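A tiny sketch of the same rule, using made-up names (five, n): wrapping a name in parentheses calls it, while a bare name just yields its value.

```racket
#lang racket
(define (five) 5)   ; a procedure of no arguments
(define n 5)        ; a plain number

(five)              ; => 5, the parentheses call the procedure
n                   ; => 5, no parentheses: just the value
(procedure? five)   ; => #t
(procedure? n)      ; => #f, so (n) would fail just like (y) did
```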

Algorithm evaluating user-defined functions

Hello, I have some homework that consists of extending a Lisp interpreter. We are to build three primitives with pre-evaluated arguments (for example <=), and three primitives that do their own evaluation (for example if).
I went beyond the call of duty and created the only fun function within the bounds of this exercise: (defun) [it's the Common Lisp keyword for defining a user function].
I would like to know if my algorithm for managing a user-defined function call is worthwhile.
In pseudo code, here it goes :
get list of parameters                        # (x y z)
get list of arguments                         # (1 2 3)
get body of function                          # (+ x (* y z))
for each parameter, arg                       # x
    body = replace(parameter, argument, body) # (+ 1 (* y z))
                                              # (+ 1 (* 2 z))
                                              # (+ 1 (* 2 3))
eval(body)                                    # 7
Are there better ways to accomplish this?
Thanks.
EDIT: replace() is a function recursing on sub-lists of body.
I never found better, no one proposed better, the question generated no interest whatever, and I'm on a rampage to close my opened questions, so here is the answer :
my algorithm was good enough.
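For reference, the substitution algorithm from the question can be sketched in Racket roughly as follows. The names substitute and apply-by-substitution are invented for this sketch, and evaluating the result with Racket's own eval in a fresh base namespace is an assumption standing in for the homework interpreter's evaluator. This naive version does not treat quote or nested binding forms specially, so it would also rewrite quoted or shadowed occurrences of a parameter name.

```racket
#lang racket
;; Replace every occurrence of the symbol `param` in `body` with `arg`,
;; recursing into sub-lists.
(define (substitute param arg body)
  (cond [(eq? body param) arg]
        [(pair? body) (cons (substitute param arg (car body))
                            (substitute param arg (cdr body)))]
        [else body]))

;; Substitute each parameter in turn, threading the rewritten body along.
(define (apply-by-substitution params args body)
  (for/fold ([b body]) ([p params] [a args])
    (substitute p a b)))

(define expanded
  (apply-by-substitution '(x y z) '(1 2 3) '(+ x (* y z))))

expanded                              ; => '(+ 1 (* 2 3))
(eval expanded (make-base-namespace)) ; => 7
```

The environment-based approach, where parameters are bound in a new frame instead of being textually replaced, avoids the quoting and shadowing pitfalls at the cost of threading an environment through the evaluator.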

how do i open a racket REPL with the current scope?

Let's say I have a program like this:
(define (foo x)
  (local
    ((define y (- x 1)))
    (* x y)))
(foo 3)
I want to be able to open a REPL between lines 3 and 4, such that I can explore (and possibly modify) the values of x and y by executing arbitrary statements.
To do this in Ruby, I would take the equivalent program:
def foo(x)
  lambda {
    y = x - 1
    x * y
  }.call
end
puts (foo 3)
And modify it by adding a call to pry to give me a nicely-scoped repl where I want it:
require 'pry'
def foo(x)
  lambda {
    y = x - 1
    binding.pry
    x * y
  }.call
end
puts (foo 3)
To do it in js, I would run this program under Firebug and just put a breakpoint on line 4:
foo = function(x) {
  return (function(){
    var y = x - 1;
    return x * y;
  })();
};
console.log(foo(3));
And then I could explore stuff in the evaluation window.
Is there anything I can do to get this in Racket? The closest I've found is DrScheme's debugger, but that just presents all the values of the current scope, it doesn't let you explore them in a REPL as far as I can see.
This isn't answering your original question, it's in response to your comment about making your own. I thought that was a really interesting idea so I explored it. What I was able to figure out:
Let's say you want this to work:
(define top-x 10)
(define (f)
  (for ([i 5])
    (displayln i)
    (when (= i 2)
      (pry)))) ; <= drop into a REPL here, resume after exiting REPL
A first attempt at pry:
(define (pry)
  (let loop ()
    (display "PRY> ")
    (define x (read))
    (unless (or (eof-object? x) (equal? x '(unquote exit)))
      (pretty-print (eval x))
      (loop))))
This seems to work:
> (f)
0
1
2
PRY> (+ 10 10)
20
PRY> ,exit
3
4
>
But although it lets you access Racket functions like +, you can't access even your top-level variables like top-x:
> (f)
0
1
2
PRY> top-x
; top-x: undefined;
; cannot reference undefined identifier
You can get the top-level stuff by giving eval access to the current namespace, as explained here. So pry needs a namespace argument:
(define (pry ns)
  (let loop ()
    (display "PRY> ")
    (define x (read))
    (unless (or (eof-object? x) (equal? x '(unquote exit)))
      (pretty-print (eval x ns)) ; <---
      (loop))))
And to get that argument you need to add this incantation to the file being debugged:
(define-namespace-anchor a) ; <---
(define ns (namespace-anchor->namespace a)) ; <---
(define top-x 10)
(define (f)
  (for ([i 5])
    (displayln i)
    (when (= i 2)
      (pry ns)))) ; <---
Now the REPL can see and change top-x:
> (f)
0
1
2
PRY> top-x
10
PRY> (set! top-x 20)
#<void>
PRY> top-x
20
PRY> ,exit
3
4
>
Cool! But it can't see, let alone change, the local variable i:
> (f)
0
1
2
PRY> i
; i: undefined;
; cannot reference an identifier before its definition
Shoot. The reason why is explained here.
You might imagine that even though eval cannot see the local bindings in broken-eval-formula, there must actually be a data structure mapping x to 2 and y to 3, and you would like a way to get that data structure. In fact, no such data structure exists; the compiler is free to replace every use of x with 2 at compile time, so that the local binding of x does not exist in any concrete sense at run-time. Even when variables cannot be eliminated by constant-folding, normally the names of the variables can be eliminated, and the data structures that hold local values do not resemble a mapping from names to values.
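The same limitation can be reproduced without an interactive loop. In this minimal sketch, try-eval is a hypothetical helper (not part of the answer) that returns the symbol not-visible instead of raising when eval cannot resolve an identifier.

```racket
#lang racket
(define-namespace-anchor a)
(define ns (namespace-anchor->namespace a))

(define top-x 10)

;; try-eval (hypothetical): evaluate expr in the module's namespace and
;; report 'not-visible if evaluation fails.
(define (try-eval expr)
  (with-handlers ([exn:fail? (lambda (e) 'not-visible)])
    (eval expr ns)))

(define (f)
  (let ([i 2])
    ;; top-x is a module-level definition, i is a local binding
    (list (try-eval 'top-x) (try-eval 'i))))

(f) ; => '(10 not-visible)
```

The namespace obtained from the anchor knows about module-level definitions such as top-x, but a let-bound local like i never appears in it.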
You might say, OK, but in that case...
How does DrRacket provide a debugger?
From what I was able to figure out, DrRacket does this by annotating the syntax before evaluating the program. From drracket/gui-debugger/annotator.rkt:
;; annotate-stx inserts annotations around each expression that introduces a
;; new scope: let, lambda, and function calls. These annotations reify the
;; call stack, and allows to list the current variable in scope, look up
;; their value, as well as change their value. The reified stack is accessed
;; via the CURRENT-CONTINUATION-MARKS using the key DEBUG-KEY
So I think that would be the jumping-off point if you wanted to tackle this.
In the DrRacket IDE you have a Debug button. You can step through your program or, as you described for other languages, right-click the expression you want to investigate and choose either "continue to this point" (once only) or "pause at this point" (a breakpoint), then press Go to run the program.
To inspect or change x, put your mouse pointer over it and right-click. To change it, choose (set! x ...) in the menu.
As for an in-language REPL, you could make your own (pry) to start a REPL there. In Common Lisp you could have just signaled an error to get to the nice debugger.

In Scheme, what's the point of "set!"?

What's the point of using the set! assignment operator in scheme? Why not just rebind a variable to a new value using define?
> (define x 100)
> (define (value-of-x) x) ;; value-of-x closes over "x"
> x
100
> (value-of-x)
100
> (set! x (+ x 1))
> x
101
> (value-of-x)
101
> (define x (+ x 1))
> x
102
> (value-of-x)
102
>
Though both define and set! will redefine a value when in the same scope, they do two different things when the scope is different. Here's an example:
(define x 3)
(define (foo)
  (define x 4)
  x)
(define (bar)
  (set! x 4)
  x)
(foo) ; returns 4
x ; still 3
(bar) ; returns 4
x ; is now 4
As you can see, when we create a new lexical scope (such as when we define a function), any names defined within that scope mask the names that appear in the enclosing scope. This means that when we defined x as 4 in foo, we really created a new binding for x that shadowed the old one. In bar, since x is not defined in that scope, set! looks to the enclosing scope to find, and change, the value of x.
Also, as other people have said, you're only supposed to define a name once in a scope. Some implementations will let you get away with multiple defines, and some won't. Also, you're only supposed to use set! on a variable that's already been defined. Again, how strictly this rule is enforced depends on the implementation.
It is not usually permitted to define a variable more than once. Most REPLs allow it for convenience when you're trying things out, but if you try to do that in a Scheme program it will give you an error.
For example, in mzscheme, the program
#lang scheme
(define x 1)
(define x 2)
gives the error
test.ss:3:8: module: duplicate definition for identifier at: x in: (define-values (x) 2)
In addition, define has a different meaning when used inside of other contexts. The program
#lang scheme
(define x 1)
x
(let ()
  (define x 2)
  x)
x
has the output
1
2
1
This is because defines inside of certain constructs are actually treated as letrecs.
When you use lexical bindings you do not define them:
(let ((x 1))
  (set! x (+ x 1))
  x)
When you use define in an inner scope you create a new variable with the new value, while the old variable still exists with the old value; it is just hidden by the new one. At the top-level command line you don't see the difference from set!, but define won't be usable for e.g. a loop counter in an imperative program.
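A loop counter is a case where set! is what you need, because each iteration must update the same binding. A minimal sketch, with count-up as a made-up example name:

```racket
#lang racket
;; count-up (hypothetical): collect 0 .. n-1 using a mutable counter.
(define (count-up n)
  (let ((i 0)
        (seen '()))
    (let loop ()
      (when (< i n)
        (set! seen (cons i seen))
        (set! i (+ i 1))   ; updates the existing binding of i
        (loop)))
    (reverse seen)))

(count-up 3) ; => '(0 1 2)
```

Replacing (set! i (+ i 1)) with a define would be rejected or would introduce a new, unrelated i, so the test (< i n) in the enclosing loop would never see the update.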

Resources