I'm writing a simple Lisp interpreter from scratch. I have a global environment in which top-level variables are bound while the forms in a file are evaluated. When all the forms in the file have been evaluated, the top-level env and all of the key-value data structures inside it are freed.
When the evaluator encounters a lambda form, it creates a PROC object that contains 3 things: a list of arguments to be bound in a local frame when the procedure is applied, the body of the function, and a pointer to the environment it was created in. For example:
(lambda (x) x)
would produce something internally like:
PROC- args: x,
      body: x,
      env:  pointer to top-level env
When the PROC is applied, a new environment is created for the frame and the local bindings are staged there to allow the body to be evaluated with the appropriate bindings. This frame environment contains a pointer to its closure to allow variable lookup inside of THAT. In this case, that would be the global environment. After the PROC body is evaluated, I can free all the cells associated with it including its frame environment, and exit with no memory leaks.
My problem is with higher order functions. Consider this:
(define conser
  (lambda (x)
    (lambda (y) (cons x y))))
A function that takes one argument and produces another function that will cons that argument to something you pass into it. So,
(define aconser (conser '(1)))
Would yield a function that cons'es '(1) to whatever is passed into it. ex:
(aconser '(2)) ; ((1) 2)
My problem here is that aconser must retain a pointer to the environment it was created in, namely conser's frame as it existed when aconser was produced via the invocation (conser '(1)). When the aconser PROC is applied, its frame must point to that frame of conser, so I can't free conser's frame after applying it. I don't know the best way to both free the memory associated with a lambda frame when it is applied and also support this kind of persistent higher-order function.
I can think of some solutions:
some type of ARC (automatic reference counting)
copying the enclosing environment into the frame of the evaluated PROC when it is produced
This seems to be what is being implied here. So, instead of saving a pointer in the PROC object to its closure, I would... copy the closure environment and store a pointer to that directly in the cell? Would this not just be kicking the can one level deeper and result in the same problem?
recursively substituting the labels at read time inside of the body of the higher order function
I am worried I might be missing something very simple here, and I am also curious how this is supported in other implementations of Lisp, and in other languages with closures in general. I have not had much luck searching for answers because the question is very specific, perhaps even to this implementation (which I am admittedly just pulling out of my hat as a learning project), and much of what I am able to find simply explains the particulars of closures from the perspective of the language being implemented, not from the perspective of the language the interpreter is written in.
Here is a link to the relevant line in my source, if it is helpful, and I am happy to elaborate if this question is not detailed enough to describe the problem thoroughly. Thanks!
The way this is usually handled in naive interpreters is to use a garbage collector (GC) and allocate your activation frames in the GC'd heap. So you never explicitly free those frames; you let the GC free them when applicable.
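To make the naive approach concrete, here is a minimal sketch in Scheme itself (not the questioner's C; extend-env and lookup are illustrative names): if frames are ordinary heap-allocated objects, "freeing" them just means dropping the last reference and letting the host GC do the rest.

(define (extend-env params args parent)
  ;; a frame is a pair: an alist of bindings plus a pointer to the parent frame
  (cons (map cons params args) parent))

(define (lookup name env)
  (cond ((null? env) (error "unbound variable:" name))
        ((assq name (car env)) => cdr)
        (else (lookup name (cdr env)))))

The frame built for (conser '(1)) then stays reachable only because the returned closure still points at it; once that closure itself becomes garbage, the GC reclaims the frame, and no explicit free is ever issued.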
In more sophisticated implementations, you can use a slightly different approach:
when a closure is created, don't store a pointer to the current environment. Instead, copy the values of the variables that the closure actually uses (these are called the free variables of the lambda),
and change the closure's body to use those copies rather than looking those variables up in the environment. This is called closure conversion (see the sketch after this list).
Now you can treat your environment as a normal stack, and free activation frames as soon as you exit a scope.
You still need a GC to decide when closures can be freed.
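Here is a rough sketch of that conversion, hand-applied to the conser example from the question; make-closure and apply-closure are made-up helpers that just make the copied free variables explicit:

(define (make-closure code free-vals) (cons code free-vals))
(define (closure-code c) (car c))
(define (closure-free-vals c) (cdr c))

(define conser
  (lambda (x)
    ;; the free variable x is copied into the closure object right here,
    ;; so the body never needs conser's frame again
    (make-closure
      (lambda (free-vals y) (cons (car free-vals) y))
      (list x))))

(define (apply-closure c . args)
  (apply (closure-code c) (closure-free-vals c) args))

(apply-closure (conser '(1)) '(2)) ; => ((1) 2), even though conser's frame is gone

After this rewrite, conser's activation frame really can be popped as soon as (conser '(1)) returns, because the returned closure carries its own copy of x.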
This in turn requires an "assignment conversion": copying the value of variables implies a change of semantics if those variables get modified. To recover the original semantics, you need to find the variables that are both "copied into a closure" and "modified", and turn them into "reference cells" (e.g. a cons cell where you keep the value in the car), so that the copy no longer copies the value but just copies a reference to the actual place where the value is kept. [ Side note: such an implementation obviously implies that avoiding setq and using a more functional style may end up being more efficient. ]
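A sketch of that assignment conversion, again written as a hand-made source-to-source rewrite (make-counter is a hypothetical example, not from the question):

;; before: x is both captured by the closure and modified with set!
(define (make-counter)
  (let ((x 0))
    (lambda () (set! x (+ x 1)) x)))

;; after: x becomes a reference cell (a cons whose car holds the value),
;; so copying x into the closure copies the cell, not the number
(define (make-counter)
  (let ((x (cons 0 '())))
    (lambda ()
      (set-car! x (+ (car x) 1))
      (car x))))

Both versions behave the same from the outside; the second just makes the shared, mutable location explicit so it survives value-copying.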
The more sophisticated implementation also has the advantage that it can provide a safe for space semantics: a closure will only hold on to data to which it actually refers, contrary to the naive approach where closures end up referring to the whole surrounding environment and hence can prevent the GC from collecting data that is not actually referenced but just happened to be in the environment at the time it was captured by the closure.
In doing reference counting, one of the tasks is to "decrement the counter when the variable goes out of scope". But my biggest problem is I can't tell in my head when a variable goes out of scope, at the implementation level of implementing a reference counter.
Could one explain all (or the main) ways in which a variable can go out of scope?
I am specifically talking about the case of a full-featured programming language, not a toy / introductory undergraduate language. I am thinking of something like JavaScript or Rust, which have closures and nested function definitions (at least in the case of JavaScript), and where you use pointers and mutable function parameters. Say you pass a mutable value into a function and then return a closure using that mutable value, stuff like that.
What are all the ways you can tell when a variable goes out of scope? How do I get this organized enough so I can add it to a reference counter?
A local variable goes out of scope when execution reaches the end of the block in which it was declared.
Variables that are global / static don't ever go out of scope.
Variables that are fields of a composite data type (an class / object, a struct / record, an array, etc) may not have a "scope" per se, but if they do, it is determined by the scope of the composite data type instance they are part of.
If you are trying to analyse this at compile time ... you use a symbol table. This is covered in textbooks on compiler writing.
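A small Scheme example of the interaction being asked about (names are just for illustration): a local's scope ends at the closing paren of its binding form, but a closure can keep the value alive past that point, which is exactly when a reference counter must not drop the count to zero.

(define get-first
  (let ((tmp (list 1 2 3)))   ; tmp's scope ends at the let's closing paren...
    (lambda () (car tmp))))   ; ...but this closure captures tmp

(get-first) ; => 1, long after the let has been exited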
How to undefine a variable in Scheme? Is this possible?
You're touching a nerve here. Scheme doesn't have a very clear standard notion of how top-level environments work. Why? Because the Scheme standards represent a compromise between two sets of people with very different ideas of how Scheme should work:
The interpretive crowd, who sees the top-level environment as you describe above: a runtime hash-table where bindings are progressively added as program interpretation proceeds.
Then there's the compilation crowd, who sees the top-level environment as something that must be fully computable at compilation time (i.e., a compiler must be able to conclusively identify all of the names that will be bound in the top-level environment).
Your "how do I undefine a variable" question only makes sense in the first model.
Note that the interpretive model, where a program's top-level bindings depend on what code paths get taken, makes efficient compilation of Scheme code much harder for many reasons. For example, how can a Scheme compiler inline a procedure invocation if the name of the procedure is a top-level binding that may not just change during runtime, but even disappear into nothingness?
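A made-up sketch of the dilemma:

(define (double x) (* x 2))
(define (caller y) (double y)) ; a compiler would love to inline this as (* y 2)
;; but if top-level bindings can be redefined or removed at runtime, the
;; inlined copy inside caller silently goes stale, so the compiler has to
;; emit a full late-bound call through the top-level binding instead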
I'm firmly in the compilation camp here, so what I would recommend to you is to avoid writing code that relies on the ability to add or remove top-level bindings at runtime, or even that requires the use of top-level variables (though those are often unavoidable). Some Scheme systems (e.g., Racket) are able to produce reasonably good compiled code, but if you assume the interpretive model you'll trip them up in that regard.
In Scheme, variables are defined with either lambda, or one of the various lets. If you want one of them to be 'undefined' then all you need to do is leave the scope that they're in. Of course, that's not really undefining them, it's just that the variable is no longer bound to its previous definition.
If you're making top-level definitions using (define), then technically you're defining a function. Since Scheme is functional, functions never really go away. I suppose that technically it's stored in some sort of environment somewhere, so if you were intimately familiar with your implementation (and it's not safeguarded somehow) you could probably overwrite it with your own definition of the global environment. Barring that, I'd say your best bet would be to redefine the function to return the null list; that's really as empty as you get.
Scheme (R7RS) has no standard compliant way to remove a top-level binding.
If you evaluate a non existing variable, you get an error:
(eval 'a)
; => ERROR: undefined variable: a
If you define it, the variable gets added to the top-level environment.
(define a 1)
(eval 'a)
; => 1
From now on, no matter what you do, you will not get an error if you access the variable.
If you set it to false, you will get false:
(set! a #f)
(eval 'a)
; => #f
Even if you set it to something unspecified, it is unlikely that you get an error:
(set! a (if #f #t))
(eval 'a)
; =>
But Schemes may have a non-standard way to remove a top-level binding. MIT Scheme provides the function unbind-variable.
As stated in the other answers there is no standard way of manipulating the namespace in Scheme. For a specific implementation there might be a solution.
In Racket the top-level variables are stored in a namespace. You can remove a variable using namespace-undefine-variable!.
There is no way of removing a local variable.
http://docs.racket-lang.org/reference/Namespaces.html?q=namespace#%28def.%28%28quote.~23~25kernel%29._namespace-undefine-variable%21%29%29
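For example, at the Racket REPL (Racket-specific, not standard Scheme; the exact error text varies between versions):

> (define x 42)
> x
42
> (namespace-undefine-variable! 'x)
> x
; error: x is undefined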
(set! no-longer-needed #f)
Does this achieve the effect you want? You can also use define at the top level.
guile> (define nigel "lead guitar")
guile> nigel
"lead guitar"
guile> (define nigel #f)
guile> nigel
#f
guile>
You could then re-define the variable. This all depends on the scope of the variables, of course: see Greg's answer.
You cannot unbind a variable in standard Scheme. You could set! the variable to 'undefined, I guess, or you could write a metainterpreter which reifies environments, allowing you to introduce your own notion of undefining variables.
I think, if your point is to do the equivalent of "free" or de-allocate, then no, you're pretty much out of luck. You can't de-allocate a variable. You CAN re-define it to something small, like #f, but once you've done (define foo 'bar), the variable foo will exist in some form until you end the program.
On the other hand, if you use let, or letrec, of course, the name only exists until the relevant close paren...
I think your question is not stupid. In AutoLISP, a nonexistent (undefined) variable has the value "nil" a priori: even if the variable does not exist in memory, that is, it is not in the table of variables, its value is "nil", which means false and is also the empty list. If you are writing some kind of list-processing function, an initial test like this is enough:
(if input-list ....)
When you want to explicitly undefine a variable, you can do this:
(setq old-var nil) ; or: (setq old-var ())
I like it. The keyword "setq" means "define". What is better about binding and unbinding variables in other dialects? You must test whether they exist and whether they are lists, you need a garbage collector, and you may not undefine a variable to explicitly free memory. The following command cannot be written if the variable "my-list" is not defined:
(define my-list (cons 2 my-list))
So I think the AutoLISP way is much better for programming, and you can use the possibilities I described above there. Unfortunately, AutoLISP only works in certain CAD engineering graphics systems.
How does make-array work in SBCL? Are there some equivalents of new and delete operators in C++, or is it something else, perhaps assembler level?
I peeked into the source, but didn't understand anything.
When using SBCL compiled from source and an environment like Emacs/Slime, it is possible to navigate the code quite easily using M-. (meta-point). Basically, the make-array symbol is bound to multiple things: deftransform definitions, and a defun. The deftransform are used mostly for optimization, so better just follow the function, first.
The make-array function delegates to an internal make-array% one, which is quite complex: it checks the parameters, and dispatches to different specialized implementation of arrays, based on those parameters: a bit-vector is implemented differently than a string, for example.
If you follow the case for simple-array, you find a function which calls allocate-vector-with-widetag, which in turn calls allocate-vector.
Now, allocate-vector is bound to several objects: multiple defoptimizer forms, a function, and a define-vop form.
The function is only:
(defun allocate-vector (type length words)
  (allocate-vector type length words))
Even if it looks like a recursive call, it isn't.
The define-vop form is a way to define how to compile a call to allocate-vector. In the function, and anywhere where there is a call to allocate-vector, the compiler knows how to write the assembly that implements the built-in operation. But the function itself is defined so that there is an entry point with the same name, and a function object that wraps over that code.
define-vop relies on a Domain Specific Language in SBCL that abstracts over assembly. If you follow the definition, you can find different vops (virtual operations) for allocate-vector, like allocate-vector-on-heap and allocate-vector-on-stack.
Allocation on the heap translates into a call to calc-size-in-bytes, a call to allocation and put-header, which most likely allocate memory and tag it (I followed the definition to src/compiler/x86-64/alloc.lisp).
How memory is allocated (and garbage collected) is another problem.
allocation emits assembly code using %alloc-tramp, which in turns executes the following:
(invoke-asm-routine 'call (if to-r11 'alloc-tramp-r11 'alloc-tramp) node)
There are apparently predefined assembly routines called alloc-tramp-r11 and alloc-tramp. A comment says:
;;; Most allocation is done by inline code with sometimes help
;;; from the C alloc() function by way of the alloc-tramp
;;; assembly routine.
There is a base of C code for the runtime, see for example /src/runtime/alloc.c.
The -tramp suffix stands for trampoline.
Also have a look at src/runtime/x86-assem.S.
I need help drawing the relevant portions of the environment model diagram when evaluating this code:
Scheme>(define x 10)
Scheme> ((lambda (x y) (+ (y 3) x)) 6 (lambda (w) (* x 9)))
I need to make sure to write each lambda body next to the environment in which it is being evaluated.
Okay, I know that there is only one define, so most of the work will be done by "anonymous" or "nameless" functions, and these will still show up in various ways in the environment model diagram.
In addition to the answers already given, the 6.001 course at MIT has two very comprehensive lectures on the environment model, the reasons for its existence, as well as some very helpful and fine-grained step-by-step examples:
Lecture 1
Lecture 2
Hope this helps,
Jason
If I remember correctly, whenever you execute a lambda, a new environment is created where the arguments' values are bound to their names. This environment inherits from whichever environment the lambda was originally declared in.
The first environment in all cases is the global environment--this is where the (define x 10) resides. Then, as I said before, add a new environment whenever you execute a lambda (as in the second line). This environment inherits from whichever environment the lambda was executed in.
The first thing you did (starting with the second line) is call the first lambda. To do this, you have to evaluate the arguments. Since you evaluate the arguments before actually entering the first lambda, the second lambda is declared in the global environment.
Next, an environment is created for the first lambda's call (inheriting from the global environment). Here x is bound to 6 and y is bound to the second lambda. Then, to do the +, the second lambda is called. Since it was declared in the global environment, its new environment inherits from this rather than from the first lambda's environment. This means that, for the second one, x is bound to 10 rather than 6.
I hope this explains everything understandably.
To clarify: there are going to be three environments: the global environment and one environment per function invocation. Both of the function invocations' environments will inherit from the global environment. The first lambda's code will run in its own environment, while the second lambda's code will run in the second lambda's.
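Putting numbers on it, the trace looks roughly like this (a sketch consistent with the answer above, not official diagram notation):

;; E0 (global):                           x = 10
;; E1 (call of first lambda, parent E0):  x = 6, y = <second lambda, created in E0>
;; E2 (call of second lambda, parent E0): w = 3
;;
;; in E2, x is found via E0, so (y 3) = (* 10 9) = 90,
;; and in E1, (+ (y 3) x) = (+ 90 6) = 96
((lambda (x y) (+ (y 3) x)) 6 (lambda (w) (* x 9))) ; => 96, given (define x 10)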
Additionally, check out envdraw, which can be found here: http://inst.eecs.berkeley.edu/~cs3s/stk/site-scheme/envdraw/
If you read the ANNOUNCE file, it will tell you how to get it. You'll need to use STk, a particular Scheme interpreter.
envdraw draws environment diagrams for Scheme automatically.
Disclaimer: I never bothered with envdraw when taking the class that used Scheme, but it was endorsed by my professor (apparently one of his students wrote it back in the day) and other people seemed to do fine using it.
The object returned by delay in Scheme is "a promise", but promises are not considered to be a type (so there is no promise? procedure, and it's not listed as a type in R5RS or R6RS).
Is there a strong reason why this is so? It would seem quite natural to me to do something like (if (promise? x) (force x) x), for example. (And I see that some implementations will let me force non-promises, and others will not.) Also, if I can store something in a variable and pass it around, I feel like it should have a type.
There can't be that strong a reason, since MIT/GNU Scheme defines a promise? function.
I think it allows for a more optimized implementation of delay/force. The fact that the forced value can be memoized (so that a promise is really forced only once and the resulting value is returned on subsequent force calls) blurs the distinction between a promise and its resulting value. If you have promise? you cannot substitute a forced promise by its value everywhere it is needed. Therefore, depending on the implementation, a promise can be indistinguishable from any other Scheme value.
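For what it's worth, R7RS-small later added a (scheme lazy) library that does export promise?, so the check from the question can be written portably there; on R5RS/R6RS systems you are at the mercy of the implementation. A sketch:

(define p (delay (begin (display "computing...") (newline) 42)))

(force p) ; prints "computing..." and returns 42
(force p) ; memoized: returns 42 without printing again

;; the pattern from the question, where promise? is available
(define (force-if-promise x)
  (if (promise? x) (force x) x))

(force-if-promise p)  ; => 42
(force-if-promise 99) ; => 99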