I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data.
The accepted answer to the Stack Overflow question "What exactly is a symbol in lisp/scheme?" defines the "symbol" data object in Scheme:
In Scheme and Racket, a symbol is like an immutable string that happens to be interned
The accepted answer writes that in Scheme, there is a built-in correspondence between identifiers and symbols:
To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language).
To understand the correspondence, I read the page "A Note on Identifiers" in An Introduction to Scheme and Its Implementation, which says
Scheme identifiers (variable names and special form names and keywords) have almost the same restrictions as Scheme symbol object character sequences, and it's no coincidence. Most implementations of Scheme happen to be written in Scheme, and symbol objects are used in the interpreter or compiler to represent variable names.
Based on the above, I'm wondering if my understanding of what is happening in the following session is correct:
user@host:/home/user $ scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Sunday February 7, 2016 at 10:35:34 AM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116
1 ]=> (define a (lambda (i) (+ i 1)))
;Value: a
1 ]=> a
;Value 13: #[compound-procedure 13 a]
1 ]=> (quote a)
;Value: a
1 ]=> (eval a (the-environment))
;Value 13: #[compound-procedure 13 a]
1 ]=> (eval (quote a) (the-environment))
;Value 13: #[compound-procedure 13 a]
1 ]=>
The first define statement is a special form captured by the evaluator, which creates a binding for the symbol a to a compound procedure object in the global environment.
Writing a in the top-level causes the evaluator to receive the symbol object 'a, which evaluates to the compound-procedure object that 'a points to in the global environment.
Writing (quote a) in the top-level causes the evaluator to receive a list of symbols ('quote 'a); this expression is a special form captured by the evaluator, which evaluates to the quoted expression, namely the symbol object 'a.
Writing (eval a (the-environment)) causes the evaluator to receive a list of symbols ('eval 'a ...) (ignoring the environment). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object, and a lookup for 'a, which yields the compound-procedure. Finally, the top-level evaluator applies the eval procedure to its arguments. Since a compound-procedure is self-evaluating (not true in Scheme48), the final value of the expression is the compound-procedure itself.
Writing (eval (quote a) (the-environment)) causes the evaluator to receive a list of symbols ('eval ('quote 'a) ...). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object. It evaluates the expression ('quote 'a) which yields the symbol object 'a. Finally, the top-level evaluator applies the eval procedure to 'a, which is a symbol object and therefore invokes an environment lookup that yields the compound procedure.
Does this explanation correctly describe (at a high level) how a Scheme interpreter might differentiate between symbol objects and identifiers in the language? Are there fundamental misunderstandings in these descriptions?
The R6RS Scheme report, in 4.2 Lexical Syntax, uses the term identifier to refer to the character-level syntax. That is to say, roughly, identifier means something like the lexical token from which a symbol is constructed when the expression becomes an object. However, elsewhere in the text, identifier seems to be freely used as a synonym for symbol. E.g. "Scheme allows identifiers to stand for locations containing values. These identifiers are called variables." (1.3 Variables and Binding). Basically, the spec seems to be loose with regard to this terminology. Depending on context, an identifier is either the same thing as a symbol (an object), or else <identifier>: the grammar category from the lexical syntax.
In a sentence which says something like that a certain character may or may not appear in an identifier, the context is clearly lexical syntax, because a symbol object is an atom and not a character string; it doesn't contain anything. But when we talk about an identifier denoting a memory location (being a variable), that's the symbol; we're past the issue of what kinds of tokens can produce the symbol in the textual source code.
The tutorial linked in the question, An Introduction to Scheme and Its Implementation, uses its own peculiar definition of identifier, which is at odds with the Scheme language. It implies that identifiers are "variable names and special form names and keywords" (so that symbols which are not variable names are not identifiers, which is not supported by the specification).
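To make the token-versus-object distinction concrete, here is a small sketch. The name odd-symbol is made up for illustration; string->symbol and symbol->string are standard procedures. It shows a symbol object whose name could not be written as a plain identifier token, which is one way to see that the lexical rules belong to the reader and not to the symbol itself.

```scheme
;; A symbol read from source and one built from a string are the same
;; interned object, so eq? holds.
(eq? 'abc (string->symbol "abc"))                  ; => #t

;; odd-symbol (hypothetical name) is a perfectly good symbol object,
;; even though "hello world" is not a valid identifier token.
(define odd-symbol (string->symbol "hello world"))
(symbol? odd-symbol)                               ; => #t
(symbol->string odd-symbol)                        ; => "hello world"

;; Inside quoted code, the identifiers show up as ordinary symbols.
(symbol? (car '(define x 1)))                      ; => #t
```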
ObPreface: Apologies in advance for telling you things you already know!
Your very first sentence is raising big XY question issues for me. You write "I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data." What do you mean by "the Scheme meta-circular evaluator"? Also, what do you mean by "symbolic data"? Both of these terms suggest to me that you want to ask some more high-level questions.
Regardless, your title suggests a question about the difference between identifiers and symbols. The difference is this:
"Identifiers" are a syntactic category. That is, suppose we take a text file and break it up into tokens. Some of those tokens will be left-parens. Some will be right-parens. Some will be numbers. Some will be identifiers. Every language has its own set of syntactic categories, but many of them use the name "identifier" for "word-like thing that can usually be a function name or a variable name or whatever."
"Symbols", on the other hand, are a particular kind of value in Scheme and Lisp systems. Scheme has lots of different kinds of values: Numbers, Booleans, Strings, Pairs, Symbols, and others.
In Scheme, when developing a parser/interpreter/compiler/whatever, it turns out to be very convenient to use symbols (the values) to represent identifiers (the syntactic entities). Specifically, "quote" has a special ability to turn certain host language token sequences into lists of symbols, numbers, strings, and booleans. You don't need to take advantage of this, but it eliminates a lot of code.
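As a rough illustration of that convenience, here is a toy evaluator. The names mini-eval and toy-env are invented for this sketch, and it handles only variables, self-evaluating data, quote, and procedure application; it is not how any particular Scheme is implemented. Identifiers arrive as symbols, so a variable reference is a lookup, while (quote a) hands the symbol back untouched.

```scheme
;; mini-eval: hypothetical toy evaluator over an association-list environment.
(define (mini-eval expr env)
  (cond ((symbol? expr)                  ; identifier, represented as a symbol
         (let ((binding (assq expr env)))
           (if binding
               (cdr binding)
               (error "unbound variable:" expr))))
        ((not (pair? expr)) expr)        ; numbers, strings, procedures, ...
        ((eq? (car expr) 'quote)         ; (quote datum): return datum as-is
         (cadr expr))
        (else                            ; application: evaluate, then apply
         (apply (mini-eval (car expr) env)
                (map (lambda (e) (mini-eval e env)) (cdr expr))))))

;; toy-env: hypothetical environment binding two symbols to procedures.
(define toy-env
  (list (cons 'a (lambda (i) (+ i 1)))
        (cons '+ +)))

(mini-eval '(quote a) toy-env)   ; => the symbol a
(mini-eval '(a 41) toy-env)      ; => 42
(mini-eval '(+ 1 2) toy-env)     ; => 3
```

Evaluating the bare symbol a with mini-eval returns the procedure stored in toy-env, which mirrors the difference between a and (quote a) in the session from the question.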
I know that you can use ' (aka quote) to create a list, and I use this all the time, like this:
> (car '(1 2 3))
1
But it doesn’t always work like I’d expect. For example, I tried to create a list of functions, like this, but it didn’t work:
> (define math-fns '(+ - * /))
> (map (lambda (fn) (fn 1)) math-fns)
application: not a procedure;
expected a procedure that can be applied to arguments
given: '+
When I use list, it works:
> (define math-fns (list + - * /))
> (map (lambda (fn) (fn 1)) math-fns)
'(1 -1 1 1)
Why? I thought ' was just a convenient shorthand, so why is the behavior different?
TL;DR: They are different; use list when in doubt.
A rule of thumb: use list whenever you want the arguments to be evaluated; quote “distributes” over its arguments, so '(+ 1 2) is like (list '+ '1 '2). You’ll end up with a symbol in your list, not a function.
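A few checks you can paste into a REPL to convince yourself of this rule of thumb (only standard procedures are used):

```scheme
;; quote "distributes": the quoted list equals a list of quoted elements.
(equal? '(+ 1 2) (list '+ '1 '2))    ; => #t

;; The first element of the quoted list is a symbol, not a procedure.
(symbol? (car '(+ 1 2)))             ; => #t
(procedure? (car '(+ 1 2)))          ; => #f

;; With list, + is evaluated first, so the element is the procedure itself.
(procedure? (car (list + 1 2)))      ; => #t
```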
An in-depth look at list and quote
In Scheme and Racket, quote and list are entirely different things, but since both of them can be used to produce lists, confusion is common and understandable. There is an incredibly important difference between them: list is a plain old function, while quote (even without the special ' syntax) is a special form. That is, list can be implemented in plain Scheme, but quote cannot be.
The list function
The list function is actually by far the simpler of the two, so let’s start there. It is a function that takes any number of arguments, and it collects the arguments into a list.
> (list 1 2 3)
(1 2 3)
The above example can be confusing because the result is printed as a quotable s-expression, and it’s true that, in this case, the two syntaxes are equivalent. But if we get slightly more complicated, you’ll see that they differ:
> (list 1 (+ 1 1) (+ 1 1 1))
(1 2 3)
> '(1 (+ 1 1) (+ 1 1 1))
(1 (+ 1 1) (+ 1 1 1))
What’s going on in the quote example? Well, we’ll discuss that in a moment, but first, take a look at list. It’s just an ordinary function, so it follows standard Scheme evaluation semantics: it evaluates each of its arguments before they get passed to the function. This means that expressions like (+ 1 1) will be reduced to 2 before they get collected into the list.
This behavior is also visible when supplying variables to the list function:
> (define x 42)
> (list x)
(42)
> '(x)
(x)
With list, the x gets evaluated before getting passed to list. With quote, things are more complicated.
Finally, because list is just a function, it can be used just like any other function, including in higher-order ways. For example, it can be passed to the map function, and it will work appropriately:
> (map list '(1 2 3) '(4 5 6))
((1 4) (2 5) (3 6))
The quote form
Quotation, unlike list, is a special part of Lisps. The quote form is special in part because it gets a special reader abbreviation, ', but it’s also special even without that. Unlike list, quote is not a function, and therefore it does not need to behave like one—it has rules of its own.
A brief discussion of Lisp source code
In Lisp, of which Scheme and Racket are derivatives, all code is actually made up of ordinary data structures. For example, consider the following expression:
(+ 1 2)
That expression is actually a list, and it has three elements:
the + symbol
the number 1
the number 2
All of these values are normal values that can be created by the programmer. It’s really easy to create the 1 value because it evaluates to itself: you just type 1. But symbols and lists are harder: by default, a symbol in the source code does a variable lookup! That is, symbols are not self-evaluating:
> 1
1
> a
a: undefined
cannot reference undefined identifier
As it turns out, though, symbols are basically just strings, and in fact we can convert between them:
> (string->symbol "a")
a
Lists do even more than symbols, because by default, a list in the source code calls a function! Doing (+ 1 2) looks at the first element in the list, the + symbol, looks up the function associated with it, and invokes it with the rest of the elements in the list.
Sometimes, though, you might want to disable this “special” behavior. You might want to just get the list or get the symbol without it being evaluated. To do this, you can use quote.
The meaning of quotation
With all this in mind, it’s pretty obvious what quote does: it just “turns off” the special evaluation behavior for the expression that it wraps. For example, consider quoting a symbol:
> (quote a)
a
Similarly, consider quoting a list:
> (quote (a b c))
(a b c)
No matter what you give quote, it will always, always spit it back out at you. No more, no less. That means if you give it a list, none of the subexpressions will be evaluated—do not expect them to be! If you need evaluation of any kind, use list.
Now, one might ask: what happens if you quote something other than a symbol or a list? Well, the answer is... nothing! You just get it back.
> (quote 1)
1
> (quote "abcd")
"abcd"
This makes sense, since quote still just spits out exactly what you give it. This is why “literals” like numbers and strings are sometimes called “self-quoting” in Lisp parlance.
One more thing: what happens if you quote an expression containing quote? That is, what if you “double quote”?
> (quote (quote 3))
'3
What happened there? Well, remember that ' is actually just a direct abbreviation for quote, so nothing special happened at all! In fact, if your Scheme has a way to disable the abbreviations when printing, it will look like this:
> (quote (quote 3))
(quote 3)
Don’t be fooled by quote being special: just like (quote (+ 1)), the result here is just a plain old list. In fact, we can get the first element out of the list: can you guess what it will be?
> (car (quote (quote 3)))
quote
If you guessed 3, you are wrong. Remember, quote disables all evaluation, and an expression containing a quote symbol is still just a plain list. Play with this in the REPL until you are comfortable with it.
> (quote (quote (quote 3)))
''3
> (quote (1 2 (quote 3)))
(1 2 '3)
Quotation is incredibly simple, but it can come off as very complex because of how it tends to defy our understanding of the traditional evaluation model. In fact, it is confusing because of how simple it is: there are no special cases, there are no rules. It just returns exactly what you give it, precisely as stated (hence the name “quotation”).
Appendix A: Quasiquotation
So if quotation completely disables evaluation, what is it good for? Well, aside from making lists of strings, symbols, or numbers that are all known ahead of time, not much. Fortunately, the concept of quasiquotation provides a way to break out of the quotation and go back into ordinary evaluation.
The basics are super simple: instead of using quote, use quasiquote. Normally, this works exactly like quote in every way:
> (quasiquote 3)
3
> (quasiquote x)
x
> (quasiquote ((a b) (c d)))
((a b) (c d))
What makes quasiquote special is that it recognizes a special symbol, unquote. Wherever unquote appears in the list, it is replaced by the value of the arbitrary expression it contains:
> (quasiquote (1 2 (+ 1 2)))
(1 2 (+ 1 2))
> (quasiquote (1 2 (unquote (+ 1 2))))
(1 2 3)
This lets you use quasiquote to construct templates of sorts that have “holes” to be filled in with unquote. This means it’s possible to actually include the values of variables inside of quoted lists:
> (define x 42)
> (quasiquote (x is: (unquote x)))
(x is: 42)
Of course, using quasiquote and unquote is rather verbose, so they have abbreviations of their own, just like '. Specifically, quasiquote is ` (backtick) and unquote is , (comma). With those abbreviations, the above example is much more palatable.
> `(x is: ,x)
(x is: 42)
One final point: quasiquote actually can be implemented in Racket using a rather hairy macro, and it is. It expands to usages of list, cons, and of course, quote.
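To give a feel for what such an expansion could look like, here is a hand-written equivalent of one quasiquoted template. This is only a sketch: the exact code a given implementation generates will differ. The example also uses unquote-splicing (,@), which splices a list's elements into the surrounding list.

```scheme
;; The template, written with the abbreviations.
`(1 ,(+ 1 2) ,@(list 4 5))                          ; => (1 3 4 5)

;; One plausible hand expansion using only cons, append, and quote.
(cons 1 (cons (+ 1 2) (append (list 4 5) '())))     ; => (1 3 4 5)

(equal? `(1 ,(+ 1 2) ,@(list 4 5))
        (cons 1 (cons (+ 1 2) (append (list 4 5) '()))))   ; => #t
```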
Appendix B: Implementing list and quote in Scheme
Implementing list is super simple because of how “rest argument” syntax works. This is all you need:
(define (list . args)
  args)
That’s it!
In contrast, quote is a lot harder—in fact, it’s impossible! It would seem totally feasible, since the idea of disabling evaluation sounds a lot like macros. Yet a naïve attempt reveals the trouble:
(define-syntax fake-quote
  (syntax-rules ()
    ((_ arg) arg)))
We just take arg and spit it back out... but this doesn’t work. Why not? Well, the result of our macro will be evaluated, so all is for naught. We might be able to expand to something sort of like quote by expanding to (list ...) and recursively quoting the elements, like this:
(define-syntax impostor-quote
  (syntax-rules ()
    ((_ (a . b)) (cons (impostor-quote a) (impostor-quote b)))
    ((_ (e ...)) (list (impostor-quote e) ...))
    ((_ x) x)))
Unfortunately, though, without procedural macros, we can’t handle symbols without quote. We could get closer using syntax-case, but even then, we would only be emulating quote’s behavior, not replicating it.
Appendix C: Racket printing conventions
When trying the examples in this answer in Racket, you may find that they do not print as one would expect. Often, they may print with a leading ', such as in this example:
> (list 1 2 3)
'(1 2 3)
This is because Racket, by default, prints results as expressions when possible. That is, you should be able to type the result into the REPL and get the same value back. I personally find this behavior nice, but it can be confusing when trying to understand quotation, so if you want to turn it off, call (print-as-expression #f), or change the printing style to “write” in the DrRacket language menu.
The behavior you are seeing is a consequence of Scheme not treating symbols as functions.
The expression '(+ - * /) produces a value which is a list of symbols. That's simply because (+ - * /) is a list of symbols, and we are just quoting it to suppress evaluation in order to get that object literally as a value.
The expression (list + - * /) produces a list of functions. This is because it is a function call. The symbolic expressions list, +, -, * and / are evaluated. They are all variables which denote functions, and so are reduced to those functions. The list function is then called, and returns a list of those remaining four functions.
In ANSI Common Lisp, calling symbols as functions works:
[1]> (mapcar (lambda (f) (funcall f 1)) '(+ - * /))
(1 -1 1 1)
When a symbol is used where a function is expected, the top-level function binding of the symbol is substituted, if it has one, and everything is cool. In effect, symbols are function-callable objects in Common Lisp.
If you want to use list to produce a list of symbols, just like '(+ - * /), you have to quote them individually to suppress their evaluation:
(list '+ '- '* '/)
Back in the Scheme world, you will see that if you map over this, it will fail in the same way as the original quoted list. The reason is the same: trying to use symbol objects as functions.
The error message you are being shown is misleading:
expected a procedure that can be applied to arguments
given: '+
This '+ being shown here is (quote +). But that's not what the application was given; it was given just +, the issue being that the symbol object + isn't usable as a function in that dialect.
What's going on here is that the diagnostic message is printing the + symbol in "print as expression" mode, a feature of Racket, which is what I guess you're using.
In "print as expression" mode, objects are printed using a syntax which must be read and evaluated to produce a similar object. See the StackOverflow question "Why does the Racket interpreter write lists with an apostroph before?"
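If you do want to start from a quoted list of symbols in Scheme, one option is to translate each symbol to a procedure explicitly. The sketch below uses an association list; fn-table and symbol->fn are names invented for this example, not part of any standard library.

```scheme
;; fn-table (hypothetical): maps operator symbols to the procedures
;; they conventionally name.
(define fn-table
  (list (cons '+ +) (cons '- -) (cons '* *) (cons '/ /)))

;; symbol->fn (hypothetical): look up the procedure for a symbol.
(define (symbol->fn sym)
  (let ((entry (assq sym fn-table)))
    (if entry
        (cdr entry)
        (error "no procedure registered for symbol:" sym))))

(map (lambda (sym) ((symbol->fn sym) 1)) '(+ - * /))   ; => (1 -1 1 1)
```

This keeps the symbol-to-function step visible instead of relying on the language to do it implicitly, as Common Lisp's funcall does.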
(Though this is indeed a simple question, I find it is a common mistake that I made when writing Scheme programs as a beginner.)
I encountered some confusion about the define special form. A situation is like below:
(define num1
2)
(define (num2)
2)
I often reference num2 without the parentheses, and the program fails. I usually end up spending hours finding the cause.
From reading R5RS, I realized that a definition without parentheses, e.g. num1, defines a variable, while a definition with parentheses, e.g. num2, defines a function with no formal parameters.
However, I am still unclear about the difference between a "variable" and a "function".
Coming from an Emacs Lisp background, I can only relate this difference to a similar idea in Emacs Lisp:
In Emacs Lisp, a symbol can have a value attached to it just as it can
have a function definition attached to it.
[here]
Question: Is this a correct way of understanding the difference between enclosed and non-enclosed definitions in scheme?
There is no difference between a value and a function in Scheme. A function is just a value that can be used in a particular way - it can be called (as opposed to other kinds of value, such as numbers, which cannot be called, but can be e.g. added, which a function cannot).
The parentheses are just a syntactic shortcut - they're a faster, more readable (to experts) way of writing out the definition of the name as a variable containing a function:
(define (num)
2)
;is exactly the same as
(define num
(lambda () 2) )
The second of these should make it more visually obvious that the value being assigned to num is not a number.
If you wanted the function to take arguments, they would go either within the parentheses after num (e.g. (num x y)) in the first form, or within lambda's parentheses (e.g. (lambda (x y) ...)) in the second.
Most tutorials for the total beginner actually don't introduce the first form for several exercises, in order to drive home the point that it isn't separate and doesn't really provide any true functionality on its own. It's just a shorthand to reduce the amount of repetition in your program's text.
In Scheme, all functions are values; variables hold any one value.
In Scheme, unlike Common Lisp and Emacs Lisp, there are no different namespaces for functions and other values. So the statement you quoted is not true for Scheme. In Scheme a symbol is associated with at most one value and that value may or may not be a function.
As to the difference between a non-function value and a nullary function returning that value: In your example the only difference is that, as you know, num2 must be applied to get the numeric value and num1 does not have to be and in fact can't be applied.
In general the difference between (define foo bar) and (define (foo) bar) is that the former evaluates bar right away, and foo then refers to the value that bar evaluated to, whereas in the latter case bar is evaluated each time (foo) is called. So if the expression bar is costly to calculate, that cost is paid when (and each time) you call the function, not at the definition. And, perhaps more importantly, if the expression has side effects (like, for example, printing something), those effects happen each time the function is called. Side effects are the primary reason you'd define a function without parameters.
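A small sketch of that timing difference, using a counter as the side effect. All names here (counter, next!, once, each-time) are made up for the example.

```scheme
(define counter 0)

;; next!: bump the counter and return its new value.
(define (next!)
  (set! counter (+ counter 1))
  counter)

;; The expression (next!) runs once, right here; once is bound to 1.
(define once (next!))

;; The body (next!) runs again on every call to each-time.
(define (each-time) (next!))

once          ; => 1
once          ; => 1
(each-time)   ; => 2
(each-time)   ; => 3
```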
Even though @sepp2k has answered the question, I will make it clearer with an example:
1 ]=> (define foo1 (display 23))
23
;Value: foo1
1 ]=> foo1
;Unspecified return value
Notice that in the first case, the expression (display 23) is evaluated on the spot (hence it prints), and the resulting value is bound to the name foo1. It is not evaluated again on later references.
1 ]=> (define (foo2) (display 23))
;Value: foo2
1 ]=> foo2
;Value 11: #[compound-procedure 11 foo2]
1 ]=> (foo2)
23
;Unspecified return value
Evaluating just foo2 returns the procedure object (whose body is (display 23)). Calling (foo2) actually runs the body, and the body is evaluated again on every call.
1 ]=> (foo1)
;The object #!unspecific is not applicable.
foo1 is a name that refers to a value. Applying foo1 doesn't make sense because, in this case, that value is not a procedure.
So I hope things are clearer. In short, in the former case the name refers to the value returned by evaluating the expression once. In the latter case, the expression is evaluated each time the function is called.
I've started trying to learn the innards of Scheme evaluation, and one aspect of quasiquotation, unquoting, evaluation and cons-cells is confusing me. If you can recommend any good references on the subject I'd be very grateful.
The R7RS draft has this example in section 4.2.8 on quasiquotation.
`((foo ,(- 10 3)) ,@(cdr '(c)) . ,(car '(cons)))
(It's in the R4RS spec too, so this isn't a new thing.)
According to the spec this should evaluate to:
((foo 7) . cons)
I'm having some trouble understanding why. To my mind, the . removes the unquote from the start of the inner list, meaning it won't be evaluated as a procedure.
Here's a simpler expression that demonstrates the same problem:
`(foo . ,(car '(bar)))
Using the same logic as above, this should evaluate to:
(foo . bar)
And indeed it does evaluate to that on the Scheme systems I've tried.
However, to my understanding it shouldn't evaluate to that, so I want to find out where I'm going wrong.
My understanding of Scheme evaluation is (OK, simplified) if it's the first keyword after an open-bracket, call that procedure with the remainder of the list as the parameters.
My understanding of the spec is that ',' is exactly equivalent to wrapping the next expression in an '(unquote' procedure.
My understanding of the dot notation is that, for general display purposes, you remove the dot and opening parenthesis (and matching closing parenthesis), as described here:
In general, the rule for printing a pair is as follows: use the dot
notation always, but if the dot is immediately followed by an open
parenthesis, then remove the dot, the open parenthesis, and the
matching close parenthesis.
So:
`(foo . ,(car '(bar)))
Could equally be rendered as:
(quasiquote (foo unquote (car (quote (bar)))))
(In fact, this is how jsScheme will render the input in its log window.)
However, when it comes to evaluating this:
(quasiquote (foo unquote (car (quote (bar)))))
Why is the 'unquote' evaluated (as a procedure?), unquoting and evaluating the (car...) list? Surely it should just be treated as a quoted symbol, since it's not after an opening bracket?
I can think of a number of possible answers - 'unquote' isn't a regular procedure, the 'unquote' is evaluated outside of the regular evaluation process, there's a different way to indicate a procedure to be called other than a '(' followed by the procedure's symbol - but I'm not sure which is right, or how to dig for more information.
Most of the scheme implementations I've seen handle this using a macro rather than in the same language as the evaluator, and I'm having difficulty figuring out what's supposed to be going on. Can someone explain, or show me any good references on the subject?
You are correct in that there are macros involved: in particular, quasiquote is a macro, and unquote and unquote-splicing are literals. None of those are procedures, so normal evaluation rules do not apply.
Thus, it's possible to give (quasiquote (foo bar baz unquote x)) the special treatment it needs, despite unquote not being the first syntax element.
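You can check the reader-level equivalence directly at a REPL. The sketch below relies only on standard behaviour: ,x is read as (unquote x), so a dotted tail ". ,x" is the same datum as a list whose last two elements are unquote and x. The quasiquote macro then pattern-matches on that tail.

```scheme
;; Both sides are quoted, so nothing is evaluated; the comparison is
;; purely about what the reader produced.
(equal? '(foo . ,x) '(foo unquote x))                 ; => #t

;; The long form and the abbreviated form give the same result.
(equal? (quasiquote (foo unquote (car '(bar))))
        `(foo . ,(car '(bar))))                       ; => #t

`(foo . ,(car '(bar)))                                ; => (foo . bar)
```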
I have found this question about the special function "or" in scheme:
Joe Hacker states loudly that there is no reason or in Scheme needs to be special -- it can just be defined by the programmer, like this:
(define (or x y)
(if x
#t
y))
Is Joe right?
I can't figure out why it shouldn't be possible to do that.
Could some scheme-expert please explain if this works, and if no: why not?
It's because this version of or evaluates all of its arguments (since it's a function), while the standard Scheme or (which is not a function but special syntax) doesn't. Try running (or #t (exit)) at the Scheme REPL and then try the same with your or function.
The behavior of the standard or is sometimes called short-circuiting: it evaluates only those arguments that it needs to. This is very common for the binary boolean operators (or and and) across programming languages. The fact that or looks like a function call is a feature of Scheme/Lisp syntax, but looks deceive.
Whether it works or not depends on what you want it to do. It certainly works in the sense that for two given boolean values it will return the expected result. However, it will not be functionally equivalent to the regular or because it does not short-circuit. Given your definition, (or #t (/ 0 0)) will cause an error because you're dividing 0 by 0, while the regular or would just return #t and not try to evaluate (/ 0 0) at all.
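For comparison, a short-circuiting or can be written by the programmer, but as a macro, not a function. The sketch below uses the invented names my-or and fn-or so that the built-in or is not shadowed; it is one possible definition, not necessarily the one your implementation uses.

```scheme
;; my-or (hypothetical name): a macro, so later operands are only
;; evaluated if the earlier ones turned out false.
(define-syntax my-or
  (syntax-rules ()
    ((_) #f)
    ((_ e) e)
    ((_ e rest ...)
     (let ((t e))
       (if t t (my-or rest ...))))))

;; fn-or (hypothetical name): Joe Hacker's function version.
(define (fn-or x y)
  (if x #t y))

(my-or #t (error "never evaluated"))   ; => #t
(my-or #f 3)                           ; => 3
;; (fn-or #t (error "boom")) would signal the error, because both
;; arguments are evaluated before fn-or is entered.
```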