Difference between an identifier and symbol in Scheme?

I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data.
The accepted answer to the Stack Overflow question "What exactly is a symbol in Lisp/Scheme?" defines the "symbol" data object in Scheme:
In Scheme and Racket, a symbol is like an immutable string that happens to be interned
The accepted answer writes that in Scheme, there is a built-in correspondence between identifiers and symbols:
To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language).
To understand the correspondence, I read the page "A Note on Identifiers" in An Introduction to Scheme and Its Implementation, which says:
Scheme identifiers (variable names and special form names and keywords) have almost the same restrictions as Scheme symbol object character sequences, and it's no coincidence. Most implementations of Scheme happen to be written in Scheme, and symbol objects are used in the interpreter or compiler to represent variable names.
Based on the above, I'm wondering if my understanding of what is happening in the following session is correct:
user@host:/home/user $ scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Sunday February 7, 2016 at 10:35:34 AM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116
1 ]=> (define a (lambda (i) (+ i 1)))
;Value: a
1 ]=> a
;Value 13: #[compound-procedure 13 a]
1 ]=> (quote a)
;Value: a
1 ]=> (eval a (the-environment))
;Value 13: #[compound-procedure 13 a]
1 ]=> (eval (quote a) (the-environment))
;Value 13: #[compound-procedure 13 a]
1 ]=>
The first define statement is a special form captured by the evaluator, which creates a binding for the symbol a to a compound procedure object in the global environment.
Writing a at the top level causes the evaluator to receive the symbol object 'a, which evaluates to the compound-procedure object that 'a is bound to in the global environment.
Writing (quote a) at the top level causes the evaluator to receive a list of symbols ('quote 'a); this expression is a special form captured by the evaluator, which evaluates to the quoted expression, namely the symbol object 'a.
Writing (eval a (the-environment)) causes the evaluator to receive a list of symbols ('eval 'a ...) (ignoring the environment). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object, and a lookup for 'a, which yields the compound procedure. Finally, the top-level evaluator applies the eval procedure to its arguments. Since a compound procedure is self-evaluating (not true in Scheme48), the final value of the expression is the compound procedure itself.
Writing (eval (quote a) (the-environment)) causes the evaluator to receive a list of symbols ('eval ('quote 'a) ...). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object. It evaluates the expression ('quote 'a) which yields the symbol object 'a. Finally, the top-level evaluator applies the eval procedure to 'a, which is a symbol object and therefore invokes an environment lookup that yields the compound procedure.
Does this explanation correctly describe (at a high level) how a Scheme interpreter might differentiate between symbol objects and identifiers in the language? Are there fundamental misunderstandings in these descriptions?

The R6RS Scheme report, in 4.2 Lexical Syntax, uses the term identifier to refer to the character-level syntax. That is to say, roughly, identifier means something like the lexical token from which a symbol is constructed when the expression becomes an object. However, elsewhere in the text, identifier seems to be freely used as a synonym for symbol. E.g. "Scheme allows identifiers to stand for locations containing values. These identifiers are called variables." (1.3 Variables and Binding). Basically, the spec seems to be loose with regard to this terminology. Depending on context, an identifier is either the same thing as a symbol (an object), or else <identifier>: the grammar category from the lexical syntax.
In a sentence saying that a certain character may or may not appear in an identifier, the context is clearly lexical syntax, because a symbol object is an atom and not a character string; it doesn't contain anything. But when we talk about an identifier denoting a memory location (being a variable), that's the symbol; we're past the issue of what kinds of tokens can produce the symbol in the textual source code.
The tutorial An Introduction to Scheme and Its Implementation, linked in the question, uses its own peculiar definition of identifier, which is at odds with the Scheme language. It implies that identifiers are "variable names and special form names and keywords" (so that symbols which are not variable names are not identifiers, which the specification does not support).
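A short sketch may make the lexical-versus-object distinction concrete. It uses only standard R5RS/R7RS procedures (string->symbol, symbol->string, eq?, symbol?), and the variable names are made up for illustration. Because string->symbol builds symbols at run time, without going through the reader, it can produce a symbol object whose name could never be written as a plain identifier in source text.

```scheme
;; Build symbols at run time from strings, bypassing the lexical syntax.
(define sym-from-text (string->symbol "abc"))
(define odd-sym (string->symbol "has space"))

;; Interning: the same character sequence always yields the same object,
;; so eq? is enough to compare a constructed symbol with a quoted one.
(display (eq? sym-from-text 'abc))   ; prints #t
(newline)

;; odd-sym is a perfectly good symbol object, even though "has space"
;; is not valid identifier syntax in the textual source code.
(display (symbol? odd-sym))          ; prints #t
(newline)
(display (symbol->string odd-sym))   ; prints has space
(newline)
```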

ObPreface: Apologies in advance for telling you things you already know!
Your very first sentence is raising big XY question issues for me. You write "I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data." What do you mean by "the Scheme meta-circular evaluator"? Also, what do you mean by "symbolic data"? Both of these terms suggest to me that you want to ask some more high-level questions.
Regardless, your title suggests a question about the difference between identifiers and symbols. The difference is this:
"Identifiers" are a syntactic category. That is, suppose we take a text file and break it up into tokens. Some of those tokens will be left-parens. Some will be right-parens. Some will be numbers. Some will be identifiers. Every language has its own set of syntactic categories, but many of them use the name "identifier" for "word-like thing that can usually be a function name or a variable name or whatever."
"Symbols", on the other hand, are a particular kind of value in Scheme and Lisp systems. Scheme has lots of different kinds of values: Numbers, Booleans, Strings, Pairs, Symbols, and others.
In Scheme, when developing a parser/interpreter/compiler/whatever, it turns out to be very convenient to use symbols (the values) to represent identifiers (the syntactic entities). Specifically, "quote" has a special ability to turn certain host language token sequences into lists of symbols, numbers, strings, and booleans. You don't need to take advantage of this, but it eliminates a lot of code.
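To illustrate that last point, here is a toy evaluator sketch, assuming an environment represented as an association list of (symbol . value) pairs. The names tiny-eval and toy-env are hypothetical, and this is not how MIT Scheme's evaluator is actually implemented; it only shows how a bare symbol triggers a lookup while (quote <datum>) hands back the datum untouched.

```scheme
;; Toy evaluator: expressions are ordinary Scheme data, so identifiers in
;; the source arrive as symbol objects and quote forms arrive as lists.
;; The environment is a hypothetical association list of (symbol . value).
(define (tiny-eval expr env)
  (cond ((symbol? expr)                      ; variable reference: look it up
         (let ((binding (assq expr env)))
           (if binding
               (cdr binding)
               (error "unbound variable:" expr))))
        ((not (pair? expr)) expr)            ; numbers, strings, booleans self-evaluate
        ((eq? (car expr) 'quote)             ; (quote <datum>) returns the datum as-is
         (cadr expr))
        (else                                ; application: evaluate operator and operands
         (apply (tiny-eval (car expr) env)
                (map (lambda (e) (tiny-eval e env)) (cdr expr))))))

(define toy-env
  (list (cons 'a 41)
        (cons 'inc (lambda (i) (+ i 1)))))

(display (tiny-eval 'a toy-env))           ; symbol a is looked up: prints 41
(newline)
(display (tiny-eval '(quote a) toy-env))   ; quote returns the symbol: prints a
(newline)
(display (tiny-eval '(inc a) toy-env))     ; prints 42
(newline)
```

Passing the quoted list '(quote a) mirrors the situation described in the question: the reader has already turned the text (quote a) into a list of two symbols before tiny-eval ever sees it.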

Related

Should numbers in scheme be quoted?

In the following examples (tested in Ikarus), it seems that quoting numbers does not matter, while too much quoting creates problems.
> (+ '1 1)
2
> (+ '1 '1)
2
> (+ '1 ''1)
1
What is the standard way to use numbers (e.g. in the definition of a function body)? quoted or not quoted?
Numbers in Scheme are self evaluating. That means they act in the same way if they are quoted or not.
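A quick way to check this in any standard Scheme (a minimal sketch using only R5RS/R7RS procedures) is to compare a quoted and an unquoted number, and to look at what a doubly quoted number actually is. Since ''1 reads as (quote (quote 1)), it evaluates to a two-element list rather than a number, so it is not a valid argument to +.

```scheme
;; A quoted number evaluates to the same number as the unquoted one.
(display (eqv? '1 1))      ; prints #t
(newline)

;; ''1 is shorthand for (quote (quote 1)), so it evaluates to the
;; list (quote 1), not to a number.
(display (pair? ''1))      ; prints #t
(newline)
(display (car ''1))        ; prints quote
(newline)
(display (number? ''1))    ; prints #f
(newline)
```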
If you enter (some 1) in DrRacket, start the Macro Stepper, and disable macro hiding, the call will end up looking like:
(#%app call-with-values (lambda () (#%app some (quote 1))) print-values)
Thus Racket actually quotes the values that are self-evaluating, because its runtime doesn't support self-evaluation in the core language / fully expanded program.
It might be that in some implementations an unquoted and a quoted number will be evaluated differently even if Racket treats them the same; however, it would be surprising if it had any real impact.
Most programmers are lazy and would refrain from quoting self-evaluating code. The exception would be as communication to the reader. E.g., in Common Lisp, nil, () and their quoted variants are all the same, and one could indeed use () everywhere, but many choose to use nil when the object is used as a boolean and '() when it is used as a literal list.
R6RS's definition of quotation says so:
(quote <datum>) syntax
Syntax: <Datum> should be a syntactic datum.
Semantics: (quote <datum>) evaluates to the datum value represented by <datum> (see section 4.3). This notation is used to include constants.
So it is correct to write '"aa" or '123, but I have never seen it; I would find it funny to read code that quotes numbers or other constants.
In older Lisps, such as Emacs Lisp, it is the same (in Emacs Lisp the syntax is called sexp or S-expression instead of datum). But the real origin of quotation's meaning comes from McCarthy and is described in A Micro-Manual for Lisp.

Should I be getting an error when modifying a literal list?

The example code below appears in both R5RS (page 26) and R7RS-small (page 41).
(define (g) '(constant-list))
(set-car! (g) 3) ; Error.
The standards say that there should be an error when trying to modify a literal list. However, when I tried to run the code above in MIT Scheme 11.2, Chez Scheme 9.5, Guile Scheme 3.0.1, and Racket 7.2 (plt-r5rs), there is no error at all. Is my understanding incorrect, or are all these Scheme implementations non-compliant with both R5RS and R7RS-small?
The bottom line is that you are misreading the Standards, but this is quite understandable.
R6RS Scheme
Chez Scheme is R6RS. In section 5.10 the R6RS Standard says (emphasis mine):
An attempt to store a new value into a location referred to by an immutable object should raise an exception with condition type &assertion.
But in Chapter 2 of R6RS the meaning of the word "should" is defined for the purposes of the Standard:
should
This word, or the adjective “recommended”, means that valid reasons may exist in particular circumstances to ignore a statement, but that the implications must be understood and weighed before choosing a different course.
This means that implementations may choose whether or not to raise an exception when encountering an attempt to modify an immutable object. This aligns with what is said in The Scheme Programming Language, which was written by Kent Dybvig (the creator of Chez Scheme) who was also one of the editors of R6RS. The passage goes on to emphasize that modification of immutable objects results in unspecified behavior, even when no exception is raised:
Quoted and self-evaluating constants are immutable. That is, programs should not alter a constant via set-car!, string-set!, etc., and implementations are permitted to raise an exception with condition type &assertion if such an alteration is attempted. If an attempt to alter an immutable object is undetected, the behavior of the program is unspecified.
In R6RS Scheme, there is no requirement to raise an exception in this case. Chez Scheme is an implementation of R6RS.
R7RS Scheme
In Section 3.4 of the R7RS Standard, the language has been changed a bit (reverted to the R5RS language) around the discussion of immutable objects:
It is an error to attempt to store a new value into a location that is denoted by an immutable object.
But Section 1.3.2 discusses error reporting requirements:
When speaking of an error situation, this report uses the phrase “an error is signaled” to indicate that implementations must detect and report the error.
The documentation for set-car! does show this example:
(define (g) '(constant-list))
(set-car! (g) 3) ⇒ error
But nowhere does it say that "an error is signaled". The above example simply illustrates that evaluation of the form on the left results in an error; the implementation is not required to detect and signal an error in this case.
Even if other implementations do not, it does seem that Chibi Scheme (which is something of a de facto reference implementation for R7RS) raises an error in this case:
> (define (g) '(1 2 3))
> (g)
(1 2 3)
> (set-car! (g) 0)
ERROR on line 3: set-car!: immutable pair: (1 2 3)
R7RS Scheme has exactly the same language as R5RS in this instance, and the example for set-car! is the same. So neither R7RS nor R5RS require implementations to raise an error when an attempt is made to alter a list literal with set-car!, even though it is an error to do so and doing so results in unspecified behavior.
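If a procedure needs to hand out a list that callers may mutate, the portable approach is to build a fresh list at run time instead of returning a literal. A minimal sketch in R5RS/R7RS Scheme, reusing the g from the standards' example but with list in place of a quoted constant:

```scheme
;; '(constant-list) would be a literal, and mutating it is an error even
;; when undetected.  Calling list builds a fresh, mutable pair every time
;; g is called, so set-car! on the result is allowed.
(define (g) (list 'constant-list))

(define fresh (g))
(set-car! fresh 3)
(display fresh)    ; prints (3)
(newline)

;; Each call returns a new list, unaffected by the earlier mutation.
(display (g))      ; prints (constant-list)
(newline)
```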

Why are the names of predicates in scheme in the form of questions?

Racket is the first dialect of Scheme I am learning, and I’m not that far in; however, due to Scheme’s minimal syntax, I believe it’s safe to assume that a question mark in variable names is not treated by the interpreter any differently than any other valid character.
With that run-on sentence out of the way, why does Scheme use the symbol “?” to denote a function that returns true or false (called a predicate)? For example, in Racket, there is a built-in function called number?. number? returns true when applied to any number (1, 5, -5, 2.7, etc.), and false otherwise. I believe that number? is short for something along the lines of is_the_following_argument_a_number?. Assuming that is true, the expression (number? 5) translates into (is_the_following_argument_a_number? 5).
In English (the language this variable was written in), the predicate of “is the following argument a number?” can be found by first translating the question into its statement form by moving the verb: “the following argument is a number”, and then extracting the predicate: “is a number”. Now, I’m not as good at natural languages as I am at programming languages, but I believe that is correct. Also, sorry if this is turning into an English question more than a Scheme question.
What I am having trouble understanding is this: if the Lisp community calls number? a predicate, why is the variable name not a predicate in English? (I mean that the variable name isn’t an English predicate, not that the kind of function it names in Scheme isn’t a predicate.) I worked out that number? translates to the predicate “is a number”, not the entire question “is the following argument a number?”. So, why does the Lisp community choose to name predicates in Scheme as questions in English? I believe that this is because the community mistakes the values of statements (true or false) for the answers to yes/no questions (yes or no, obviously). Am I wrong to think this?
A predicate in computer science doesn't have anything to do with a predicate in language grammar. Both derive from having to do with truth, but otherwise they are unrelated concepts. A predicate in Scheme is a procedure that checks whether something is true or not, and in reality it can have any name. However, since we can encode information in the name, it should say concisely what it is about, which can be any word or even a sentence delimited by hyphens, ending with a question mark to indicate that it is indeed a predicate procedure. Both the name in the definition and its uses will stand out to the reader, so they know it without looking at the documentation or the implementation.
The predicates in the first and second Scheme reports looked like Common Lisp's and followed the same naming convention Common Lisp has today. Old procedures that were in LISP 1.5 kept their names without the common p-ending, while newly introduced ones had it, like procp (called procedure? today). The reason for this is that Scheme ran under MacLisp and borrowed all the dull stuff from it, while lexical closures were the magic of Scheme. Actually, it looked a lot like Common Lisp.
In the RRRS (R2RS) they made all predicates end with ?, and it worked for eq? and friends, but the arithmetic predicates that used symbols, like <?, =?, <=?, etc., were not a success and were removed in R3RS.
In a conditional we call the parts predicate, consequent and alternative:
(if (< a 0) ; predicate
(- a) ; consequent
a) ; alternative
Here a predicate is just an expression that evaluates to either true or false. Actually, all Scheme values are allowed, and only #f is false. A predicate procedure is a procedure that always returns either #t or #f, and, as you write, number? checks whether the argument is a number, and string=? checks whether two arguments are strings that look the same. The pattern is very good: you can guess what a procedure does just by looking at its name, while keeping procedure names short. In speech we often do the same, like saying "coffee?" and getting either a positive or negative response. It works most of the time, and sometimes people need to spell out that they are offering a hot beverage whose name is coffee. In coding, that means looking at the documentation or definition of a procedure.
There are other naming conventions in Scheme.
foo->bar is a procedure that takes an argument that is a foo type and it returns it as a bar type. number->string takes a number and makes a string representation of it. (number->string 5) ; ==> "5"
foo! may change the objects you pass it in order to do the job slightly faster than if it was named foo. set! and set-car! are examples.
*variable* comes from CL, but in Scheme it is also used to mark a global variable.
CONSTANT, +CONSTANT+, +constant+ are common naming for variables that are considered to be constants.
form* does something similar to what form does, but not quite. Special form let* does something similar to let but it binds one variable at a time.
The code works whether you follow these conventions or not, but you make it easier to read by using them: when you write a comparison procedure, foo=? is just as easy to understand as are-these-two-foo-things-equal, and foo? is just as easy as argument-is-a-foo.
Note that other programming languages also do this. In Java one writes isFoo and equals, so it's not spelled out there either.
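As a sketch of how these conventions fit together, here is a hypothetical point type. The names make-point, point?, point=?, point->list and point-move! are made up for illustration, and a tagged vector is just one convenient representation, not a standard one.

```scheme
;; Hypothetical "point" type: a vector tagged with the symbol point.
(define (make-point x y) (vector 'point x y))

;; foo? : a predicate procedure that always returns #t or #f
(define (point? obj)
  (and (vector? obj)
       (= (vector-length obj) 3)
       (eq? (vector-ref obj 0) 'point)))

;; foo=? : a comparison predicate
(define (point=? p q)
  (and (= (vector-ref p 1) (vector-ref q 1))
       (= (vector-ref p 2) (vector-ref q 2))))

;; foo->bar : converts a point into a list of its coordinates
(define (point->list p)
  (list (vector-ref p 1) (vector-ref p 2)))

;; foo! : mutates its argument in place instead of building a new point
(define (point-move! p dx dy)
  (vector-set! p 1 (+ (vector-ref p 1) dx))
  (vector-set! p 2 (+ (vector-ref p 2) dy)))

(define p (make-point 1 2))
(point-move! p 10 20)
(display (point->list p))                  ; prints (11 22)
(newline)
(display (point=? p (make-point 11 22)))   ; prints #t
(newline)
```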
It's just a programming convention. Predicates, meaning procedures that return true or false, are defined with a name that ends in a question mark. Similarly, procedures that have side effects (e.g., that mutate state) are defined with a name that ends in an exclamation mark.

Two Scheme code samples and their equivalent expression in Common Lisp

I am reading an article which uses Scheme for describing an implementation. I know a bit of Common Lisp but no Scheme.
I am hoping you will be so kind as to explain two Scheme code samples and show me how they correspond to Common Lisp.
First, what does this Scheme mean:
(define (content cell)
(cell ’content))
I believe it means this: define a function named content which has one argument named cell. In Common Lisp it is written as:
(defun content (cell)
(...))
Am I right so far?
I am uncertain what the function's body is doing. Is the argument (cell) actually a function and the body is invoking the function, passing it a symbol, which happens to be the name of the current function? Is this the corresponding Common Lisp:
(defun content (cell)
(funcall cell ’content))
Here is the second Scheme code sample:
(define nothing #(*the-nothing*))
I believe it is creating a global variable and initializing it to #(*the-nothing*). So the corresponding Common Lisp is:
(defvar nothing #(*the-nothing*))
Is that right? Does the pound symbol (#) have a special meaning? I'm guessing that *the-nothing* is referring to a global variable, yes?
Broadly speaking: yes to both, with one major caveat. More specifically, the first one is accepting an argument called cell and calling it with the symbol 'content. (BTW, your unicode quotation mark is freaking me out a bit. Is that just a copy-paste issue?)
In the second case, the hash is a shortcut for defining a vector. So, for instance:
(vector? #(abc)) ;; evaluates to #t
However, the hash also has quoting behavior. Just as the first element of
'(a b c)
is the symbol 'a (and not the value of the variable named a), the first value in the vector
#(*the-nothing*)
is the symbol '*the-nothing*, rather than the value of a global variable.
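Putting both parts of the answer together, here is a runnable sketch. The constructor make-cell is hypothetical (the article's real cell implementation is not shown in the question); it assumes a cell is a closure that answers the message content. The vector literal is written with a leading quote, which R5RS requires for vector constants; R7RS makes vector literals self-evaluating, so the quote is optional there.

```scheme
;; Hypothetical constructor: a cell is a closure that answers messages.
(define (make-cell value)
  (lambda (message)
    (cond ((eq? message 'content) value)
          (else (error "unknown message:" message)))))

;; The procedure from the article: call the cell with the symbol content.
(define (content cell)
  (cell 'content))

;; The vector literal quotes its elements: *the-nothing* is a symbol,
;; not a reference to a global variable.
(define nothing '#(*the-nothing*))

(display (content (make-cell 42)))           ; prints 42
(newline)
(display (vector-ref nothing 0))             ; prints *the-nothing*
(newline)
(display (symbol? (vector-ref nothing 0)))   ; prints #t
(newline)
```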

How to capture the return value of `string-search-forward` in Scheme?

I want to write a procedure (function) that checks whether a string contains another string. I read the documentation of the string library at http://sicp.ai.mit.edu/Fall-2004/manuals/scheme-7.5.5/doc/scheme_7.html
According to it:
Pattern must be a string. Searches string for the rightmost occurrence of the substring pattern. If successful, the index to the right of the last character of the matched substring is returned; otherwise, #f is returned.
This seemed odd to me, because the return value is either an integer or a boolean, so what should I compare my return value with?
I tried
(define (case-one str)
(if (= #f (string-search-forward "me" str))
#t
#f))
DrScheme doesn't like it:
expand: unbound identifier in module in: string-search-forward
string-search-forward is not a standardized Scheme procedure; it is an extension peculiar to the MIT-Scheme implementation (that's why your link goes to the "MIT Scheme Reference Manual.") To see only those procedures that are guaranteed, look at the R5RS document.
In Scheme, #f is the only value that means "false," anything else when used in a conditional expression will mean "true." There is therefore no point in "comparing" it to anything. In cases like string-search-forward that returns mixed types, you usually capture the return value in a variable to test it, then use it if it's non-false:
(let ((result (string-search-forward "me" str)))
(if result
(munge result) ; Execute when S-S-F is successful (result is the index.)
(error "hurf") ; Execute when S-S-F fails (result has the value #f.)
))
A more advanced tactic is to use cond with a => clause which is in a sense a shorthand for the above:
(cond ((string-search-forward "me" str) => munge)
(else (error "hurf")))
Such a form (<test> => <expression>) means that if <test> is a true value, then <expression> is evaluated, which has to be a one-argument procedure; this procedure is called with the value of <test> as an argument.
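Because string-search-forward is MIT-specific, here is the same => pattern with the standard assv, which likewise returns either a useful value (the matching pair) or #f. The names table and lookup are hypothetical, chosen for illustration.

```scheme
;; (assv key table) returns the matching pair, e.g. (b 2), or #f.
;; When it is a true value, => passes that pair to cadr.
(define table '((a 1) (b 2)))

(define (lookup key)
  (cond ((assv key table) => cadr)
        (else 'not-found)))

(display (lookup 'b))   ; prints 2
(newline)
(display (lookup 'z))   ; prints not-found
(newline)
```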
Scheme has a very small standard library, which is both a blessing (you can make small scheme implementations to embed in an application or device, you can learn the language quickly) and a curse (it's missing a lot of useful functions). string-search-forward is a non-standard function of MIT Scheme, it's not present in DrScheme.
Many library additions are available in the form of SRFIs. An SRFI is a community-adopted extension to the base language — think of it as an optional part of a Scheme implementation. DrScheme (or at least its successor Racket) implements many SRFIs.
DrScheme has a number of string functions as part of SRFI 13. Amongst the string searching functions, there is string-contains, which is similar except that it takes its arguments in the opposite order.
(require srfi/13)
(define (case-one str)
(integer? (string-contains str "me")))
You'll notice that the two implementations used a different argument order (indicating that they were developed independently), yet use the same return value. This illustrates that it's quite natural in Scheme to have a function return different types depending on what it's conveying. In particular, it's fairly common to have a function return a useful piece of information if it can do its job, or #f if it can't do its job. That way, the function naturally combines doing its job (here, returning the index of the substring) with checking whether the job is doable (here, testing whether the substring occurs).
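If neither MIT's string-search-forward nor SRFI 13 is available, the same "index or #f" convention is easy to follow in portable code. The helper below, substring-index, is hypothetical: a naive search built only from string-length, substring and string=?. Note that a match at index 0 still counts as true, since only #f is false in Scheme.

```scheme
;; Hypothetical portable helper: returns the index where pattern first
;; occurs in str, or #f if it does not occur.  Naive O(n*m) search.
(define (substring-index pattern str)
  (let ((plen (string-length pattern))
        (slen (string-length str)))
    (let loop ((i 0))
      (cond ((> (+ i plen) slen) #f)                              ; no room left: no match
            ((string=? pattern (substring str i (+ i plen))) i)  ; match at index i
            (else (loop (+ i 1)))))))

;; The "index or #f" result can be used directly as a test.
(define (case-one str)
  (if (substring-index "me" str) #t #f))

(display (substring-index "me" "find me"))   ; prints 5
(newline)
(display (case-one "nothing here"))          ; prints #f
(newline)
```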
Error message seems a little odd (I don't have drscheme installed unfortunately so can't investigate too much).
Are you sure str is a string?
Additionally, = is for numeric comparisons only; you can use not (or false?, which some implementations provide) instead.
As for the return value of string-search-forward having mixed types, Scheme has the mindset that if any useful value can be returned it should be returned, so different return types are common for functions.
Try SRFI 13's string-contains (string-index, documented in the same section at http://docs.racket-lang.org/srfi-std/srfi-13.html#Searching, only searches for a single character or character set). The documentation you are looking at isn't for PLT Scheme; it is the MIT Scheme manual.
