R6RS vs. R5RS Scheme

I'm relatively new to Scheme and am having a hard time finding a concrete document online that gives an overview of the major changes that came with R6RS. Anyone care to elaborate?

http://community.schemewiki.org/?R6RS has compiled a list of high-level changes with some commentary, including:
case-sensitive syntax
square brackets are now equivalent to parentheses, e.g., (let ([foo 3]) ...); this was already supported by some Scheme implementations and is now part of the standard
the ability to return multiple values is retained
more escape characters in strings, e.g., "\n"
hashtables as a library
multiline (#| ... |#) and expression (#;) comments
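Several of these can be seen together in one short snippet. This is just a minimal sketch of an R6RS top-level program; how you run it varies by implementation (e.g., Chez Scheme or Guile):

#!r6rs
(import (rnrs))                    ; the new library system; (rnrs) also
                                   ; re-exports (rnrs hashtables)

#| Block comments
   may span several lines. |#

(let ([foo 3])                     ; brackets interchangeable with parens
  (display "foo:\t")               ; \t among the new string escapes
  (display foo)
  (newline))

#;(this-datum-is-skipped)          ; #; comments out the next datum

(define ht (make-eqv-hashtable))   ; hashtables as a library
(hashtable-set! ht 1 'one)

(call-with-values                  ; multiple values are retained
  (lambda () (values 1 2))
  (lambda (a b) (display (+ a b)) (newline)))  ; prints 3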
http://www.r6rs.org/versions/CHANGES
http://www.r6rs.org/formal-comments/
http://lambda-the-ultimate.org/node/1342
If you're relatively new to Scheme and have the fortitude, though, you'll get more mileage out of reading the spec itself than out of skimming a changelog...

Related

Does Guile relax the restriction on variable naming by allowing variable names that begin with a number?

For example, it seems that 1+2 can be used in Guile as a variable name:
(define 1+2 4)
1+2 ;==>4
I was surprised to find that R6RS appears not to like identifiers whose names start with a digit (unless they are escaped, perhaps?), if I am reading it properly. It looks as if the same is true for R5RS. I have not looked at other specifications.
So, if my readings of the specs are correct, then yes, Guile is relaxing this requirement. However, as I say, I was surprised by this, as, for instance, Racket is perfectly happy with identifiers like 1+, even when using the r5rs language, and such identifiers are very common in other Lisp-family languages (Common Lisp defines 1+ and 1- in the language itself).
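For instance, here is an illustrative check (not from the specs) showing that Racket accepts such identifiers:

#lang racket

;; Both of these read as ordinary symbols, since neither is a valid
;; number literal:
(define (1+ n) (+ n 1))
(1+ 41)          ; => 42

(define 1+2 4)
1+2              ; => 4, just as in the Guile example above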
It may however be the case that I am misreading the syntax for <identifier> in the specs, or misinterpreting what they mean.

An algorithm for compiler designing?

Recently I have been thinking about an algorithm I constructed myself. I call it Replacement Compiling.
It works as follows:
Define a language as well as its operators' precedence, such as
(1) store <value> as <id>, replace with: var <id> = <value>, precedence: 1
(2) add <num> to <num>, replace with: <num> + <num>, precedence: 2
Accept a line of input, such as store add 1 to 2 as a;
Tokenize it: <kw,store><kw,add><num,1><kw,to><num,2><kw,as><id,a><EOF>;
Then scan through all the tokens until reaching the end-of-file marker, find the operation with the highest precedence, and "pack" the operation:
<kw,store>(<kw,add><num,1><kw,to><num,2>)<kw,as><id,a><EOF>
Replace the "sub-statement", the expression in parenthesis, with the defined replacement:
<kw,store>(1 + 2)<kw,as><id,a><EOF>
Repeat until no more statements are left:
(<kw,store>(1 + 2)<kw,as><id,a>)<EOF>
(var a = (1 + 2))
Then evaluate the code with the built-in function, eval().
eval("var a = (1 + 2)")
Then my question is: would this algorithm work, and what are its limitations? Does this algorithm work better on simple languages?
This won't work as-is, because there's no way of deciding the precedence of operations and keywords, but you have essentially defined parsing (and thrown in an interpretation step at the end). This looks pretty close to operator-precedence parsing, but I could be wrong about the details of your vision. The real keys to what makes a parsing algorithm are the direction in which it reads the code, whether decisions are made top-down (figure out what kind of statement it is and apply the rules) or bottom-up (assemble small pieces into larger components until the types of statements are apparent), and whether the grammar is encoded as code or as data for a generic parser. (I'm probably overlooking something, but this should give you a starting point to make sense of further reading.)
More typically, code is parsed using an LR technique (LL if it's top-down) driven from a state machine with look-ahead and next-step information, but you'll also find the occasional recursive-descent parser. Since they're all doing very similar things (only implemented differently), your rough algorithm could probably be refined to look a lot like any of them.
For most people learning about parsing, recursive descent is the way to go, since everything is in the code instead of building what amounts to an interpreter for a state-machine definition. Most parser generators, though, build LL or LR parsers.
And I'm obviously over-simplifying the field, since you can see at the bottom of the Wikipedia pages that there's a smattering of related systems that partly revolve around the kind of grammar you have available. But for most languages, those are the big-three algorithms.
What you've defined is a rewriting system: https://en.wikipedia.org/wiki/Rewriting
You can make a compiler like that, but it's hard work and runs slowly, and if you do a really good job of optimizing it you'll end up with a conventional table-driven parser. It would be better in the end to learn about those first and just start there.
If you really don't want to use a parser generating tool, then the easiest way to write a parser for a simple language by hand is usually recursive descent: https://en.wikipedia.org/wiki/Recursive_descent_parser
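To make that concrete, here is a minimal recursive-descent sketch in Racket for the toy store/add language from the question. All the names (parse-stmt, parse-expr) are illustrative, not from any library, and error handling is deliberately crude:

#lang racket

;; Grammar sketched from the question:
;;   stmt := "store" expr "as" <id>    =>  var <id> = <expr>
;;   expr := "add" <num> "to" <num>    =>  (<num> + <num>)
;;         | <num>
;; Each parse function returns the translated string plus the
;; remaining tokens.

(define (parse-expr toks)
  (cond [(equal? (car toks) "add")
         (unless (equal? (caddr toks) "to") (error "expected 'to'"))
         (values (string-append "(" (cadr toks) " + " (cadddr toks) ")")
                 (cddddr toks))]
        [else (values (car toks) (cdr toks))]))

(define (parse-stmt toks)
  (unless (equal? (car toks) "store") (error "expected 'store'"))
  (define-values (expr rest) (parse-expr (cdr toks)))
  (unless (equal? (car rest) "as") (error "expected 'as'"))
  (values (string-append "var " (cadr rest) " = " expr)
          (cddr rest)))

(define-values (code leftover)
  (parse-stmt (string-split "store add 1 to 2 as a")))
code   ; => "var a = (1 + 2)"

Note how each nonterminal in the grammar becomes one function; that correspondence is the essence of recursive descent.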

Atom escaping rules in Prolog

I need to export to a file a Prolog program expressed using an arbitrary term representation in Java. The idea is that a Prolog interpreter should be able to consult the generated file afterwards.
My question is about the correct way to write in the file Java Strings representing atom terms.
For example, if the string has a space in the middle, it should be surrounded by single quotes in the file:
hello world becomes 'hello world'
And the exporter should take into consideration characters that should be escaped:
' becomes '\''
Could someone point me to the place where these rules are specified? And can I assume that these rules are respected by the major Prolog implementations? (I mean, would a Prolog program generated following these rules be correctly parsed by most Prolog interpreters?)
The precise place for this is the standard, ISO/IEC 13211-1:1995, quoted_token (* 6.4.2 *). See this answer for how to get it for USD 30.
The precise syntax is quite complex due to a lot of extras like continuation lines and the like. If you are only writing atoms that should be read by Prolog, things are a bit easier. Also in that situation, you could always quote, which makes writing again a bit simpler.
Some things to be aware of:
Only simple spaces may occur as layout in a quoted atom. All other layout characters need to be escaped, like \t and \n (the full set of escape letters is abrftnv). Many systems also accept other layout characters, but they differ from each other in very tiny details.
Backslash and quote must be escaped.
Characters outside the printable ASCII range depend on the processor character set (PCS) supported by a system. In a conforming system, the accompanying documentation should define how the additional characters (extended characters) are classified. Documentation quality varies over a wide range.
In any case, test your interface also with GNU Prolog from 1.4.1 upwards. To date, no differences are known between GNU Prolog 1.4.1+ and the standard as far as syntax is concerned.
Here are some 240+ syntax-related test cases. Please report any oversight!
A practical hint: if you call writeq (with the data you need to write) in your Prolog system, you'll get quotes around atoms when required.
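As a sketch of the "always quote" strategy mentioned above, here is the core escaping rule in Scheme, this page's lingua franca (quote-prolog-atom is an invented name; it deliberately ignores control characters and extended characters, so it is illustrative only, not a conforming writer):

;; Wrap the atom in single quotes and backslash-escape quote and
;; backslash, per the rules discussed above.
(define (quote-prolog-atom str)
  (string-append
   "'"
   (list->string
    (let loop ((cs (string->list str)))
      (cond ((null? cs) '())
            ((memv (car cs) '(#\' #\\))
             (cons #\\ (cons (car cs) (loop (cdr cs)))))
            (else (cons (car cs) (loop (cdr cs)))))))
   "'"))

(display (quote-prolog-atom "hello world"))   ; prints 'hello world'
(newline)
(display (quote-prolog-atom "it's"))          ; prints 'it\'s'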

How does addition work in racket/scheme?

For example, if you try (+ 3 4), how is it broken down and calculated in the source, specifically? Does it use recursion with add1?
The implementation of + is actually much more complicated than you might expect, because arithmetic is generic in Racket: it works on integers, rational numbers, complex numbers, and so on. You can even mix and match these kinds of numbers and it'll do the right thing. Ultimately, it's going to use arithmetic in C, which is the language the runtime system is written in.
If you're curious, you can find more of the guts of the numeric tower here: https://github.com/plt/racket/blob/master/src/racket/src/numarith.c
Other pointers: Bignum arithmetic, the Scheme numeric tower, the Racket reference on numbers.
The + operator is a primitive operation, part of the core language. For efficiency reasons, it wouldn't make much sense to implement it as a recursive procedure.
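To make both answers concrete, here is a hypothetical Racket sketch (slow-add is an invented name, not anything from the Racket sources):

#lang racket

;; Peano-style addition built from add1, as the question imagines,
;; can be written, but it is NOT how the built-in + works:
(define (slow-add m n)
  (if (zero? m) n (slow-add (sub1 m) (add1 n))))
(slow-add 3 4)   ; => 7

;; The real + is a generic primitive over the whole numeric tower:
(+ 1/2 0.5)      ; => 1.0    (exact rational meets inexact float)
(+ 1 2+3i)       ; => 3+3i   (integer meets complex)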

How does the Scheme function inexact->exact operate?

How does the Scheme procedure inexact->exact, described in SICP, operate?
The Scheme standard only gives some general constraints on how exactness/inexactness is recorded, but most Scheme implementations, up to the R5RS standard, operate as follows (MIT Scheme, which is SICP's "mother tongue", also works this way):
The type information for each cell that contains data of a numeric type says whether the data is exact or inexact.
Arithmetic operations on the data derive the exactness of the result from the exactness of the inputs, where inexactness is generally infectious: if any of the operands is inexact, the result probably will be too. Note, though, that Scheme implementations are allowed to infer exactness in special cases: say, if you multiply inexact 4.3 by exact 0, you can know the result is exactly 0.
The special operations inexact->exact and exact->inexact are casts on the numeric types, ensuring that the resulting type is exact or inexact respectively.
Some points: first, different Scheme standards vary on when operators give exact results; the standards underdetermine what happens. For example, several Scheme implementations have representations for exact rationals, allowing (/ 1 3) to be represented exactly, whereas a Scheme implementation with only floats must represent it inexactly.
Second, R6RS has a different notion of contagion from that of SICP and earlier standards, because the older criterion is, frankly, broken.
Exactness is simply a property of a number: it doesn't change the value of the number itself. So, for an implementation that uses a flag to indicate exactness, inexact->exact simply sets the exactness flag on that number.
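A few interactions illustrating these points (results as printed by Racket; MIT Scheme prints them equivalently up to formatting):

(exact? 1/3)           ; => #t   exactness is part of the number's type info
(exact->inexact 1/3)   ; => 0.3333333333333333
(inexact->exact 0.5)   ; => 1/2  the cast yields an exact equivalent
(+ 1/3 0.5)            ; => 0.8333333333333333   inexactness is contagious
(* 4.3 0)              ; => 0 in some systems, 0.0 in others; the
                       ;    standards underdetermine this case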
