Are there good alternative Scheme syntaxes?

I imagine Scheme (and perhaps Lisp) could be made more 'user friendly' by using a different syntax. For example, instead of nested S-expressions with ugly parentheses, one could devise some kind of syntax closer to some of the more widely used languages (e.g. Java-like, without needing to define classes).
It's not necessarily a bad thing if it's more verbose. For example, the syntax could require line separators and commas in the places where many people expect them, and require explicit return statements. Also, it doesn't seem that difficult to allow some operators to be used infix style (just obey the generally accepted operator precedence rules).
And if it doesn't make things too messy, the syntax could even be backwards-compatible, so that in any place where an expression is expected, a normal S-expression between parentheses can be used.
What are your opinions and ideas about this? And does anything like this exist? (I expect it does, but "Scheme" is a worthless Google term; I can't find anything!)

Originally, Lisp was planned to use a syntax called M-Expressions, with S-Expressions being only a transitional solution for easier compiler building. When M-Expressions were ready to be introduced, the programmers who had already taken to Lisp just stayed with what they had become accustomed to, and M-Expressions never caught on.
There is an infix notation in Guile, but it's rarely used. A good Lisp programmer doesn't even see the parens anymore, and prefix notation does have its merits...
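For a concrete taste of what infix can look like in a Scheme, SRFI-105 defines "curly-infix" expressions; if I remember right, recent Guile versions support it via the #!curly-infix reader directive. A minimal sketch, assuming a Guile build with that support:

#!curly-infix
(define (average a b)
  {{a + b} / 2})  ; read as (define (average a b) (/ (+ a b) 2))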

I think "sweet expressions" might be one of the more thoughtful approaches to getting rid of the parentheses in Lisp. It apparently even supports macros.
http://www.dwheeler.com/readable/sweet-expressions.html
However, I think most people eventually get over the parentheses or use another language.

Take a look at "sweet-expressions", which provides a set of additional abbreviations for traditional s-expressions. They add syntactically-relevant indentation, a way to do infix, and traditional function calls like f(x). Unlike nearly all past efforts to make Lisps readable, sweet-expressions are backwards-compatible (you can freely mix well-formatted s-expressions and sweet-expressions), generic, and homoiconic.
Sweet-expressions were developed on http://readable.sourceforge.net and there is a sample implementation.
For Scheme there is an SRFI for sweet-expressions: http://srfi.schemers.org/srfi-110/
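For a flavor, here is (as best I recall) the factorial example from the readable project's documentation; both forms below read as the same code:

(define (factorial n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))

define factorial(n)
  if {n <= 1}
    1
    {n * factorial{n - 1}}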

Try SRFI 49 for size. :-P
(Seriously, though, as Rafe commented, "I don't think anybody wants this".)
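For reference, SRFI 49 ("Indentation-sensitive syntax") replaces parentheses with indentation. Adapted from memory of the SRFI's examples (details may be off), factorial would look something like:

define (fac x)
  if (= x 0) 1
    * x
      fac (- x 1)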

Some people consider Python to be a kind of Scheme with infix notation for operators, algebraic notation for functions, and a more "Java-like" surface syntax. I don't agree with that assessment, but I can see where the idea comes from.
The big problem with changing the notation for Scheme is that macros become very hard to write (to see how hard, take a look at the Nimrod language or Boo). Instead of working directly with the code as lists, you have to parse the input language first. This usually involves constructing an AST (abstract syntax tree) for the language from the input. When working directly with Scheme, this is unnecessary.
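To make that concrete, here is a minimal sketch of what the list representation buys you: a syntax-rules macro pattern-matches directly on the code's own list structure, so no separate AST type or parsing step is involved:

; swap! rewrites (swap! a b) into a let that exchanges the two variables
(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))

(define x 1)
(define y 2)
(swap! x y)  ; now x is 2 and y is 1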
However, you might check out the SIX expression syntax in Gambit Scheme. There's a nice set of slides here which contains a discussion of this:
http://www.iro.umontreal.ca/~gambit/Gambit-inside-out.pdf
But don't tell anyone about it! (The inside joke is that someone suggests writing a Lisp without parentheses and with infix notation about once a day, and someone announces an implementation about once a month.)

There are some languages that do exactly that. For instance: Dylan.

Related

Tips for writing good EBNF grammars

I'm writing some Extended Backus–Naur Form grammars for document parsing. There are lots of excellent guides for the syntax of these definitions, but very little online about how to design and structure them.
Can anyone suggest good articles (or general tips) about how you like to approach writing these as there does seem to be an element of style even if the final parse trees can be equivalent.
e.g. things like:
Deciding if you should explicitly tag newlines, or just treat them as whitespace?
Naming schemes for your nonterminals
Handling optional whitespace in long definitions
When to add explicit rules that catch bad syntax vs. just letting those inputs fail to match
Thanks,
You should work in the direction that you are most comfortable with - either bottom-up, top-down, or "sandwich" (do a little of both, meet somewhere in the middle).
Any "group" that can be derived and has a meaning of its own should start from its own non-terminal. So, for example, I would use a non-terminal for all newline-related whitespace, one for all the other whitespace, and one for all whitespace (which is basically the union of the former two).
Naming conventions in grammars in general are that non-terminals are, or start with, a capital letter, and terminals start with non-capitals (but this of course depends on the language you're designing).
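As a hedged sketch of both points (the rule names and the exact EBNF dialect here are illustrative only), the whitespace non-terminals might look like:

NewlineWs  = { ? carriage return ? | ? line feed ? } ;
InlineWs   = { " " | ? tab ? } ;
Whitespace = NewlineWs | InlineWs ;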
Regarding bad syntax checks, I'm not familiar with the concept. What I know of EBNFs is that you just write everything your language accepts, and only that.
Generally, just look around at some EBNFs of different languages from different websites, get a feeling of how they look, and then do what feels right to you.

Pseudocode interpreter?

Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I was taught to do this using flow diagrams or UML-like diagrams, in retrospect I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, Perl, and Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, Pythonic list comprehensions or indentation, C++-like inheritance, C#-style lambdas, and Matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm trying to do, and quite easy for people to intelligently translate it into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate it into real Python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, the context of how variables were previously used, etc., plus knowledge of common conventions (like capitalised names, i for iteration, and some simplistic, limited understanding of the naming of variables/methods, e.g. containing the word get, asynchronous, count, last, previous, my, etc.). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues, it will create assumptions about the implementation of each operation (like 0/1-based indexing, when exceptions should be caught or ignored, which variables ought to be const/global/local, where to start and end execution, which bits should be in separate threads, and noticing when numerical units match or need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDEs, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are if you're being ambiguous.
It will certainly be some time before it can manage unusual examples like @Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There have been attempts at unifying language constructs of different languages into a "unified syntax". That was called 4GL, and it never really took off.
</subjective>
As a side note, I have seen a code example about a page long that was valid as C#, Java, and JavaScript code. That can serve as an example of where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible. The approach in 1. could be leveraged to do this, I think. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause etc. Basically write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm.
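A rough sketch of the per-line detection step (in Guile Scheme; the keyword lists are made up for illustration, and a real detector would need far better features than substring hits):

(use-modules (srfi srfi-1) (srfi srfi-13))

; hypothetical keyword table; real detection needs much richer evidence
(define keyword-table
  '((python "def " "elif " "import " "None")
    (java   "public " "void " "new " "extends ")
    (ruby   "def " "end" "puts " "require ")))

; count how many of a language's keywords appear in the line
(define (score line words)
  (count (lambda (w) (string-contains line w)) words))

; pick the language whose keywords match the line best
(define (guess-language line)
  (car (fold (lambda (entry best)
               (let ((s (score line (cdr entry))))
                 (if (> s (cdr best)) (cons (car entry) s) best)))
             (cons 'unknown 0)
             keyword-table)))

(guess-language "def median(seq):")
; => python here, but ruby scores too - such ties are exactly the ambiguity to flag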
I doubt that this has been done ... seems like the cognitive load of learning to write e.g. python-compatible pseudocode would be much easier than trying to debug the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program that could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that recognises the programming language of a text file?
Yes, the Unix file command.
(Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
    """Returns the median of a list."""
    seq_sorted = sorted(seq)
    if len(seq) & 1:
        # For an odd-length list, return the middle item
        return seq_sorted[len(seq) // 2]
    else:
        # For an even-length list, return the mean of the 2 middle items
        return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to insert dozens of variables between what it thought were subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.

Why Ruby has so many redundancies?

I love Ruby; for the past couple of years it has been my language of choice.
But ever since I started learning it, I have been repelled by the fact that so often there are several ways to do the same (or an equivalent) thing. I'll give a couple of examples:
methods often have aliases, so you always have to bother to choose the most adequate, popular or commonly accepted alternative
and and or, besides && and || - just look at how much confusion precedence difference among them causes
for keyword, used almost exclusively by inexperienced non-native Ruby developers
What was the rationale behind such design decisions? Did they (Matz?) believe that the language will be easier to adopt, and therefore more popular that way?
Ruby is inspired by Perl, and one important Perl philosophy is "There's more than one way to do it", i.e. redundancies are fine since they give the programmer more freedom (and increase the odds that the functionality they want is available under the name they'd give it - not only under one). It's your call whether that's actually a good thing.
When Matz wrote Ruby, he tried to follow the 'Principle of Least Surprise'. Often this meant that there'd be more than one way to do the same thing, for example assigning to arrays by using square brackets, or an insert method. I enjoy it, because I find that rather than trying to remember which exact name to use in which situation (I always used to pause for a moment over size vs. length in Java), I just write what seems logical, and usually it will work. When reading the code, it's normally not a problem that a different name was used, as the names are usually self-explanatory. So, I don't worry about which is most adequate or popular; I choose the most logical at the time.
Matz was also inspired by Perl, which has 'There's more than one way to do it' as its slogan.
I don't believe Matz was worried about what would be most popular, he just wanted to write the language he wanted to use.
I'm not going to try to explain and vs && though...
Beware that and vs. &&, though similar, have different precedence.
a = b && c # parsed as a = (b && c); a gets the result of b && c
a = b and c # parsed as (a = b) and c; a gets b, and c is evaluated only if b was truthy
There's more than one way to do it, but there may be subtle differences between them.
(update, just noticed you mentioned the precedence difference in your question... sorry. nothing to see here. move along.)

How to write an interpreter?

I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for.
Thanks in advance.
You will have to learn at least:
lexical analysis (grouping characters into tokens)
parsing (grouping tokens together into structure)
abstract syntax trees (representing program structure in a data structure)
data representation (assuming your language will have variables)
an evaluation loop that "runs" your program
An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.
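As a small taste of the "data representation" point above, here is a minimal sketch (in Scheme, since that is the book's language; the names are my own) of an association-list environment for variables:

; look a variable up in an association-list environment
(define (env-lookup env name)
  (cond ((assq name env) => cdr)
        (else (error "unbound variable" name))))

; bind a new variable without mutating the old environment
(define (env-extend env name value)
  (cons (cons name value) env))

(define e (env-extend '() 'x 42))
(env-lookup e 'x)  ; => 42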
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for. Thanks in advance.
Try some dialect of Lisp like Scheme or Clojure. (Now there's an idea: Clojure-in-Ruby, which integrates with Ruby as well as Clojure does with Java.)
With Lisp, there is no need to bother with idiosyncrasies of syntax, as Lisp's syntax is much closer to the abstract syntax tree.
This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!
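To give a feel for how little machinery this needs, here is a toy sketch - nowhere near SICP's full metacircular evaluator - that evaluates quoted arithmetic s-expressions directly, with no lexer or parser in sight:

; evaluate an arithmetic expression represented as nested lists
(define (my-eval expr)
  (cond ((number? expr) expr)
        ((pair? expr)
         (let ((args (map my-eval (cdr expr))))
           (case (car expr)
             ((+) (apply + args))
             ((-) (apply - args))
             ((*) (apply * args))
             ((/) (apply / args))
             (else (error "unknown operator" (car expr))))))
        (else (error "unknown expression" expr))))

(my-eval '(* (+ 1 2) (- 10 4)))  ; => 18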
There is the Treetop project, which can be helpful for you: http://treetop.rubyforge.org/
You can also check out the Ruby Draft Specification: http://ruby-std.netlab.jp/
I had a similar idea a couple of days ago. LISP is by far the easiest to implement because the syntax is so simple, and the data structures that the language manipulates are the same structures that the code is written in. Hence you need only a minimal implementation, and can define the rest in terms of itself.
However, if you are trying to learn about parsing, you may want to do a more complex language with Abstract Syntax Trees, etc.
If you want to see my (literally two days old) Java implementation of Lisp, check out mylisp.googlecode.com. I'm still working on it, but it is incredible how short a time it took to get the existing stuff working.
It's not sooo hard. Here's a LISP interpreter in Ruby, and the source is so small you are supposed to copy/paste it. But are you gonna learn LISP now? Hehe.
If you're just doing this for fun, make up your own, simple language and just try it. My recommendation would be something like a really simple classic BASIC (no visual basic or object oriented stuff). With line numbers, GOTO, INPUT and PRINT and that's it. You get to do the basics, and you get a better understanding of how things work.
The knowledge you'll need?
Tokenizing (turning that huge chunk of characters into something more efficiently readable, effectively splitting it up into 'words')
Parsing (going over the tokens and building a data structure from it)
Interpreting (looping over the data structure and executing each command)
And for that last one you'll also need a way to keep around variables. Usually you'd just implement a "stack", one huge block of data where you can mark off an area at the end.
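For the tokenizing step, here is a minimal sketch in Scheme (using SRFI-13's string-tokenize), under the simplifying assumption that tokens are whitespace-separated, as they could be in the simple BASIC suggested above:

(use-modules (srfi srfi-13))

; split a source line into word tokens on whitespace
(define (tokenize line)
  (string-tokenize line))

(tokenize "PRINT 1 + 2")  ; => ("PRINT" "1" "+" "2")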
It's not implemented in Lisp, but I found Write Yourself A Scheme in 48 Hours to be a very useful document while I was starting out with Haskell (though I didn't get anywhere near finishing it after 48 hours; YMMV). It also gives you a lot of insight into interpreters in general.
I can recommend this book, Language Implementation Patterns. It discusses patterns for writing parsers and interpreters and more:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0

Avoiding Mixup of Language Details

Today someone asked me what was wrong with their source code. It was obvious. "Use double equals in place of that single equal in that if statement. Um, I think..." As I remember some languages actually take a single equals for comparison. Since I sometimes forget or mix up the syntax details among the several languages I use, I stepped over to my laptop to try a quickie experiment.
It costs a bit of time and is a break in the flow to try "quick" experiments (though maybe the practice is good for memory.) What tips do you have for keeping straight in your mind the syntax (and other) details of multiple languages?
(And nowadays, this applies just as well to the many wiki-like markups!)
To me, the hardest part isn't the syntax - usually you get into the mode when looking at the code you're working on. The really hard part is remembering the library of the language, so you don't go reinventing the wheel over and over again. Now if only people would organize their help files so it was easy to search for particular stuff in the library.
IDEs that can draw red and yellow squiggles can help, until you develop that mental muscle memory.
One of the annoying things with Xcode (for Cocoa/Objective-C) is that you don't get said squiggles until you compile. (As opposed to Eclipse/Java, where you get live squiggles.)
In my case it's just experience. I think once you code in a language for long enough your brain seems to be able to do language-context-switching with it.
Indeed, on SO I advised avoiding if (a = b) in Java, and someone reminded me that it is legal only if a and b are boolean! Of course, the advice is good for C, C++, JavaScript, and a number of other C-like languages.
Likewise, I realized only recently that var v in JavaScript has function-level scope only, not brace-level (block) scope.
Somehow, that's the pitfall of having similar syntaxes, but different behaviors.
For the anecdote, some people in the Lua mailing list complain that this language isn't C-like, with the terse and familiar curly braces, the += and ++, the bitwise operators. They say it hurts adoption of the language, because people are more familiar with C-like syntax.
That's nonsense: Basic was (and still is) widely used with its verbose syntax, and so is Pascal (Delphi). And lots of people find the Lua syntax readable and easy to learn, good for those unfamiliar with programming (game AI specialists, for example).
Moreover, and to the point, Lua is designed to be integrated into C/C++ programs and to be extended with C[++] functions. And people say the quite different syntaxes help with the mindset shifting.
