Parsing s-expressions in Go - go

Here's a link to lis.py if you're unfamiliar: http://norvig.com/lispy.html
I'm trying to implement a tiny lisp interpreter in Go. I've been inspired by Peter Norvig's Lis.py lisp implementation in Python.
My problem is I can't think of a single somewhat efficient way to parse the s-expressions. I had thought of a counter that would increment by 1 when it see's a "(" and that would decrement when it sees a ")". This way when the counter is 0 you know you've got a complete expression.
But the problem with that is that it means you have to loop for every single expression which would make the interpreter incredibly slow for any large program.
Any alternative ideas would be great because I can't think of any better way.

There is an S-expression parser implemented in Go at Rosetta code:
S-expression parser in Go
It might give you an idea of how to attack the problem.

You'd probably need to have an interface "Sexpr" and ensure that your symbol and list data structures matches the interface. Then you can use the fact that an S-expression is simply "a single symbol" or "a list of S-expressions".
That is, if the first character is "(", it's not a symbol, but a list, so start accumulating a []Sexpr, reading each contained Sexpr at a time, until you hit a ")" in your input stream. Any contained list will already have had its terminal ")" consumed.
If it's not a "(", you're reading a symbol, so read until you hit a non-symbol-constituent character, unconsume it and return the symbol.

In 2022, you can also test eigenhombre/l1, a small Lisp 1 written in Go, by John Jacobsen .
It is presented in "(Yet Another) Lisp In Go"
It does include in commit b3a84e1 a parsing and tests for S-expressions.
func TestSexprStrings(T *testing.T) {
var tests = []struct {
input sexpr
want string
}{
{Nil, "()"},
{Num(1), "1"},
{Num("2"), "2"},
{Cons(Num(1), Cons(Num("2"), Nil)), "(1 2)"},
{Cons(Num(1), Cons(Num("2"), Cons(Num(3), Nil))), "(1 2 3)"},
{Cons(
Cons(
Num(3),
Cons(
Num("1309875618907812098"),
Nil)),
Cons(Num(5), Cons(Num("6"), Nil))), "((3 1309875618907812098) 5 6)"},
}

Related

why strict code format on keywords in go lang

As every new programer starting in Go 1st think you see is, strict code format.
Means:
//Valid
func foo(){
}
//Invalid
func foo()
{
}
Same goes for if-else that else should be in same line where if ends, like:
//Valid
if{
}else{
}
//Invalid
if{
}
else{
}
we get below error:
syntax error: unexpected else, expecting }
I have checked the spec spec, but not able to find why.
The only explanation I'am getting is its mandatory.
Can anyone explain us why this is mandated does this have any reason? if any.
UPDATE
I think i have mention this already that "I know lang say so", Question is WHY?
Why to go this length to make it compile time error, what problems it was posing if we don't do this?
The language was designed as such and the reason is outlined in the FAQ.
https://golang.org/doc/faq
Why are there braces but no semicolons? And why can't I put the opening brace on the next line?
o uses brace brackets for statement
grouping, a syntax familiar to programmers who have worked with any
language in the C family. Semicolons, however, are for parsers, not
for people, and we wanted to eliminate them as much as possible. To
achieve this goal, Go borrows a trick from BCPL: the semicolons that
separate statements are in the formal grammar but are injected
automatically, without lookahead, by the lexer at the end of any line
that could be the end of a statement. This works very well in practice
but has the effect that it forces a brace style. For instance, the
opening brace of a function cannot appear on a line by itself.
Some have argued that the lexer should do lookahead to permit the
brace to live on the next line. We disagree. Since Go code is meant to
be formatted automatically by gofmt, some style must be chosen. That
style may differ from what you've used in C or Java, but Go is a
different language and gofmt's style is as good as any other. More
important—much more important—the advantages of a single,
programmatically mandated format for all Go programs greatly outweigh
any perceived disadvantages of the particular style. Note too that
Go's style means that an interactive implementation of Go can use the
standard syntax one line at a time without special rules.
Go inserts a ; at the end of lines ending in certain tokens including }.
Since if {...} else {...} is a single statement, you can't put a semicolon in the middle of it after the first closing brances i.e. }, hence the requirement to put } else { on one line is mandatory.
I hope it answers your question.

Compile error using fast pipe operator after pipe last in ReasonML

The way that the "fast pipe" operator is compared to the "pipe last" in many places implies that they are drop-in replacements for each other. Want to send a value in as the last parameter to a function? Use pipe last (|>). Want to send it as the first parameter? Use fast pipe (once upon a time |., now deprecated in favour of ->).
So you'd be forgiven for thinking, as I did until earlier today, that the following code would get you the first match out of the regular expression match:
Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id")
|> Belt.Option.getExn
-> Array.get(1)
But you'd be wrong (again, as I was earlier today...)
Instead, the compiler emits the following warning:
We've found a bug for you!
OCaml preview 3:10-27
This has type:
'a option -> 'a
But somewhere wanted:
'b array
See this sandbox. What gives?
Looks like they screwed up the precedence of -> so that it's actually interpreted as
Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id")
|> (Belt.Option.getExn->Array.get(1));
With the operators inlined:
Array.get(Belt.Option.getExn, 1, Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id"));
or with the partial application more explicit, since Reason's syntax is a bit confusing with regards to currying:
let f = Array.get(Belt.Option.getExn, 1);
f(Js.String.match([%re "/(\\w+:)*(\\w+)/i"], "int:id"));
Replacing -> with |. works. As does replacing the |> with |..
I'd consider this a bug in Reason, but would in any case advise against using "fast pipe" at all since it invites a lot of confusion for very little benefit.
Also see this discussion on Github, which contains various workarounds. Leaving #glennsl's as the accepted answer because it describes the nature of the problem.
Update: there is also an article on Medium that goes into a lot of depth about the pros and cons of "data first" and "data last" specifically as it applies to Ocaml / Reason with Bucklescript.

ocaml: Basic syntax for function of several arguments

I am learning OCaml and I'm a complete beginner at this point. I'm trying to get used to the syntax and I just spent 15 minutes debugging a stupid syntax error.
let foo a b = "bar";;
let biz = foo 2. -1.;;
I was getting an error This expression has type 'a -> string but an expression was expected of type int. I resolved the error, but it prompted me to learn what is the best way to handle this syntax peculiarity.
Basically OCaml treats what I intended as the numeric constant -1. as two separate tokens: - and 1. and I end up passing just 1 argument to foo. In other languages I'm familiar with this doesn't happen because arguments are separated with a comma (or in Scheme there are parentheses).
What is the usual way to handle this syntax peculiarity in OCaml? Is it surrounding the number with parentheses (foo 2. (-1.)) or there is some other way?
There is an unary minus operator ~-. that can be used to avoid this issue: foo ~-.1. (and its integer counterpart ~-) but it is generally simpler to add parentheses around the problematic expression.

Do three dots contain multiple meanings?

As I recognize, "..." means the length of the array in the below snippet.
var days := [...]string { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" }
On the other hand, "..." means unpacking the slice y to arguments of int in the below snippet, as I guess. I'm not really sure about this.
x := []int{1,2,3}
y := []int{4,5,6}
x = append(x, y...)
Now, the difference in the two meanings makes it hard for me to understand what "..." is.
You've noted two cases of ... in Go. In fact, there are 3:
[...]int{1,2,3}
Evaluates at compile time to [3]int{1,2,3}
a := make([]int, 500)
SomeVariadicFunc(a...)
Unpacks a as the arguments to a function. This matches the one you missed, the variadic definition:
func SomeVariadicFunc(a ...int)
Now the further question (from the comments on the OP) -- why can ... work semantically in all these cases? The answer is that in English (and other languages), this is known as an ellipsis. From that article
Ellipsis (plural ellipses; from the Ancient Greek: ἔλλειψις,
élleipsis, "omission" or "falling short") is a series of dots that
usually indicates an intentional omission of a word, sentence, or
whole section from a text without altering its original meaning.1
Depending on their context and placement in a sentence, ellipses can
also indicate an unfinished thought, a leading statement, a slight
pause, and a nervous or awkward silence.
In the array case, this matches the "omission of a word, sentence, or whole section" definition. You're omitting the size of the array and letting the compiler figure it out for you.
In the variadic cases, it uses the same meaning, but differently. It also has hints of "an unfinished thought". We often use "..." to mean "and so on." "I'm going to get bread, eggs, milk..." in this case "..." signifies "other things similar to breads, eggs, and milk". The use in, e.g., append means "an element of this list, and all the others." This is perhaps the less immediately intuitive usage, but to a native speaker, it makes sense. Perhaps a more "linguistically pure" construction would have been a[0]... or even a[0], a[1], a[2]... but that would cause obvious problems with empty slices (which do work with the ... syntax), not to mention being verbose.
In general, "..." is used to signify "many things", and in this way both uses of it make sense. Many array elements, many slice elements (albeit one is creation, and the other is calling).
I suppose the hidden question is "is this good language design?" On one hand, once you know the syntax, it makes perfect sense to most native speakers of English, so in that sense it's successful. On the other hand, there's value in not overloading symbols in this way. I probably would have chose a different symbol for array unpacking, but I can't fault them for using a symbol that was probably intuitive to the language designers. Especially since the array version isn't even used terribly often.
As mentioned, this is of no issue to the compiler, because the cases can never overlap. You can never have [...] also mean "unpack this", so there's no symbol conflict.
(Aside: There is another use of it in Go I omitted, because it's not in the language itself, but the build tool. Typing something like go test ./... means "test this package, and all packages in subdirectories of this one". But it should be pretty clear with my explanation of the other uses why it makes sense here.)
Just FYI, myfunc(s...) does not mean "unpack" the input s.
Rather, "bypass" would be a more suitable expression.
If s is a slice s := []string{"a", "b", "c"},
myfunc(s...) is not equivalent to myfunc(s[0], s[1], s[2]).
This simple code shows it.
Also, see the official Go specification (slightly modified for clarity):
Given the function
func Greeting(prefix string, who ...string)
If the final argument is assignable to a slice type []T and is
followed by ..., it is passed unchanged as the value for a ...T
parameter. In this case no new slice is created.
Given the slice s and call
s := []string{"James", "Jasmine"}
Greeting("goodbye:", s...)
within Greeting, who will have the same value as s with the same underlying
array.
If it "unpacks" the input argument, a new slice with a different array should be created (which is not the case).
Note: It's not real "bypass" because the slice itself (not the underlying array) is copied into the function (there is no 'reference' in Go). But, that slice within the function points to the same original underlying array, so it would be a better description than "unpack".

Pythonesque blocks and postfix expressions

In JavaScript,
f = function(x) {
return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda(x): x + 1; (5).
There are many other problems with multi-line lambdas in otherwise standard Python syntax though. It is completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions - it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas thread about multi-line lambdas. It's somewhere between very hard to impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. Coffeescript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lamda function.
The details are a bit messier than this; look at the Haskell 2010 report.

Resources