As I was about to write a code generator for F#, I was wondering whether I could avoid stepping into the indentation mess by generating only one-line-values.
As part of this effort, I was considering how I could express Object Expressions in one single line, but couldn't succeed, except in Verbose Mode.
let Expr() = (let ToString = "ToString" in { new System.Object() with member this.ToString() = ToString; member this.GetHashCode() = ToString.GetHashCode()})
The issue is that I don't want to generate all my code in Verbose mode, this is a compatibility feature. Is there any other option?
Thanks a lot in advance for your insights!
François
The reason I ask for this is that I have to generate Object Expressions in arbitrary expressions and I would like to avoid to count the number of characters in the current line to compute how much I've to indent the next line.
(Shameless plug) I maintain F# source code formatter which exposes APIs to pretty-print full F# 3.0 syntax. You have several options:
Implement your code generator using verbose mode and run the source code formatter APIs on the output. Then you don't have to worry about indentation and line break on verbose mode is very easy.
Implement your code generator using functional combinators. There are many combinators in FormatConfig module which you can copy and modify. F# indentation rule is clear; you can read more in this article.
You probably have an AST for pretty printing. If you prefer a lightweight solution, F# Compiler CodeDom has similar combinators for code generation.
This is not strictly speaking an answer, but what is so bad about writing it with proper indentation? You can always make a list of the generated lines and add indentation in a separate step. This is how I usually generate code, even if the language (e.g. HTML) does not need indentation.
Why does generated code have to be illegible for humans? The proper indentation makes a review of the generated code easier and you can even use usual diff tools, if you for instance have the generated sources commited to source control.
At least in my experience proper indentation is actually far less difficult than one might think, especially when generating. Don't forget, that you have all the context at hand during the generation, so adding a level of indentation should be very easy.
let indent = List.map (sprintf " %s")
[
yield "let Expr() ="
yield! [
yield "let ToString = \"ToString\""
yield "{"
yield! [
"new System.Object() with"
"member this.ToString() = ToString"
"member this.GetHashCode() = ToString.GetHashCode()"
] |> indent
yield "}"
] |> indent
]
Related
I know I should use YARD. Not an option in this environment.
So I'm using rdoc and call-seq is generally been great, but I want to add indentation for a call-seq that is always used in a block/yield style, something like:
call-seq:
someFunc {
example1(..)
example2(..)
}
But call-seq: removes any indentation and it becomes:
someFunc {
example1(..)
example2(..)
}
Which is sad. I know I could use a separate code block as an example, but this is really part of the general API for how this should be called, and it would be nice if I could use call-seq (instead of the ugly inverted colors of a code block).
I am guessing this isn't possible with the limitations of rdoc, but I thought I'd ask. Any thoughts?
I think you are confused about what the purpose of the calling sequence is, but the name unfortunately is really confusing. Normally, I would assume that it was designed by a non-native speaker, but if I remember my history correctly, RDoc was designed by David Thomas and Andy Hunt for their book Programming Ruby in 2000.
Anyway, calling sequence is not for documenting a sequence of calls, as one might surmise, i.e. it is not for documenting multiple lines of code.
It is for documenting what we could all in other languages multiple overloads of the method. Note that the syntax that is used for each overload is not even legal Ruby syntax (because of the →), so treating it as executable code that is to be indented does not make much sense:
inject(initial, sym) → obj
inject(sym) → obj
inject(initial) { |memo, obj| block } → obj
inject { |memo, obj| block } → obj
What this tells us is that the Enumerable#inject method has four different "overloads", 2×2 combinations: with or without initial value and specifying the binary operation either as a block or as a Symbol.
Hence why it is not possible to format code inside the calling sequence.
As every new programer starting in Go 1st think you see is, strict code format.
Means:
//Valid
func foo(){
}
//Invalid
func foo()
{
}
Same goes for if-else that else should be in same line where if ends, like:
//Valid
if{
}else{
}
//Invalid
if{
}
else{
}
we get below error:
syntax error: unexpected else, expecting }
I have checked the spec spec, but not able to find why.
The only explanation I'am getting is its mandatory.
Can anyone explain us why this is mandated does this have any reason? if any.
UPDATE
I think i have mention this already that "I know lang say so", Question is WHY?
Why to go this length to make it compile time error, what problems it was posing if we don't do this?
The language was designed as such and the reason is outlined in the FAQ.
https://golang.org/doc/faq
Why are there braces but no semicolons? And why can't I put the opening brace on the next line?
o uses brace brackets for statement
grouping, a syntax familiar to programmers who have worked with any
language in the C family. Semicolons, however, are for parsers, not
for people, and we wanted to eliminate them as much as possible. To
achieve this goal, Go borrows a trick from BCPL: the semicolons that
separate statements are in the formal grammar but are injected
automatically, without lookahead, by the lexer at the end of any line
that could be the end of a statement. This works very well in practice
but has the effect that it forces a brace style. For instance, the
opening brace of a function cannot appear on a line by itself.
Some have argued that the lexer should do lookahead to permit the
brace to live on the next line. We disagree. Since Go code is meant to
be formatted automatically by gofmt, some style must be chosen. That
style may differ from what you've used in C or Java, but Go is a
different language and gofmt's style is as good as any other. More
important—much more important—the advantages of a single,
programmatically mandated format for all Go programs greatly outweigh
any perceived disadvantages of the particular style. Note too that
Go's style means that an interactive implementation of Go can use the
standard syntax one line at a time without special rules.
Go inserts a ; at the end of lines ending in certain tokens including }.
Since if {...} else {...} is a single statement, you can't put a semicolon in the middle of it after the first closing brances i.e. }, hence the requirement to put } else { on one line is mandatory.
I hope it answers your question.
In JavaScript,
f = function(x) {
return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda(x): x + 1; (5).
There are many other problems with multi-line lambdas in otherwise standard Python syntax though. It is completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions - it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas thread about multi-line lambdas. It's somewhere between very hard to impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. Coffeescript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lamda function.
The details are a bit messier than this; look at the Haskell 2010 report.
Occasionally we like to look into how certain System` functions are defined (when they're written in Mathematica). This question is about the best way to do that.
Points to keep in mind:
Of couse ReadProtected needs to be removed first.
Builtins usually need to be used at least once before they get loaded into the kernel. Is a single simple invocation usually sufficient for this when they have extended functionality (e.g. through options)?
Information (??) gives the definition in a hard-to-read format (no indentation, and all private context names prepended). What is the best way to get rid of the context names, and get formatted code?
One idea for getting rid of certain contexts is Block[{$ContextPath = Append[$ContextPath, "SomeContext`Private`"], Information[symbol]]. Code can be auto-formatted using Workbench. Some issues remain, e.g. Information doesn't quote strings, preventing the code from being able to be copied into Workbench.
Generally, I'm interested in how people do this, what methods they use to make the code of builtins as easy to read as possible.
Use case: For example, recently I digged into the code of RunThrough when I found out that it simply doesn't work on Windows XP (turns out it fails to quote the names of temp files when the path to them contains spaces).
Update: It appears that there used to be a function for printing definitions without context prepended, Developer`ContextFreeForm, but it's not working any more in newer versions.
Regarding the pretty-printing: the following is a very schematic code which builds on the answer of #Mr.Wizard to show that a few simple rules can go a long way towards improving the readability of the code:
Internal`InheritedBlock[{RunThrough},
Unprotect[RunThrough];
ClearAttributes[RunThrough, ReadProtected];
Block[{$ContextPath = Append[$ContextPath, "System`Dump`"]},
With[{boxes = ToBoxes# DownValues[RunThrough]},
CellPrint[Cell[BoxData[#], "Input"]] &[
boxes /.
f_[left___, "\[RuleDelayed]", right___] :>
f[left, "\[RuleDelayed]", "\n", right] //.
{
RowBox[{left___, ";", next : Except["\n"], right___}] :>
RowBox[{left, ";", "\n", "\t", next, right}],
RowBox[{sc : ("Block" | "Module" | "With"), "[",
RowBox[{vars_, ",", body_}], "]"}] :>
RowBox[{sc, "[", RowBox[{vars, ",", "\n\t", body}], "]"}]
}]]]]
This is for sure not a general solution (in particular it won't work well on deeply nested functional code without many separate statements), but I am sure it can be improved and generalized without too much trouble to cover many cases of interest.
Good question, because I don't think I have seen this discussed yet.
I do essentially the same thing you outlined. You can get a somewhat different print-out with Definition, and more information with FullDefinition:
Unprotect[RunThrough];
ClearAttributes[RunThrough, ReadProtected]
Block[{$ContextPath = Append[$ContextPath, "System`Dump`"]},
Print # FullDefinition # RunThrough
]
I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that list rules, common usage etc for using tools like Treetop. My parser grammer is on GitHub, in case someone wishes to help me improve it.
class {
initialize = lambda (name) {
receiver.name = name
}
greet = lambda {
IO.puts("Hello, #{receiver.name}!")
}
}.new(:World).greet()
I asked treetop to compile your language into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
Also, this is not your immediate problem, but the definition of digit is odd: It can either one digit, or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.