I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that list rules, common usage etc for using tools like Treetop. My parser grammer is on GitHub, in case someone wishes to help me improve it.
class {
initialize = lambda (name) {
receiver.name = name
}
greet = lambda {
IO.puts("Hello, #{receiver.name}!")
}
}.new(:World).greet()
I asked treetop to compile your language into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
Also, this is not your immediate problem, but the definition of digit is odd: It can either one digit, or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.
ok, so there's basically 3 tasks this program must carry out:
Parse a sentence given in the form of a list, in this case (and throughout the example) the sentence will be [the,traitorous,tostig_godwinson,was,slain]. (its history, don't ask!) so this would look like:
sentence(noun_phrase(det(the),np2(adj(traitorous),np2(noun(tostig_godwinson)))),verb_phrase(verb(slain),np(noun(slain)))).
use the parsed sentence to extract the subject, verb and object, and output as a list, e.g. [tostig_godwinson,was,slain] using the current example. I had this working too until I attempted number 3.
use the target list and compare it against a knowledge base to basically answer the question you asked in the 1st place (see code below) so using this question and the knowledge base the program would print out 'the_battle_of_stamford_bridge' as this is the sentence in the knowledge base with the most matches to the list in question
so here's where i am so far:
history('battle_of_Winwaed',[penda, king_of_mercia,was,slain,killed,oswui,king_of_bernicians, took_place, '15_November_1655']).
history('battle_of_Stamford_Bridge',[tostig_godwinson,herald_hardrada,was,slain, took_place, '25_September_1066']).
history('battle_of_Boroughbridge',[edwardII,defeated,earl_of_lancaster,execution, took_place, '16_march_1322']).
history('battle_of_Towton',[edwardIV,defeated,henryVI,palm_Sunday]).
history('battle_of_Wakefield',[richard_of_york, took_place,
'30_December_1490',was,slain,war_of_the_roses]).
history('battle_of_Adwalton_Moor',[earl_of_newcastle,defeats,fairfax, took_place, '30_June_1643',battle,bradford,bloody]).
history('battle_of_Marston_Moor',[prince_rupert,marquis_of_newcastle,defeats,fairfax,oliver_cromwell,ironsides, took_place,
'2_June_1644', bloody]).
noun(penda).
noun(king_of_mercia).
noun(oswui).
noun(king_of_bernicians).
noun('15_November_1655').
noun(tostig_godwinson).
noun(herald_hardrada).
noun('25_September_1066').
noun(edwardII).
noun(earl_of_lancaster).
noun('16_march_1322').
noun(edwardIV).
noun(henryVI).
noun(palm_Sunday).
noun(richard_of_york).
noun('30_December_1490').
noun(war_of_the_roses).
noun(earl_of_newcastle).
noun(fairfax).
noun('30_June_1643').
noun(bradford).
noun(prince_rupert).
noun(marquis_of_newcastle).
noun(fairfax).
noun(oliver_cromwell).
noun('2_June_1644').
noun(battle).
noun(slain).
noun(defeated).
noun(killed).
adj(bloody).
adj(traitorous).
verb(defeats).
verb(was).
det(a).
det(the).
prep(on).
best_match(Subject,Object,Verb):-
history(X,Y),
member(Subject,knowledgebase),
member(Object,knowledgebase),
member(Verb,knowledgebase),
write(X),nl,
fail.
micro_watson:- write('micro_watson: Please ask me a question:'), read(X),
sentence(X,Sentence,Subject,Object,Verb),nl,write(Subject),nl,write(Verb),nl,write(Object).
sentence(Sentence,sentence(Noun_Phrase, Verb_Phrase),Subject,Object,Verb):-
np(Sentence,Noun_Phrase,Rem),
vp(Rem,Verb_Phrase),
nl, write(sentence(Noun_Phrase,Verb_Phrase)),
noun(Subject),
member(Subject,Sentence),
noun(Object),
member(Object,Rem),
verb(Verb),
member(Verb,Rem),
best_match(Subject,Object,Verb).
member(X,[X|_]).
member(X,[_|Tail]):-
member(X,Tail).
np([X|T],np(det(X),NP2),Rem):-
det(X),
np2(T,NP2,Rem).
np(Sentence,Parse,Rem):- np2(Sentence,Parse,Rem).
np(Sentence,np(NP,PP),Rem):-
np(Sentence,NP,Rem1),
pp(Rem1,PP,Rem).
np2([H|T],np2(noun(H)),T):-noun(H).
np2([H|T],np2(adj(H),Rest),Rem):- adj(H),np2(T,Rest,Rem).
pp([H|T],pp(prep(H),Parse),Rem):-
prep(H),
np(T,Parse,Rem).
vp([H|[]],verb(H)):-
verb(H).
vp([H|T],vp(verb(H),Rest)):-
verb(H),
pp(T, Rest,_).
vp([H|T],vp(verb(H),Rest)):-
verb(H),
np(T, Rest,_).
As i said i had number 2 working until i tried number 3, now it just prints the parsed sentence out and then give me a 'Error: out of local stack message' any help is greatly appreciated! So at the top is the knowledge base with which we are comparing out list to find the best match, these are called (albeit incorrectly at this stage) by the best_match method, which executes immediately after the sentence method which parses the sentence and extract the key words. Also i apologise if the code is terribly laid out!
Cheers
I assume the person who posted this is never coming back, I wanted to remind myself some prolog, so here it is.
There are two major issues with this code, apart from the fact that there are still some logical problems in some predicates.
Problem 1:
You ignored singleton warnings, and they usually are something not to be ignored. The best match predicate should look like this:
best_match(Subject,Object,Verb):-
history(X,Y),
member(Subject,Y),
member(Object,Y),
member(Verb,Y),
write(X),nl,
fail.
The other warning was about the Sentence variable in the sentence predicate, so it goes like this:
sentence(X,Subject,Object,Verb),nl,write(Subject),nl,write(Verb),nl,write(Object).
sentence(Sentence,Subject,Object,Verb):-
np(Sentence,_,Rem),
vp(Rem,_),
nl,
noun(Subject),
member(Subject,Sentence),
noun(Object),
member(Object,Rem),
verb(Verb),
member(Verb,Rem),
best_match(Subject,Object,Verb).
Problem 2:
I assume you divided the np logic into np and np2 to avoid infinite loops, but then forgot to apply this division just where it was necessary. The longest np clause should be:
np(Sentence,np(NP,PP),Rem):-
np2(Sentence,NP,Rem1),
pp(Rem1,PP,Rem).
If you really wanted to allow more complicated np there, which I doubt, you can do it like this:
np(Sentence,np(NP,PP),Rem):-
append(List1,List2,Sentence),
List1\=[],
List2\=[],
np(List1,NP,Rem1),
append(Rem1,List2,Rem2),
pp(Rem2,PP,Rem).
This way you will not end up calling np with the same arguments over and over again, because you make sure that the sentence checked is shorter each time.
Minor issues:
(How the program works, after the infinite loop problem has been fixed)
The last vp is repeated
I am not sure about your grammar, and e.g. why "defeated" is a noun...
Just to check that the program works I used the sentence [edwardIV,defeated,henryVI,on,palm_Sunday].
I changed "defeated" to a verb, and also changed the last vp clause to:
vp([H|T],vp(verb(H),Rest)):-
verb(H),
np(T,_,Rest1),
pp(Rest1, Rest,_).
For the example sentence I got battle_of_Boroughbridge and battle_of_Towton as results.
Is it cool?
IMO one-liners reduces the readability and makes debugging/understanding more difficult.
Maximize understandability of the code.
Sometimes that means putting (simple, easily understood) expressions on one line in order to get more code in a given amount of screen real-estate (i.e. the source code editor).
Other times that means taking small steps to make it obvious what the code means.
One-liners should be a side-effect, not a goal (nor something to be avoided).
If there is a simple way of expressing something in a single line of code, that's great. If it's just a case of stuffing in lots of expressions into a single line, that's not so good.
To explain what I mean - LINQ allows you to express quite complicated transformations in relative simplicity. That's great - but I wouldn't try to fit a huge LINQ expression onto a single line. For instance:
var query = from person in employees
where person.Salary > 10000m
orderby person.Name
select new { person.Name, person.Deparment };
is more readable than:
var query = from person in employees where person.Salary > 10000m orderby person.Name select new { person.Name, person.Deparment };
It's also more readabe than doing all the filtering, ordering and projection manually. It's a nice sweet-spot.
Trying to be "clever" is rarely a good idea - but if you can express something simply and concisely, that's good.
One-liners, when used properly, transmit your intent clearly and make the structure of your code easier to grasp.
A python example is list comprehensions:
new_lst = [i for i in lst if some_condition]
instead of:
new_lst = []
for i in lst:
if some_condition:
new_lst.append(i)
This is a commonly used idiom that makes your code much more readable and compact. So, the best of both worlds can be achieved in certain cases.
This is by definition subjective, and due to the vagueness of the question, you'll likely get answers all over the map. Are you referring to a single physical line or logical line? EG, are you talking about:
int x = BigHonkinClassName.GetInstance().MyObjectProperty.PropertyX.IntValue.This.That.TheOther;
or
int x = BigHonkinClassName.GetInstance().
MyObjectProperty.PropertyX.IntValue.
This.That.TheOther;
One-liners, to me, are a matter of "what feels right." In the case above, I'd probably break that into both physical and logic lines, getting the instance of BigHonkinClassName, then pulling the full path to .TheOther. But that's just me. Other people will disagree. (And there's room for that. Like I said, subjective.)
Regarding readability, bear in mind that, for many languages, even "one-liners" can be broken out into multiple lines. If you have a long set of conditions for the conditional ternary operator (? :), for example, it might behoove you to break it into multiple physical lines for readability:
int x = (/* some long condition */) ?
/* some long method/property name returning an int */ :
/* some long method/property name returning an int */ ;
At the end of the day, the answer is always: "It depends." Some frameworks (such as many DAL generators, EG SubSonic) almost require obscenely long one-liners to get any real work done. Othertimes, breaking that into multiple lines is quite preferable.
Given concrete examples, the community can provide better, more practical advice.
In general, I definitely don't think you should ever "squeeze" a bunch of code onto a single physical line. That doesn't just hurt legibility, it smacks of someone who has outright disdain for the maintenance programmer. As I used to teach my students: always code for the maintenance programmer, because it will often be you.
:)
Oneliners can be useful in some situations
int value = bool ? 1 : 0;
But for the most part they make the code harder to follow. I think you only should put things on one line when it is easy to follow, the intent is clear, and it won't affect debugging.
One-liners should be treated on a case-by-case basis. Sometimes it can really hurt readability and a more verbose (read: easy-to-follow) version should be used.
There are times, however when a one-liner seems more natural. Take the following:
int Total = (Something ? 1 : 2)
+ (SomethingElse ? (AnotherThing ? x : y) : z);
Or the equivalent (slightly less readable?):
int Total = Something ? 1 : 2;
Total += SomethingElse ? (AnotherThing ? x : y) : z;
IMHO, I would prefer either of the above to the following:
int Total;
if (Something)
Total = 1;
else
Total = 2;
if (SomethingElse)
if (AnotherThing)
Total += x;
else
Total += y;
else
Total += z
With the nested if-statements, I have a harder time figuring out the final result without tracing through it. The one-liner feels more like the math formula it was intended to be, and consequently easier to follow.
As far as the cool factor, there is a certain feeling of accomplishment / show-off factor in "Look Ma, I wrote a whole program in one line!". But I wouldn't use it in any context other than playing around; I certainly wouldn't want to have to go back and debug it!
Ultimately, with real (production) projects, whatever makes it easiest to understand is best. Because there will come a time that you or someone else will be looking at the code again. What they say is true: time is precious.
That's true in most cases, but in some cases where one-liners are common idioms, then it's acceptable. ? : might be an example. Closure might be another one.
No, it is annoying.
One liners can be more readable and they can be less readable. You'll have to judge from case to case.
And, of course, on the prompt one-liners rule.
VASTLY more important is developing and sticking to a consistent style.
You'll find bugs MUCH faster, be better able to share code with others, and even code faster if you merely develop and stick to a pattern.
One aspect of this is to make a decision on one-liners. Here's one example from my shop (I run a small coding department) - how we handle IFs:
Ifs shall never be all on one line if they overflow the visible line length, including any indentation.
Thou shalt never have else clauses on the same line as the if even if it comports with the line-length rule.
Develop your own style and STICK WITH IT (or, refactor all code in the same project if you change style).
.
The main drawback of "one liners" in my opinion is that it makes it hard to break on the code and debug. For example, pretend you have the following code:
a().b().c(d() + e())
If this isn't working, its hard to inspect the intermediate values. However, it's trivial to break with gdb (or whatever other tool you may be using) in the following, and check each individual variable and see precisely what is failing:
A = a();
B = A.b();
D = d();
E = e(); // here i can query A B D and E
B.C(d + e);
One rule of thumb is if you can express the concept of the one line in plain language in a very short sentence. "If it's true, set it to this, otherwise set it to that"
For a code construct where the ultimate objective of the entire structure is to decide what value to set a single variable, With appropriate formatting, it is almost always clearer to put multiple conditonals into a single statement. With multiple nested if end if elses, the overall objective, to set the variable...
" variableName = "
must be repeated in every nested clause, and the eye must read all of them to see this.. with a singlr statement, it is much clearer, and with the appropriate formatting, the complexity is more easily managed as well...
decimal cost =
usePriority? PriorityRate * weight:
useAirFreight? AirRate * weight:
crossMultRegions? MultRegionRate:
SingleRegionRate;
The prose is an easily understood one liner that works.
The cons is the concatenation of obfuscated gibberish on one line.
Generally, I'd call it a bad idea (although I do it myself on occasion) -- it strikes me as something that's done more to impress on how clever someone is than it is to make good code. "Clever tricks" of that sort are generally very bad.
That said, I personally aim to have one "idea" per line of code; if this burst of logic is easily encapsulated in a single thought, then go ahead. If you have to stop and puzzle it out a bit, best to break it up.