Treetop grammar infinite loop - ruby

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that lists rules, common usage, etc. for using tools like Treetop. My parser grammar is on GitHub, in case someone wishes to help me improve it.
class {
  initialize = lambda (name) {
    receiver.name = name
  }
  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()

I asked Treetop to compile your grammar into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
  digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
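One common way around left recursion is to rewrite the rule so that it first consumes something concrete and then loops over the rest. The sketch below is a hand-written parser in plain Ruby, not Treetop output and not the grammar from the question; the `..` range syntax, the method names, and the `[:range, left, right]` node shape are all assumptions made for illustration.

```ruby
require 'strscan'

# Hypothetical mini-parser (not generated by Treetop).
# Left-recursive shape that overflows the stack in recursive descent:
#   range  <- object '..' object
#   object <- range / integer
# Rewritten shape: consume an integer first, then repeat the tail:
#   range  <- integer ('..' integer)*
def parse_integer(scanner)
  digits = scanner.scan(/\d+/) # \d+ (not \d*) so an empty match is impossible
  digits && digits.to_i
end

def parse_range(scanner)
  left = parse_integer(scanner)
  return nil if left.nil?

  # Every iteration consumes at least the '..' token, so the loop terminates.
  while scanner.scan(/\.\./)
    right = parse_integer(scanner)
    return nil if right.nil?
    left = [:range, left, right]
  end
  left
end

p parse_range(StringScanner.new('1..5'))
p parse_range(StringScanner.new('42'))
p parse_range(StringScanner.new(''))
```

Because `parse_range` never calls itself before consuming input, the call depth stays constant no matter how long the input is.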
Also, this is not your immediate problem, but the definition of digit is odd: it can be either one digit or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
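The empty-statement loop described above can be reproduced outside Treetop. The following is a minimal sketch in plain Ruby, assuming a hypothetical `zero_or_more` helper that plays the role of a grammar's `*` operator; it shows why a repetition over something that can match the empty string needs either a guard or a `+` in the inner rule.

```ruby
require 'strscan'

# Hypothetical helper: apply a pattern repeatedly, like `*` in a grammar.
# Without the position check, a pattern that matches the empty string
# would succeed forever without advancing, which is the hang seen above.
def zero_or_more(scanner, pattern)
  matches = []
  loop do
    before = scanner.pos
    text = scanner.scan(pattern)
    break if text.nil?             # no match at all: the repetition ends
    break if scanner.pos == before # empty match: stop instead of spinning
    matches << text
  end
  matches
end

# /\d*/ can match the empty string, mirroring `digit*` in the integer rule.
p zero_or_more(StringScanner.new(''), /\d*/)
# /\d+/ cannot match empty input, mirroring the `digit+` fix.
p zero_or_more(StringScanner.new('12'), /\d+/)
```

The guard is only a safety net in this sketch; tightening the inner rule so it cannot match nothing is the cleaner fix.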

I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.


Is there a better way to assign a Ruby hash while avoiding RuboCop's ABC Size warnings?

I have a method that builds a laptop's attributes, but only if the attributes are present within a row that is given to the method:
def build_laptop_attributes desk_id, row, laptop
  attributes = {}
  attributes[:desk_number] = room_id if laptop && desk_id
  attributes[:status] = row[:state].downcase if row[:state]
  attributes[:ip_address] = row[:ip_address] if row[:ip_address]
  attributes[:model] = row[:model] if row[:model]
  attributes
end
Currently, RuboCop is saying that Metrics/AbcSize is too high, and I was wondering whether there is an obvious and clean way to assign these attributes.
Style Guides Provide "Best Practices"; Evaluate and Tune When Needed
First of all, RuboCop is advisory. Just because RuboCop complains about something doesn't mean it's wrong in some absolute sense; it just means you ought to expend a little more skull sweat (as you're doing) to see if what you're doing makes sense.
Secondly, you haven't provided a self-contained, executable example. That makes it impossible for SO readers to reliably refactor it, since it can't currently be tested without sample inputs and expected outputs not provided in your original post. You'll need those things yourself to evaluate and refactor your own code, too.
Finally, the ABC metric looks at assignments, branches, and conditionals. You have five assignments, four conditionals, and what looks like a method call. Is that a lot? If you haven't tuned RuboCop, the answer is "RuboCop thinks so." Whether or not you agree is up to you and your team.
If you want to try feeding RuboCop, you can do a couple of things that might help reduce the metric:
Refactor the volume and complexity of your assignments. Some possible examples include:
Replace your postfix if-statements with safe navigators (&.) to guard against calling methods on nil.
Extract some of your branching logic and conditionals to methods that "do the right thing", potentially reducing your current method to a single assignment with four method calls. For example:
attributes = { desk_number: location, status: laptop_status, ... }
Replace all your multiple assignments with a deconstructing assignment (although RuboCop often complains about those, too).
Revisit whether you have the right data structure in the first place. Maybe you really just want an OpenStruct, or some other data object.
Your current code seems readable, so is the juice really worth the squeeze? If you decide that RuboCop is misguided in this particular case, and your code works and passes muster in your internal code reviews, then you can tune the metric's sensitivity in your project's .rubocop.yml or disable that particular metric for just that section of your source code.
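The "extract helpers" suggestion above might look something like the following sketch. The helper names (`desk_number_for`, `laptop_status`) are hypothetical, and using `desk_id` as the desk number is an assumption, since the question's `room_id` is not defined in the snippet.

```ruby
# Hypothetical helpers that "do the right thing" and return nil when
# there is nothing to record.
def desk_number_for(desk_id, laptop)
  desk_id if laptop && desk_id
end

def laptop_status(row)
  row[:state]&.downcase # &. yields nil when :state is missing
end

def build_laptop_attributes(desk_id, row, laptop)
  {
    desk_number: desk_number_for(desk_id, laptop),
    status: laptop_status(row),
    ip_address: row[:ip_address],
    model: row[:model]
  }.compact # drop every key whose value is nil
end

p build_laptop_attributes(7, { state: 'ACTIVE', model: 'X1' }, true)
```

If the cop still fires and the team decides the code is fine as written, a `# rubocop:disable Metrics/AbcSize` comment before the method and a matching `# rubocop:enable Metrics/AbcSize` after it silence the metric for that section only.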
After reading @Todd A. Jacobs's answer, you may want (or not) to write something like this:
def build_laptop_attributes desk_id, row, laptop
  desk_number = room_id if laptop && desk_id
  {
    desk_number: desk_number,
    status: row[:state]&.downcase,
    ip_address: row[:ip_address],
    model: row[:model]
  }.compact
end
This has the advantage of reducing the number of calls to []=, as well as factoring the many ifs into a single compact.
In my opinion, it is more readable because it is more concise and because the emphasis is completely on the correspondence between your keys and values.
Alternative version to reduce the amount of conditionals (assuming you are checking for nil / initialized values):
def build_laptop_attributes desk_id, row, laptop
  attributes = {}
  attributes[:desk_number] = room_id if laptop && desk_id
  attributes[:status] = row[:state]&.downcase
  attributes[:ip_address] = row[:ip_address]
  attributes[:model] = row[:model]
  attributes.compact
end
There is an additional .compact as a cost of removing assignments checks.
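As a quick check of what the trailing compact does, here is a small sketch with a made-up `row` hash (the values are illustrative, not from the question). It relies only on `Hash#compact` removing nil values and on `&.` returning nil instead of raising.

```ruby
row = { state: nil, ip_address: '10.0.0.5', model: nil }

attributes = {
  status: row[:state]&.downcase, # nil, because row[:state] is nil
  ip_address: row[:ip_address],
  model: row[:model]
}

p attributes.compact                  # only :ip_address survives
p({ flag: false, gone: nil }.compact) # false is kept; only nil is removed
```

One difference from the postfix-if version is worth keeping in mind: `if row[:model]` also skips a `false` value, whereas compact keeps it.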

fuzzy logic for query-based document summarisation in Python

I am trying to use fuzzy logic to weight and extract the best sentences for the query. I have extracted the following features, which can be used in fuzzy logic:
The cosine value of each sentence.
The number of proper nouns in the sentence.
The position of the sentence in the document.
The sentence length.
I want to use the above features to apply the fuzzy logic. For instance, I want to create a rule base something like the following:
if cosineValue >= 0.9 && numberOfPropernoun >=1
THEN the sentence is important
I am not quite sure how to start implementing the rule base, the facts, and the inference engine. I would like someone to guide me in implementing this in Python. Please note that I am not familiar with logic programming languages.
This is just a sketch; I'm not even going to try this code because I'm not sure what you want.
Make a class for your features:
from collections import namedtuple
Features = namedtuple('Features', ['cosine', 'nouns', 'position', ...])
etc.
Now imagine you are building your AST. What grammar does your language have? Well, you have conditions, and your conditions have consequences, and your conditions can be combined by boolean operators, so let's make some basic ones:
class CosineValue(object):
    def evaluate(self, features):
        return features.cosine

class Nouns(object):
    def evaluate(self, features):
        return features.nouns
... etc.
Now you need to combine these AST nodes with some operations
class GreaterThan(object):
    def __init__(self, property, value):
        self.property, self.value = property, value

    def evaluate(self, sentence):
        # Use the stored node; a bare `property` would refer to the builtin.
        return self.property.evaluate(sentence) > self.value
Now GreaterThan(CosineValue(), 0.9) is an object (an abstract syntax tree, actually) that represents cosineValue > 0.9. You can evaluate it like so:
expr = GreaterThan(CosineValue(), 0.9)
expr.evaluate(Features(cosine=0.95, ...)) # returns True
expr.evaluate(Features(cosine=0.40, ...)) # returns False
These objects don't look like much, but what they are doing is reifying your process. Their structure encodes what formerly would have been code. Think about this, because this is the only hard part about what you are trying to do: comprehending how you can delay computation by turning it into structure, and how you can play with when values become part of your computation. You were probably stuck thinking about how to write those "if" statements and keeping them separate from the code and the runtime values you need to run them against. Now you should be able to see how, but it's a more advanced way of thinking about programming.
Now you need to build your if/then structure. I'm not sure what you want here either but I would say your if/then is going to be a class that takes an expression like we've just created as one argument and a "then" case, and does the test and either performs or does not perform the "then" case. Probably you will need if/then/else, or else a way to track if it fired, or a way to evaluate your if into a value. You will have to think about this part; nobody can tell you based on what you wrote above what you should be doing.
To make your conditions more powerful, you will need to add some more classes for boolean operators that take conditions as arguments, but it should be straightforward; you'll have And and Or, they'll both take two Condition arguments and their evaluation will do the sensible thing. You could make a Condition superclass, and then add some methods like And and Or to simplify generating these structures.
Finally, if you want to parse something like what you have above, you should try out pyparsing, but make sure you have the AST figured out first or it will be an uphill battle. Or look at what they have; maybe they have some primitives for this, I haven't dealt with pyparsing in a long time.
Best of luck, and please ask a better question next time!
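To make the And and if/then nodes described above concrete, here is a minimal sketch. It is written in Ruby, but the classes translate line for line to Python. The class names (`AtLeast`, `And`, `IfThen`), the `:important` label, and the thresholds are assumptions for illustration, and the rule is a crisp (true/false) one rather than a real fuzzy membership function.

```ruby
Features = Struct.new(:cosine, :nouns, keyword_init: true)

# Leaf nodes: each reads one feature from a sentence's feature record.
class CosineValue
  def evaluate(features)
    features.cosine
  end
end

class Nouns
  def evaluate(features)
    features.nouns
  end
end

# Comparison node: property >= value.
class AtLeast
  def initialize(property, value)
    @property = property
    @value = value
  end

  def evaluate(features)
    @property.evaluate(features) >= @value
  end
end

# Boolean node: both conditions must hold.
class And
  def initialize(left, right)
    @left = left
    @right = right
  end

  def evaluate(features)
    @left.evaluate(features) && @right.evaluate(features)
  end
end

# Rule node: yields the consequence when the condition fires, nil otherwise.
class IfThen
  def initialize(condition, consequence)
    @condition = condition
    @consequence = consequence
  end

  def evaluate(features)
    @condition.evaluate(features) ? @consequence : nil
  end
end

rule = IfThen.new(
  And.new(AtLeast.new(CosineValue.new, 0.9), AtLeast.new(Nouns.new, 1)),
  :important
)

p rule.evaluate(Features.new(cosine: 0.95, nouns: 2)) # => :important
p rule.evaluate(Features.new(cosine: 0.95, nouns: 0)) # => nil
```

Returning nil when the rule does not fire is one of the options mentioned above; an explicit else branch or a "fired" flag would work just as well.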

chaining ruby enumerator functions in a clean way

I just finished a course on ruby where the instructor takes a list of movies, groups them, then calls map, sort, and reverse. It works fine, but I don't find the syntax to be very readable and I'm trying to figure out if what I have in mind is valid. I come from a c# background.
#we can reformat our code to make it shorter
#note that a lot of people don't like calling functions on the
#end of function blocks. (I don't like the look, either)
count_by_month = movies.group_by do |movie|
  movie.release_date.strftime("%B")
end.map do |month, list|
  [month, list.size]
end.sort_by(&:last).reverse
What I am wondering is if I can do something like
#my question: can I do this?
count_by_month = movies.group_by(&:release_date.strftime("%B"))
.map(&:first, &:last.size)
.sort_by(&:last)
.reverse
#based on what I've seen online, I could maybe do something like
count_by_month = movies.groupBy({m -> m.release_date.strftime("%B")})
.map{|month, list| [month, list.size]}
.sort_by(&:last)
.reverse
As a number of people in the comments suggest, this is really a matter of style; that being said, I have to agree with the comments within the code and say that you want to avoid method chaining at the end of a do..end.
If you're going to split methods by line, use a do..end. {} and do...end are synonymous, as you know, but the braces are more often used (in my experience) for single-line pieces of code, and as 'mu is too short' pointed out, if you're set on using them, you may want to look into lambdas. But I'd stick to do..end in this case.
A general style rule I was taught that I follow is to split up chains if what is being worked with changes class in a way that might not be intuitive. ex: fizz = "buzz".split.reverse breaks up a string into an array, but it's clear what the code is doing.
In the example you provided, there's a lot going on that's a bit hard to follow; I like that you wrote out the group_by using hash notation in the last example because it's clear what the group_by is sorting by there and what the output is - I'd put it in a [well named] variable of its own.
grouped_by_month = movies.group_by { |m| m.release_date.strftime("%B") }
count_by_month = grouped_by_month.map { |month, list| [month, list.size] }.sort_by(&:last).reverse
This splits up the code into one line that sets up the grouping hash and another line that manipulates it.
Again, this is style, so everyone has their own quirks; this is simply how I'd edit this based on a quick glance. You seem to be getting into Ruby quite well overall! Sometimes I just like the look of a chain of methods on one line, even if it's against best practices (and I'm doing Project Euler or some other project of my own). I'd suggest looking at large projects on GitHub (ex: rails) to get a feel for how those far more experienced than myself write clean code. Good luck!
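For reference, here is a runnable sketch of the two-step version, using a made-up `Movie` struct and sample dates (both are assumptions; the course's data is not shown). Note that the `&:name` shorthand only covers a single method call, so `&:release_date.strftime("%B")` cannot work: it would call `strftime` on the symbol itself.

```ruby
require 'date'

Movie = Struct.new(:title, :release_date)

movies = [
  Movie.new('A', Date.new(2020, 1, 5)),
  Movie.new('B', Date.new(2021, 1, 20)),
  Movie.new('C', Date.new(2020, 3, 9))
]

# Step 1: build the grouping hash, keyed by month name.
grouped_by_month = movies.group_by { |movie| movie.release_date.strftime('%B') }

# Step 2: turn it into [month, count] pairs, largest count first.
count_by_month = grouped_by_month
                 .map { |month, list| [month, list.size] }
                 .sort_by(&:last)
                 .reverse

p count_by_month # => [["January", 2], ["March", 1]]
```

Putting each call on its own line with a leading dot keeps the chain readable without hanging methods off the end of a do..end block.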

Why are else statements discouraged in Ruby?

I was looking for a Ruby code quality tool the other day, and I came across the pelusa gem, which looks interesting. One of the things it checks for is the number of else statements used in a given Ruby file.
My question is, why are these bad? I understand that if/else statements often add a great deal of complexity (and I get that the goal is to reduce code complexity) but how can a method that checks two cases be written without an else?
To recap, I have two questions:
1) Is there a reason other than reducing code complexity that else statements could be avoided?
2) Here's a sample method from the app I'm working on that uses an else statement. How would you write this without one? The only option I could think of would be a ternary statement, but there's enough logic in here that I think a ternary statement would actually be more complex and harder to read.
def deliver_email_verification_instructions
  if Rails.env.test? || Rails.env.development?
    deliver_email_verification_instructions!
  else
    delay.deliver_email_verification_instructions!
  end
end
If you wrote this with a ternary operator, it would be:
def deliver_email_verification_instructions
(Rails.env.test? || Rails.env.development?) ? deliver_email_verification_instructions! : delay.deliver_email_verification_instructions!
end
Is that right? If so, isn't that way harder to read? Doesn't an else statement help break this up? Is there another, better, else-less way to write this that I'm not thinking of?
I guess I'm looking for stylistic considerations here.
Let me begin by saying that there isn't really anything wrong with your code, and generally you should be aware that whatever a code quality tool tells you might be complete nonsense, because it lacks the context to evaluate what you are actually doing.
But back to the code. If there was a class that had exactly one method where the snippet
if Rails.env.test? || Rails.env.development?
  # Do stuff
else
  # Do other stuff
end
occurred, that would be completely fine (there are always different approaches to a given thing, but you need not worry about that, even if programmers will hate you for not arguing with them about it :D).
Now comes the tricky part. People are lazy as hell, and thusly code snippets like the one above are easy targets for copy/paste coding (this is why people will argue that one should avoid them in the first place, because if you expand a class later you are more likely to just copy and paste stuff than to actually refactor it).
Let's look at your code snippet as an example. I'm basically proposing the same thing as @Mik_Die; however, his example is just as prone to being copied and pasted as yours. Therefore, what should be done (IMO) is this:
class Foo
  def initialize
    @target = (Rails.env.test? || Rails.env.development?) ? self : delay
  end

  def deliver_email_verification_instructions
    @target.deliver_email_verification_instructions!
  end
end
This might not be applicable to your app as is, but I hope you get the idea, which is: Don't repeat yourself. Ever. Every time you repeat yourself, not only are you making your code less maintainable, but as a result also more prone to errors in the future, because one or even 99/100 occurrences of whatever you've copied and pasted might be changed, but the one remaining occurrence is what causes the #disasterOfEpicProportions in the end :)
Another point that I'd forgotten was brought up by @RayToal (thanks :), which is that if/else constructs are often used in combination with boolean input parameters, resulting in constructs such as this one (actual code from a project I have to maintain):
class String
  def uc(only_first=false)
    if only_first
      capitalize
    else
      upcase
    end
  end
end
Let us ignore the obvious method naming and monkey patching issues here, and focus on the if/else construct, which is used to give the uc method two different behaviors depending on the parameter only_first. Such code is a violation of the Single Responsibility Principle, because your method is doing more than one thing, which is why you should've written two methods in the first place.
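A minimal sketch of the "two methods" alternative follows. The method names are hypothetical, and plain methods are used instead of reopening String so the monkey-patching question stays out of the picture.

```ruby
# Each method does exactly one thing, so no boolean flag and no else.
def shout(text)
  text.upcase
end

def capitalize_first(text)
  text.capitalize
end

p shout('hello world')            # => "HELLO WORLD"
p capitalize_first('hello world') # => "Hello world"
```

The caller now states its intent through the method name rather than through a `true` or `false` argument whose meaning has to be looked up.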
def deliver_email_verification_instructions
  subj = (Rails.env.test? || Rails.env.development?) ? self : delay
  subj.deliver_email_verification_instructions!
end

What are the pros and cons of putting as much logic as possible in a minimum(one-liners) piece of code?

Is it cool?
IMO one-liners reduce readability and make debugging and understanding more difficult.
Maximize understandability of the code.
Sometimes that means putting (simple, easily understood) expressions on one line in order to get more code in a given amount of screen real-estate (i.e. the source code editor).
Other times that means taking small steps to make it obvious what the code means.
One-liners should be a side-effect, not a goal (nor something to be avoided).
If there is a simple way of expressing something in a single line of code, that's great. If it's just a case of stuffing in lots of expressions into a single line, that's not so good.
To explain what I mean - LINQ allows you to express quite complicated transformations in relative simplicity. That's great - but I wouldn't try to fit a huge LINQ expression onto a single line. For instance:
var query = from person in employees
            where person.Salary > 10000m
            orderby person.Name
            select new { person.Name, person.Department };
is more readable than:
var query = from person in employees where person.Salary > 10000m orderby person.Name select new { person.Name, person.Department };
It's also more readable than doing all the filtering, ordering and projection manually. It's a nice sweet spot.
Trying to be "clever" is rarely a good idea - but if you can express something simply and concisely, that's good.
One-liners, when used properly, transmit your intent clearly and make the structure of your code easier to grasp.
A python example is list comprehensions:
new_lst = [i for i in lst if some_condition]
instead of:
new_lst = []
for i in lst:
    if some_condition:
        new_lst.append(i)
This is a commonly used idiom that makes your code much more readable and compact. So, the best of both worlds can be achieved in certain cases.
This is by definition subjective, and due to the vagueness of the question, you'll likely get answers all over the map. Are you referring to a single physical line or a single logical line? E.g., are you talking about:
int x = BigHonkinClassName.GetInstance().MyObjectProperty.PropertyX.IntValue.This.That.TheOther;
or
int x = BigHonkinClassName.GetInstance().
MyObjectProperty.PropertyX.IntValue.
This.That.TheOther;
One-liners, to me, are a matter of "what feels right." In the case above, I'd probably break that into both physical and logical lines, getting the instance of BigHonkinClassName, then pulling the full path to .TheOther. But that's just me. Other people will disagree. (And there's room for that. Like I said, subjective.)
Regarding readability, bear in mind that, for many languages, even "one-liners" can be broken out into multiple lines. If you have a long set of conditions for the conditional ternary operator (? :), for example, it might behoove you to break it into multiple physical lines for readability:
int x = (/* some long condition */) ?
/* some long method/property name returning an int */ :
/* some long method/property name returning an int */ ;
At the end of the day, the answer is always: "It depends." Some frameworks (such as many DAL generators, e.g. SubSonic) almost require obscenely long one-liners to get any real work done. Other times, breaking that into multiple lines is quite preferable.
Given concrete examples, the community can provide better, more practical advice.
In general, I definitely don't think you should ever "squeeze" a bunch of code onto a single physical line. That doesn't just hurt legibility, it smacks of someone who has outright disdain for the maintenance programmer. As I used to teach my students: always code for the maintenance programmer, because it will often be you.
:)
One-liners can be useful in some situations:
int value = bool ? 1 : 0;
But for the most part they make the code harder to follow. I think you only should put things on one line when it is easy to follow, the intent is clear, and it won't affect debugging.
One-liners should be treated on a case-by-case basis. Sometimes it can really hurt readability and a more verbose (read: easy-to-follow) version should be used.
There are times, however when a one-liner seems more natural. Take the following:
int Total = (Something ? 1 : 2)
+ (SomethingElse ? (AnotherThing ? x : y) : z);
Or the equivalent (slightly less readable?):
int Total = Something ? 1 : 2;
Total += SomethingElse ? (AnotherThing ? x : y) : z;
IMHO, I would prefer either of the above to the following:
int Total;
if (Something)
    Total = 1;
else
    Total = 2;
if (SomethingElse)
    if (AnotherThing)
        Total += x;
    else
        Total += y;
else
    Total += z;
With the nested if-statements, I have a harder time figuring out the final result without tracing through it. The one-liner feels more like the math formula it was intended to be, and consequently easier to follow.
As far as the cool factor, there is a certain feeling of accomplishment / show-off factor in "Look Ma, I wrote a whole program in one line!". But I wouldn't use it in any context other than playing around; I certainly wouldn't want to have to go back and debug it!
Ultimately, with real (production) projects, whatever makes it easiest to understand is best. Because there will come a time that you or someone else will be looking at the code again. What they say is true: time is precious.
That's true in most cases, but where one-liners are common idioms, they're acceptable. ? : might be an example. Closures might be another one.
No, it is annoying.
One liners can be more readable and they can be less readable. You'll have to judge from case to case.
And, of course, at the command prompt, one-liners rule.
VASTLY more important is developing and sticking to a consistent style.
You'll find bugs MUCH faster, be better able to share code with others, and even code faster if you merely develop and stick to a pattern.
One aspect of this is to make a decision on one-liners. Here's one example from my shop (I run a small coding department) - how we handle IFs:
Ifs shall never be all on one line if they overflow the visible line length, including any indentation.
Thou shalt never have else clauses on the same line as the if even if it comports with the line-length rule.
Develop your own style and STICK WITH IT (or, refactor all code in the same project if you change style).
The main drawback of "one liners" in my opinion is that it makes it hard to break on the code and debug. For example, pretend you have the following code:
a().b().c(d() + e())
If this isn't working, it's hard to inspect the intermediate values. However, it's trivial to break with gdb (or whatever other tool you may be using) in the following, and check each individual variable to see precisely what is failing:
A = a();
B = A.b();
D = d();
E = e(); // here I can query A, B, D, and E
B.c(D + E);
One rule of thumb is to ask whether you can express the concept of the one line in plain language in a very short sentence, such as "If it's true, set it to this, otherwise set it to that."
For a code construct where the ultimate objective of the entire structure is to decide what value to set a single variable to, it is almost always clearer, with appropriate formatting, to put multiple conditionals into a single statement. With multiple nested if/else blocks, the overall objective, to set the variable...
" variableName = "
must be repeated in every nested clause, and the eye must read all of them to see this. With a single statement, it is much clearer, and with the appropriate formatting, the complexity is more easily managed as well:
decimal cost =
usePriority? PriorityRate * weight:
useAirFreight? AirRate * weight:
crossMultRegions? MultRegionRate:
SingleRegionRate;
The pro is an easily understood one-liner that works.
The con is the concatenation of obfuscated gibberish on one line.
Generally, I'd call it a bad idea (although I do it myself on occasion) -- it strikes me as something that's done more to impress on how clever someone is than it is to make good code. "Clever tricks" of that sort are generally very bad.
That said, I personally aim to have one "idea" per line of code; if this burst of logic is easily encapsulated in a single thought, then go ahead. If you have to stop and puzzle it out a bit, best to break it up.
