Building a "Semi-Natural Language" DSL in Ruby - ruby

I'm interested in building a DSL in Ruby for use in parsing microblog updates. Specifically, I thought that I could translate text into a Ruby string in the same way as the Rails gem allows "4.days.ago". I already have regex code that will translate the text
#USER_A: give X points to #USER_B for accomplishing some task
#USER_B: take Y points from #USER_A for not giving me enough points
into something like
Scorekeeper.new.give(x).to("USER_B").for("accomplishing some task").giver("USER_A")
Scorekeeper.new.take(x).from("USER_A").for("not giving me enough points").giver("USER_B")
It's acceptable to me to formalize the syntax of the updates so that only standardized text is provided and parsed, allowing me to smartly process updates. Thus, it seems it's more a question of how to implement the DSL class. I have the following stub class (removed all error checking and replaced some with comments to minimize paste):
class Scorekeeper
attr_accessor :score, :user, :reason, :sender
def give(num)
# Can 'give 4' or can 'give a -5'; ensure 'to' called
self.score = num
self
end
def take(num)
# ensure negative and 'from' called
self.score = num < 0 ? num : num * -1
self
end
def plus
self.score > 0
end
def to (str)
self.user = str
self
end
def from(str)
self.user = str
self
end
def for(str)
self.reason = str
self
end
def giver(str)
self.sender = str
self
end
def command
str = plus ? "giving ##{user} #{score} points" : "taking #{score * -1} points from ##{user}"
"##{sender} is #{str} for #{reason}"
end
end
Running the following commands:
t = eval('Scorekeeper.new.take(4).from("USER_A").for("not giving me enough points").giver("USER_B")')
p t.command
p t.inspect
Yields the expected results:
"#USER_B is taking 4 points from #USER_A for not giving me enough points"
"#<Scorekeeper:0x100152010 #reason=\"not giving me enough points\", #user=\"USER_A\", #score=4, #sender=\"USER_B\">"
So my question is mainly, am I doing anything to shoot myself in the foot by building upon this implementation? Does anyone have any examples for improvement in the DSL class itself or any warnings for me?
BTW, to get the eval string, I'm mostly using sub/gsub and regex, I figured that's the easiest way, but I could be wrong.

Am I understanding you correctly: you want to take a string from a user and cause it to trigger some behavior?
Based on the two examples you listed, you probably can get by with using regular expressions.
For example, to parse this example:
#USER_A: give X points to #USER_B for accomplishing some task
With Ruby:
input = "#abe: give 2 points to #bob for writing clean code"
PATTERN = /^#(.+?): give ([0-9]+) points to #(.+?) for (.+?)$/
input =~ PATTERN
user_a = $~[1] # => "abe"
x = $~[2] # => "2"
user_b = $~[3] # => "bob"
why = $~[4] # => "writing clean code"
But if there is more complexity, at some point you might find it easier and more maintainable to use a real parser. If you want a parser that works well with Ruby, I recommend Treetop: http://treetop.rubyforge.org/
The idea of taking a string and converting it to code to be evaled makes me nervous. Using eval is a big risk and should be avoided if possible. There are other ways to accomplish your goal. I'll be happy to give some ideas if you want.
A question about the DSL you suggest: are you going to use it natively in another part of your application? Or do just plan on using it as part of the process to convert the string into the behavior you want? I'm not sure what is best without knowing more, but you may not need the DSL if you are just parsing the strings.

This echoes some of my thoughts on a tangental project (an old-style text MOO).
I'm not convinced that a compiler-style parser is going to be the best way for the program to deal with the vaguaries of english text. My current thoughts have me splitting up the understanding of english into seperate objects -- so a box understands "open box" but not "press button", etc. -- and then having the objects use some sort of DSL to call centralised code that actually makes things happen.
I'm not sure that you've got to the point where you understand how the DSL is actually going to help you. Maybe you need to look at how the english text gets turned into DSL, first. I'm not saying that you don't need a DSL; you might very well be right.
As for hints as to how to do that? Well, I think if I were you I would be looking for specific verbs. Each verb would "know" what sort of thing it should expect from the text around it. So in your example "to" and "from" would expect a user immediately following.
This isn't especially divergent from the code you've posted here, IMO.
You might get some milage out of looking at the answers to my question. One commenter pointed me to the Interpreter Pattern, which I found especially enlightening: there's a nice Ruby example here.

Building on #David_James' answer, I've come up with a regex-only solution to this since I'm not actually using the DSL anywhere else to build scores and am merely parsing out points to users. I've got two patterns that I'll use to search:
SEARCH_STRING = "#Scorekeeper give a healthy 4 to the great #USER_A for doing something
really cool.Then give the friendly #USER_B a healthy five points for working on this.
Then take seven points from the jerk #USER_C."
PATTERN_A = /\b(give|take)[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)[\s\w]*\b(to|from)[\s\w]*#([a-zA-Z0-9_]*)\b/i
PATTERN_B = /\bgive[\s\w]*#([a-zA-Z0-9_]*)\b[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)/i
SEARCH_STRING.scan(PATTERN_A) # => [["give", "4", "to", "USER_A"],
# ["take", "seven", "from", "USER_C"]]
SEARCH_STRING.scan(PATTERN_B) # => [["USER_B", "five"]]
The regex might be cleaned up a bit, but this allows me to have syntax that allows a few fun adjectives while still pulling the core information using both "name->points" and "points->name" syntaxes. It does not allow me to grab the reason, but that's so complex that for now I'm going to just store the entire update, since the whole update will be related to the context of each score anyway in all but outlier cases. Getting the "giver" username can be done elsewhere as well.
I've written up a description of these expressions as well, in hopes that other people might find that useful (and so that I can go back to it and remember what that long string of gobbledygook means :)

Related

Specify Ruby method namespace for readability

This is a bit of a weird question, but I'm not quite sure how to look it up. In our project, we already have an existing concept of a "shift". There's a section of code that reads:
foo.shift
In this scenario, it's easy to read this as trying to access the shift variable of object foo. But it could also be Array#shift. Is there a way to specify which class we expect the method to belong to? I've tried variations such as:
foo.send(Array.shift)
Array.shift(foo)
to make it more obvious which method was being called, but I can't get it to work. Is there a way to be more explicit about which class the method you're trying to call belongs to to help in code readability?
On a fundamental level you shouldn't be concerned about this sort of thing and you absolutely can't tell the Array shift method to operate on anything but an Array object. Many of the core Ruby classes are implemented in C and have optimizations that often depend on specific internals being present. There's safety measures in place to prevent you from trying to do something too crazy, like rebinding and applying methods of that sort arbitrarily.
Here's an example of two "shifty" objects to help illustrate a real-world situation and how that applies:
class CharacterArray < Array
def initialize(*args)
super(args.flat_map(&:chars))
end
def inspect
join('').inspect
end
end
class CharacterList < String
def shift
slice!(0, 1)
end
end
You can smash Array#shift on to the first and it will work by pure chance because you're dealing with an Array. It won't work with the second one because that's not an Array, it's missing significant methods that the shift method likely depends on.
In practice it doesn't matter what you're using, they're both the same:
list_a = CharacterArray.new("test")
list_a.shift
# => "t"
list_a.shift
# => "e"
list_a << "y"
# => "sty"
list_b = CharacterList.new("test")
list_b.shift
# => "t"
list_b.shift
# => "e"
list_b << "y"
# => "sty"
These both implement the same interfaces, they both produce the same results, and as far as you're concerned, as the caller, that's good enough. This is the foundation of Duck Typing which is the philosophy Ruby has deeply embraced.
If you try the rebind trick on the CharacterList you're going to end up in trouble, it won't work, yet that class delivers on all your expectations as far as interface goes.
Edit: As Sergio points out, you can't use the rebind technique, Ruby abruptly explodes:
Array.instance_method(:shift).bind(list_b).call
# => Error: bind argument must be an instance of Array (TypeError)
If readability is the goal then that has 35 more characters than list_b.shift which is usually going dramatically in the wrong direction.
After some discussion in the comments, one solution is:
Array.instance_method(:shift).bind(foo).call
Super ugly, but gets across the idea that I wanted which was to completely specify which instance method was actually being called. Alternatives would be to rename the variable to something like foo_array or to call it as foo.to_a.shift.
The reason this is difficult is that Ruby is not strongly-typed, and this question is all about trying to bring stronger typing to it. That's why the solution is gross! Thanks to everybody for their input!

Selecting key words in a string (that are included in an Array) to change their format in Ruby

Select key words in a string to change their format in Ruby
I have a big string (text) and an Array of strings (key_words) as below:
text = 'So in this election, we cannot sit back and hope that everything works out for the best. We cannot afford to be tired or frustrated or cynical. No, hear me. Between now and November, we need to do what we did eight years ago and four years ago…'
key_words = ['frustrated', 'tired', 'hope']
My objective is to print each word in ‘text’ while changing the colour and case of the words that are included in key_words. I’ve been able to do that by doing:
require 'colorize'
text.split(/\b/).each do |x|
if key_words.include?(x.downcase) ; print '#{x}'.colorize(:red)
else print '#{x}' end
end
However, since I don’t want to include many words in key_words I want to make the selection more sensitive going beyond an exact match. Such as if, for example:
key_words = ['frustrat', 'tire', 'hope'] => the algorithm would select both 'Frustration', 'Frustrated' or 'Tiring' and 'Tired' or 'Hope' and 'Hopeful'.
I’ve tried playing with word lengths in both the string and the array as below but it’s seems very inefficient solution and I’m getting very confused with the usage of .any? and .include? methods in this scenario.
key_words = ['frustrated', 'tired', 'hope']
key_words_abb = []
key_words.each { |x| key_words_abb << x.downcase[0][0..x.length-2]}
text.split(/\b/).each do |x|
if key_words_abb.include?(x.downcase[0][0..x.length-2]); print '#{x}'.colorize(:red)
else print x
end
end
Since I can’t find a specific solution online I would appreciate your help.
It's worth noting that when doing repeated substitutions on strings, especially longer ones, you'll want your substitution method to be as efficient as possible. Spinning through an array of things to switch out is painfully expensive, especially as that list grows.
Here's a variation on your approach:
replacement = Regexp.new('\b%s\b' % [ Regexp.union(key_words) ])
replaced = text.gsub(replacement) do |s|
s.colorize(:red)
end
puts replaced
If you're using that substitution repeatedly you should persist the Regexp object into a constant. That avoids having to compile it for each string you're adjusting. If the list changes based on factors hard to predict, leave it like this and produce it dynamically.
One thing to note about using Ruby is it's often best to express your code as a series of transformations with output as a final step. Putting things like print in the middle of a loop complicates things unnecessarily. If you want to add an additional step to your loop you have to do a lot of extra work to move that print to a later stage. With the approach here you can just chain on the end and do whatever you want.

Ruby send method with rails associations

I have been messing about with making a sortable table module thing. I know some might exist but want to get experience doing this myself. I had the idea of have it like so:
SortedTable.new(ModelName, Hash_Of_Columns_And_Fields, ID)
example
SortedTable.new(Post, {"Title" => "title", "Body" => "body", "Last Comment" => "comment.last.title"}, params[:id])
I am planning to do something like:
def initialize(model, fields, id)
data = {}
model = model.capitalize.constantize
model.find(id)
fields.each do |column, field|
data[column] = model.send(field)
end
end
This works fine for title and body but when it comes to getting the Last Comment with comment.last.title it errors out. I have tried doing Post.send("comments.last.title") but says NoMethodError: undefined method 'comments.last.title' for #<Post:0x0000010331d220>
I know I can do Post.send("comments").send("last").send("title") and that works but I can not think of how to do that dynamically by taking the fields and spliting the on . then chaining the sends. Can anyone give me advice on how to do this? If I am doing this completely wrong also then please say or point me in the direction of code that does something similar. I am not a expert ruby developer but I am trying.
P.S The above code might not work as I am not at a computer with ruby/rails to test, but hopefully you get the concept.
Cheers
first and dirtiest solution is eval
fields.each do |column, field|
data[column] = eval("#{model}.#{field}")
end
next solution little more functional
fields.each do |column, field|
data[column] = field.split(".").inject(model){|obj, met| obj.send(met)}
end
PS
And your design is ugly
EDIT
The inject can be written more concisely as field.split('.').inject(model, :send). And I'd strongly discourage the eval way — unnecessary evals are one more case where you can slip up and allow arbitrary code execution, and they're also slow. (Also, I'm pretty sure that should just be eval("model.#{field}") — you don't want to interpolate the string value of model. Yet another example of how easy it is to slip up an eval expression.) – #Chuck

"In Ruby there's more than one way of doing the same thing" - what does this mean?

Feel free to delete this topic if it's discussed or quite obvious. I hail from C# background and I'm planning to learn Ruby. Everything I read about it seems quite intriguing. But I'm confused over this basic philosophy of Ruby that "there's more than one way to do one thing". Can someone provide 2 or 3 simple arithmetic or string examples to make this point clear, like if its about the syntaxes or logics etc.
Thanks
"More than one way of doing something" means having the choice of doing something the way you want it. That way you can use various programming styles, no matter what background you're coming from.
Iteration using for vs. blocks
You can iterate over an array of things like so. This is pretty basic, and if you're from a Java background, this feels kind of natural.
for something in an_array
print something
end
A more Ruby-like way would be the following:
an_array.each do |something|
print something
end
The first is a rather well known way of doing things. The second one is using blocks, a very powerful concept that you'll find in many Ruby idioms. Basically, the array knows how to iterate over its contents, so you can modify this and add something like:
an_array.each_with_index do |something, index|
print "At #{index}, there is #{something}"
end
You could have done it like this too, but now you see that the above one looks easier:
index = 0
for something in an_array
print "At #{index}, there is #{something}"
index += 1
end
Passing arguments as usual or using Hashes
Normally, you would pass arguments like so:
def foo(arg1, arg2, arg3)
print "I have three arguments, which are #{arg1}, #{arg2} and #{arg3}"
end
foo("very", "easy", "classic")
=> "I have three arguments, which are very easy and classic"
However, you may also use a Hash to do that:
def foo(args)
print "I have multiple arguments, they are #{args[:arg1]}, #{args[:arg2]} and #{args[:arg3]}"
end
foo :arg1 => "in a", :arg2 => "hash", :arg3 => "cool"
=> "I have three arguments, which are in a hash and cool"
The second form is one used excessively by Ruby on Rails. The nice thing is that you now have named parameters. When you are passing them, you will more easily remember what they are used for.
It means a lot of confusion, style wars, and bugs due to subtle differences, all in the name of freedom of choice.
A somewhat trivial example is the use of alias/alias_method (also note that there are two similar ways for almost the same thing, e. g. alias versus alias_method).
Consider that you are working in a project and you forgot which API to use.
What was the name of the method again?
Well, you can just remain within the domain logic of your program at hand, and continue to work with it the way you want to; then you are going to simply add an alias in the main entry point of your other program.
People can use by default .collect or they can use .map, it makes little difference what you personally would use (I use .map since it is shorter).
The use of aliases helped me because after some months, I often can not remember how to use something. Yes, I could look it up, but why would I have to bother anyway? I can just use an alias instead. (Note that I do try to remain as simple as possible with aliases and APIs.)

Why is "#{String}" a common idiom in Ruby

A Ruby dev I know asked this; my answer is below... Are there other, better reasons?
Why do so many Ruby programmers do
"#{string}"
rather than
string
since the second form is simpler and more efficient?
Is this a common idiom for Ruby developers? I don't see it that much.
Smaller changes when you later need to do more than simply get the value of the string, but also prepend/append to it at the point of use seems to be the best motivation I can find for that idiom.
There is only one case where this is a recommended idiom :
fname = 'john'
lname = 'doe'
name = "#{fname} #{lname}"
The code above is more efficient than :
name = fname + ' ' + lname
or
name = [fname, lname].join(' ')
What's the broader context of some of the usages? The only thing I can come up with beyond what's already been mentioned is as a loose attempt at type safety; that is, you may receive anything as an argument, and this could ensure that whatever you pass in walks like a duck..or, well, a string (though string.to_s would arguably be clearer).
In general though, this is probably a code smell that someone along the way thought was Best Practices.
I use this kind of code, so that I can pass nil as string and it still will work on a string, rather than seeing some exceptions flying:
def short(string = nil)
"#{string}"[0..7]
end
And it's easier/faster to append some debug code, if it's already in quotes.
So in short: It's more convenient.
Interesting answers, everyone. I'm the developer who asked the original question. To give some more context, I see this occasionally at my current job, and also sometimes in sample code on the Rails list, with variables that are known in advance to contain strings. I could sort of understand it as a substitute for to_s, but I don't think that's what's going on here; I think people just forget that you don't need the interpolation syntax if you're just passing a string variable.
If anyone tried to tell me this was a best practice, I'd run away at top speed.
maybe it is easy way to convert any to string? Because it is the same as call to_s method. But it is quite strange way :).
a = [1,2,3]
"#{a}"
#=> "123"
a.to_s
#=> "123"
I could image this being useful in cases where the object being interpolated is not always a String, as the interpolation implicitly calls #to_s:
"#{'bla'}" => "bla"
"#{%r([a-z])}" => "(?-mix:[a-z])"
"#{{:bla => :blub}}" => "blablub"
May make sense when logging something, where you don't care so much about the output format, but never want an error because of a wrong argument type.

Resources