I am developing a parser in Ruby using the parslet library.
The language I am parsing has a lot of keywords that can be merged into a single parsing rule like this:
rule(:keyword) {
str('keyword1') |
str('keyword2') |
str('keyword3') ...
}
Is there a good way to generate this set of lines of code dynamically, by reading a text file with all the keywords?
This would help me keep my parser clean and small, making it easier to add new keywords without modifying the code.
The pseudo-code of what I want to embed inside the rule(:keyword) would be something like this:
File.read("keywords.txt").each { |k| write_line " str(\'#{k}\') "}
So far, the workaround I have found is to have a separate ruby program loading the parser code as:
keywords = ["keyword1", "keyword2","keyword3"]
subs = {:keyword_list => keywords.map { |k| "str('#{k}')" }.join(" |\n") }
eval( File.read("parser.rb") % subs)
where the parser code has the following lines:
rule(:keywords){
%{keyword_list}
}
Is there a more elegant way to achieve this?
You can try something like this:
rule(:keyword) {
File.readlines("keywords.txt").map { |k| str(k.chomp) }.inject(&:|)
}
In this case, you don't really need to "generate lines of code". As @Uri tried to explain in his answer, there's nothing special about the contents of that rule method; it's just plain Ruby code. Because of this, anything you can do in Ruby you can do inside that rule method as well, including reading files, dynamically calling methods, and calling methods on objects.
Let me break down your existing code, so I can better explain how a dynamic solution to the same problem would work:
rule(:keyword) {
# Stuff here
}
This code right here calls a rule method and passes it :keyword and a block of code. At some point, parslet will call that block and check its return value. Parslet might choose to call the block using instance_exec, which can change the context the block is being executed in to make methods not available outside the block (like str, perhaps) available inside it.
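To make the idea of a block running in another object's context concrete, here is a tiny hypothetical DSL. `ToyGrammar` and its `str` are illustrations only, not parslet's actual classes or implementation; they just show how `instance_exec` lets a block call an instance method such as `str` that would not be visible to it otherwise.

```ruby
# Hypothetical toy DSL (NOT parslet's code): shows how a block passed to
# `rule` can call `str` once it is evaluated with instance_exec.
class ToyGrammar
  # Turn the block into an instance method; when called, the block runs
  # with `self` set to the grammar instance, so `str` resolves.
  def self.rule(name, &definition)
    define_method(name) { instance_exec(&definition) }
  end

  # Stand-in for parslet's `str`: returns a descriptive string, not a parser atom.
  def str(text)
    "str(#{text.inspect})"
  end

  rule(:keyword) { str("keyword1") }
end

grammar = ToyGrammar.new
grammar.keyword # => "str(\"keyword1\")"
```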
str('keyword1')
Here, inside the context of the rule block, you are calling a method named str with the string "keyword1", and getting the result. Nothing special here, this is just a normal method call.
str('keyword1') | str('keyword2')
Here, the | operator is actually just a method being called on whatever str('keyword1') is returning. This code is equivalent to str('keyword1').send(:'|', str('keyword2')).
str('keyword1') |
str('keyword2') |
str('keyword3')
Same as before, except this time we're calling | on whatever str('keyword1').send(:'|', str('keyword2')) returned. The result of this method call is returned to the rule method when it calls the block.
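Because `|` is an ordinary method, any class can define it. The hypothetical `Atom` class below is a stand-in, not one of parslet's atom classes; it only records how it was combined, which makes it easy to confirm that the chained form and the explicit `send` form build the same thing.

```ruby
# Hypothetical stand-in for a parser atom; it only records a description.
class Atom
  attr_reader :desc

  def initialize(desc)
    @desc = desc
  end

  # `a | b` is just a call to this method named `|`.
  def |(other)
    Atom.new("(#{desc} | #{other.desc})")
  end
end

# Hypothetical helper playing the role of parslet's `str`.
def str(text)
  Atom.new(text.inspect)
end

chained  = str("keyword1") | str("keyword2") | str("keyword3")
via_send = str("keyword1").send(:|, str("keyword2")).send(:|, str("keyword3"))

chained.desc # => "((\"keyword1\" | \"keyword2\") | \"keyword3\")"
```

The nesting in the description also shows that `|` is left-associative, which is the same left-to-right folding that `inject(:|)` performs over a list.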
So now that you know how all this works, you can perform exactly the same operations (calling str with each keyword, and using the | method to "add up" the results) dynamically, based on the contents of a file perhaps:
rule(:keyword) {
File.readlines("keywords.txt").map(&:chomp).map { |k| str(k) }.inject(:|)
}
Breakdown:
rule(:keyword) { # Call the rule method with the `:keyword` argument, and pass
# it this block of code.
File.readlines("keywords.txt"). # Get an array of strings containing all the
# keywords
map(&:chomp). # Remove the trailing newline from each keyword in the array
# by calling `chomp` on it. (The strings returned by
# `File.readlines` include the newline character at the end of
# each string.)
map { |k| str(k) }. # Convert each keyword in the array into whatever is
# returned by calling `str` with that keyword.
inject(:|) # Reduce the returned objects to a single one using the `|`
# method on each object. (Equivalent to obj1 | obj2 | obj3...)
}
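One practical refinement when the keywords come from a file: parslet's `|` is an ordered (PEG-style) choice, so alternatives are tried left to right and the first one that matches wins. If `key` is listed before `keyword`, then on the input `keyword` the shorter alternative matches first and the remaining `word` is left for the rest of the grammar. The sketch below is a suggestion, not part of the answer above; `load_keywords` is a hypothetical helper that drops blank lines and duplicates and puts longer keywords first. It uses only the standard library, so the parslet rule itself appears only as a comment.

```ruby
require "tempfile"

# Hypothetical helper: one keyword per line -> list ready for an ordered choice.
def load_keywords(path)
  # chomp: true (Ruby 2.4+) drops trailing newlines; strip removes stray spaces;
  # blank lines are rejected so no str("") alternative sneaks in; duplicates are
  # removed; longer keywords come first (ties sorted alphabetically).
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject(&:empty?)
      .uniq
      .sort_by { |k| [-k.length, k] }
end

# Demo with a temporary file standing in for keywords.txt.
keywords = Tempfile.create("keywords") do |f|
  f.write("key\nkeyword\n\nkeyword\nif\n")
  f.flush
  load_keywords(f.path)
end

keywords # => ["keyword", "key", "if"]

# Inside a parslet grammar the rule could then read (not executed here):
#   rule(:keyword) { load_keywords("keywords.txt").map { |k| str(k) }.inject(:|) }
```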
And that's it! See? No need to generate any lines of code, just do what the real code is doing, but do it dynamically!
Related
So I have a string from a rendered template that looks like this:
"Dear {{user_name}},\r\n\r\nThank you for your purchase. If you have any questions, we are happy to help.\r\n\r\n\r\n{{company_name}}\r\n{{company_phone_number}}\r\n"
All those variables like {{user_name}} are optional and do not need to be included, but I want to check that if they are, they have {{ in front of the variable name. I am using Liquid to parse and render the template and couldn't get it to catch the case where the user types only one opening bracket (or none). I was only able to catch the proper number of closing brackets. So I wrote a method to check that if these variables exist, they have the correct opening brackets. It only works, however, if all those variables are found.
Here is my method:
def validate_opening_brackets?(template)
text = %w(user_name company_name company_phone_number)
text.all? do |variable|
next unless template.include? variable
template.include? "{{#{variable}"
end
end
It works, but only if all variables are present. If, for example, the template created by the user does not include user_name, then it will return false. I've also written this loop using each, with a variable outside of the block that I set to false if the conditions are not met. I would really, however, like to get this to work using the all? method, as I can just return a boolean and it's cleaner.
If the question is about how to rewrite the all? block to make it return true if all present variable names have two brackets before them and false otherwise then you could use something like this:
def validate_opening_brackets?(template)
variables = %w(user_name company_name company_phone_number)
variables.all? do |variable|
!template.include?(variable) || template.include?("{{#{variable}")
end
end
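One caveat with `include?`: a single correct occurrence makes the check pass even if the same name also appears elsewhere with the wrong brackets (e.g. `{{user_name}} ... {user_name}}`). A stricter sketch, assuming every occurrence of a variable name must be preceded by exactly two `{`, scans all occurrences instead of testing for one:

```ruby
VARIABLES = %w[user_name company_name company_phone_number].freeze

def validate_opening_brackets?(template)
  VARIABLES.all? do |variable|
    # Capture the run of "{" characters directly in front of every occurrence
    # of the name; a variable that never appears yields [] and passes.
    template.scan(/(\{*)#{Regexp.escape(variable)}/).all? { |(braces)| braces == "{{" }
  end
end

validate_opening_brackets?("Dear {{user_name}},\r\n{{company_name}}") # => true
validate_opening_brackets?("Dear {user_name}},")                     # => false
validate_opening_brackets?("Hi {{user_name}} / {user_name}}")         # => false
```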
TL;DR
There are multiple ways to do this, but the easiest way I can think of is to simply prefix/postfix a regular expression with the escaped characters used by Mustache/Liquid, and use alternation to check for each of your variable names within the template variable characters (e.g. double curly braces). You can then use String#scan and return a Boolean from Enumerable#any? based on the contents of the Array returned by #scan.
This works with your posted example, but there may certainly be other use cases where you need a more complex solution. YMMV.
Example Code
This solution escapes the leading and trailing { and } characters to avoid having them treated as special characters, and then interpolates the variable names joined with | for alternation, wrapped in a non-capturing (?:...) group so the braces apply to every name. It returns a Boolean depending on whether templated variables are found.
def template_string_has_interpolations? str
var_names = %w[user_name company_name company_phone_number]
regexp = /\{\{(?:#{var_names.join ?|})\}\}/
str.scan(regexp).any?
end
Tested Examples
template_string_has_interpolations? "Dear {{user_name}},\r\n\r\nThank you for your purchase. If you have any questions, we are happy to help.\r\n\r\n\r\n{{company_name}}\r\n{{company_phone_number}}\r\n"
#=> true
template_string_has_interpolations? "Dear Customer,\r\n\r\nThank you for your purchase. If you have any questions, we are happy to help.\r\n\r\n\r\nCompany, Inc.\r\n(555) 555-5555\r\n"
#=> false
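A note on the alternation: `|` has the lowest precedence in a regular expression, so without a group it splits the entire pattern rather than just the names. This standalone check shows the difference between an ungrouped pattern and one that wraps the names in a non-capturing group `(?:...)`:

```ruby
var_names = %w[user_name company_name company_phone_number]

# Without a group the pattern means:
#   \{\{user_name   OR   company_name   OR   company_phone_number\}\}
ungrouped = /\{\{#{var_names.join('|')}\}\}/

# A non-capturing group keeps the braces around every alternative.
# Regexp.escape guards against names that contain regex metacharacters.
grouped = /\{\{(?:#{var_names.map { |n| Regexp.escape(n) }.join('|')})\}\}/

text = "Call company_name today" # no braces at all
ungrouped.match?(text) # => true  (false positive)
grouped.match?(text)   # => false
```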
I have a question that I've already found the solution to (or perhaps it is just chance), but I'm hoping someone can explain why it works, and what Ruby is doing behind the scenes here.
I'm doing something with fixed width output text and ANSI color codes. I don't want the escaped characters to count towards my length, so I wrote a little method for the String class to calculate the length excluding the color codes:
class String
def length_minus_codes
color_codes = [ "\033[30m",
"\033[0m" ,
"\033[31m",
"\033[32m",
"\033[33m",
"\033[34m",
"\033[35m",
"\033[36m",
"\033[37m",
"\033[40m",
"\033[41m",
"\033[42m",
"\033[43m",
"\033[44m",
"\033[45m",
"\033[46m",
"\033[47m",
"\033[1m",
"\033[22m",
"\033[7m",
"\033[27m"]
#Create new variable to strip
stripped_self = self
#loop through color code array
for index in 0 ... color_codes.size
#strip color codes from string
stripped_self.gsub!(color_codes[index],"")
end
#return variance of self to stripped self to
#get length of string not including color codes
return self.length - (self.length - stripped_self.length)
end
end
I thought it was working fine, until I realized that after it was called, the string it was called on had the character codes stripped from it.
I tried a few things, before deciding to change this:
stripped_self.gsub!(color_codes[index],"")
To this:
stripped_self = stripped_self.gsub(color_codes[index],"")
Now it is working fine.
What I don't understand is why. I understand the basic concept of in-place (!) methods, which I was using with gsub, but first, it wasn't modifying self but rather a variable that I set in the method, and second, I only want to return the length of the string, not an actual string.
Can anyone explain what is happening here?
When you do
stripped_self = self
you are simply creating a new reference to the self string object, you are not creating a new string. So any in-place modifications (by gsub! in this case) will be reflected on the self object.
If you want to create a new object that is not a reference, you need to duplicate the object:
stripped_self = self.dup
Possibly a simpler solution here is just to use the non-bang version of gsub and save the result to a variable. gsub! changes the receiver, as bang methods often do; gsub simply returns a modified copy without affecting the receiver.
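A short demonstration of the difference, plus a possibly simpler way to write the method. The regex `/\e\[[\d;]*m/` is an assumption: it covers SGR sequences of the form ESC [ digits/semicolons m, which includes every code in the question's list, but not other ANSI sequences such as cursor movement.

```ruby
# Assignment copies the reference, not the string.
original  = String.new("abc")
alias_ref = original
alias_ref.gsub!("a", "")   # mutates the one shared object
original                   # => "bc"
alias_ref.equal?(original) # => true (same object)

# dup gives an independent copy, so in-place changes stay local.
copy = original.dup
copy.gsub!("b", "")
original                   # => "bc"

class String
  # Non-bang gsub returns a new string, so the receiver is left untouched.
  # Assumption: only SGR color/style codes (ESC [ ... m) need removing.
  def length_minus_codes
    gsub(/\e\[[\d;]*m/, "").length
  end
end

colored = "\e[31mred\e[0m"
colored.length_minus_codes # => 3
```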
I am parsing Wiki text from an XML dump, in a string named 'section' which includes templates in double braces with some arguments, which I want to reorganize.
Here is an example with a template named TextTerm:
section="Sample of a text with a first template {{TextTerm|arg1a|arg2a|arg3a...}} and then a second {{TextTerm|arg1b|arg2b|arg3b...}} etc."
I can use scan and a regex to get each template and work on it in a loop using:
section.scan(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/i).each do |item|
puts "1=" + item[1] # arg1a, arg1b, ...
end
And I have been able to extract the first argument of each template.
Now I also want to replace the name of the template with "NewTextTerm" and reorganize its arguments by placing the second argument in place of the first.
Can I do it in the same loop? For example, by replacing scan with gsub(regexp) { block }:
section.gsub!(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/) { |item| '{{NewTextTerm|\2|\1}}'}
I get:
"Sample of a text with a first template {{NewTextTerm|\\2|\\1}} and then a second {{NewTextTerm|\\2|\\1}} etc."
meaning that the capture groups of the regexp are not recognized. Even if it worked, I would like to have some place within the gsub block to work on the arguments. For example, I can't seem to put a puts in the gsub block as in the scan().each block; it only takes a string to be substituted.
Any ideas are welcome.
PS: Some editing: braces and "section= added", code is complete.
When you have the replacement as a string argument, you can use '\1', etc. like this:
string.gsub!(regex, '...\1...\2...')
When you have the replacement as a block, you can use "#$1", etc. like this:
string.gsub!(regex){"...#$1...#$2..."}
You are mixing the uses. Stick to either one.
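Both styles side by side, using the regex from the question on a shortened sample. In the block form, the block may contain any statements (a puts, a call to a helper, and so on); only the value of its last expression becomes the replacement.

```ruby
section = "A {{TextTerm|arg1a|arg2a}} and a {{TextTerm|arg1b|arg2b}} etc."
regex = /\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/

# String replacement: backreferences are written '\1', '\2', ... (single quotes
# keep the backslash, so gsub sees \1 rather than an escape sequence).
with_string = section.gsub(regex, '{{New\1|\3|\2}}')

# Block replacement: $1, $2, ... hold the groups of the current match.
first_args = []
with_block = section.gsub(regex) do
  name, first, rest = $1, $2, $3
  first_args << first               # any other work can happen here
  "{{New#{name}|#{rest}|#{first}}}" # last expression = replacement text
end

with_block # => "A {{NewTextTerm|arg2a|arg1a}} and a {{NewTextTerm|arg2b|arg1b}} etc."
```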
Yes, changing the single quotes to double quotes isn't enough; #$1 is the answer. Here is the complete code:
section="Sample of a text with a first template {{TextTerm|arg1a|arg2a|arg3a...}} and then a second {{TextTerm|arg1b|arg2b|arg3b...}} etc."
section.gsub(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/) { |item| "{{New#$1|#$3|#$2}}"}
"Sample of a text with a first template {{NewTextTerm|arg2a|arg3a...|arg1a}} and then a second {{NewTextTerm|arg2b|arg3b...|arg1b}} etc."
Thus, it works. Thanks.
But now I have to replace the string, by a "function" returning the changed string:
def stringreturn(arg1, arg2, arg3)
"{{New" + arg1 + arg3 + arg2 + "}}"
end
and
section.gsub(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/) { |item| stringreturn("#$1","|#$2","|#$3") }
will return:
"Sample of a text with a first template {{NewTextTerm|arg2a|arg3a...|arg1a}} and then a second {{NewTextTerm|arg2b|arg3b...|arg1b}} etc."
Thanks to all!
There is probably a better way to manipulate arguments in MediaWiki templates using Ruby.
comics = load_comics( '/comics.txt' )
Popup.make do
h1 "Comics on the Web"
list do
comics.each do |name, url|
link name, url
end
end
end
I am new to Ruby. This is a piece of code from a Ruby website.
I can't find the 'link' and 'list' keywords in the menu.
Can someone explain those two keywords a little, and where are they defined?
I am also confused about how the variables name and url are read. Are they split by the space on the same line, or what?
So if I have:
Comics1 link_of_comics_site_1
Comics2 link_of_comics_site_2
Comics3 link_of_comics_site_3
then for the first iteration, name = Comics1 and url = link_of_comics_site_1?
Thanks.
That's not just Ruby. That's a template for a webpage using ruby add-on methods for HTML generation.
But presumably, the result of the call to load_comics is a Hash, where the keys are names and the values are URLs. You could make one of those yourself:
my_comics_hash = { "name1" => "url1", "name2" => "url2" }
which you can then iterate over the same way:
my_comics_hash.each do |name, url|
puts "Name #{name} goes with URL #{url}"
end
In your code, it's building up an HTML list inside a popup window, but it's the same idea. The each method iterates over a collection - in this case a Hash - and runs some code on every item in that collection - in this case, each key/value pair. When you call each, you pass it a block of code inside do ... end; that's the code that gets run on each item. The current item is passed to the code block, which declares a variable to hold it inside the pipes right after the word do. Since we're iterating over key/value pairs, we can declare two variables, and the key goes in the first and the value in the second.
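The source of load_comics isn't shown, so here is a hypothetical stand-in (`parse_comics`) that turns the question's "name url" lines into a Hash by splitting each line at the first run of whitespace. Iterating it with `each do |name, url|` then gives exactly the pairing described above.

```ruby
# Hypothetical stand-in for load_comics: one "name url" pair per line.
# Each line is split at the first whitespace run into [name, url].
def parse_comics(text)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| line.split(/\s+/, 2) }
      .to_h
end

comics = parse_comics(<<~TEXT)
  Comics1 link_of_comics_site_1
  Comics2 link_of_comics_site_2
  Comics3 link_of_comics_site_3
TEXT

# Each key/value pair is destructured into the two block variables.
comics.each do |name, url|
  puts "#{name} -> #{url}"
end
```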
In Ruby, parentheses around method arguments are optional, and so is the ";" at the end of a statement. E.g.:
link "click here" , "http://myweb.com"
is equivalent to :
link("click here", "http://myweb.com");
But if you have more than one statement on a line, the ";" is required, e.g.:
link("click here1", "http://myweb.com"); link("click here2", "http://myweb.com");
In your code it could be written as
link(name, url)
or just
link(name, url);
or
link name, url
But it is highly recommended to put parentheses around method arguments for readability unless you have a reason not to. The ";" is not common in the Ruby world.
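A quick check that the call styles are interchangeable, using a hypothetical `link` method that just builds an anchor tag (the real `link` in the question comes from whatever HTML-generating library that page uses):

```ruby
# Hypothetical `link` helper, only for illustration.
def link(text, url)
  %(<a href="#{url}">#{text}</a>)
end

without_parens = link "click here", "http://myweb.com"
with_parens    = link("click here", "http://myweb.com"); # trailing ; is legal but unusual

# Two statements on one line need the ; separator.
first = link("click here1", "http://myweb.com"); second = link("click here2", "http://myweb.com")
```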
Everything in this code works properly, except the contents of the $1 variable aren't being properly displayed. According to my tests, all the matching is being done properly, I am just having trouble figuring out how to actually output the contents of $1.
codeTags = {
/\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
/\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
regexp = Regexp.new(/(#{Regexp.union(codeTags.keys)})/)
message = (message).gsub(/#{regexp}/) do |match|
codeTags[codeTags.keys.select {|k| match =~ Regexp.new(k)}[0]]
end
return message.html_safe
Thank you!
As soon as you do this:
codeTags = {
/\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
/\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
The #{$1} bits in the values are interpolated using whatever happens to be in $1 at the time. The values will most likely be "<strong></strong>" and "<em></em>" and those aren't very useful.
And regexp is already a regular expression object so gsub(/#{regexp}/) should be just gsub(regexp). Similar things apply to the keys of codeTags, they're already regular expression objects so you don't need to Regexp.new(k).
I'd change the whole structure, you're overcomplicating things. Just something simple like this would be fine for only two replacements:
message = message.gsub(/\[b\](.*?)\[\/b\]/) { '<strong>' + $1 + '</strong>' }
message = message.gsub(/\[i\](.*?)\[\/i\]/) { '<em>' + $1 + '</em>' }
If you try to do it all at once you'll have problems with nesting in something like this:
message = 'Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?'
You'd end up having to use a recursive gsub and possibly some lambdas if you wanted to properly handle things like that with a single expression.
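If the tag list grows, the same one-tag-at-a-time approach can be driven from a table without the eager-interpolation problem: store each replacement as a single-quoted string with backreferences ('\1'), which gsub expands for every match, instead of "#{$1}", which Ruby expands once when the hash is built. A minimal sketch; `BB_TAGS` and `bbcode_to_html` are illustrative names, and it only handles the simple cases shown here, not arbitrary nesting of the same tag.

```ruby
# Replacements use '\1' backreferences, expanded by gsub on each match.
BB_TAGS = {
  /\[b\](.+?)\[\/b\]/m => '<strong>\1</strong>',
  /\[i\](.+?)\[\/i\]/m => '<em>\1</em>'
}.freeze

# Patterns are applied one after another, each to the previous result.
def bbcode_to_html(message)
  BB_TAGS.reduce(message) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
end

html = bbcode_to_html('Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?')
# => "Where <strong>is</strong> pancakes <strong>house <em>and</em> more</strong> stuff?"
```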
There are better things to spend your time on than trying to be clever on something like this.
Response to comments: If you have more bb-tags and some smilies to worry about and several messages per page then you should HTMLify each message when you create it. You could store only the HTML version or both HTML and BB-Code versions if you want the BB-Code stuff around for some reason. This way you'd only pay for the HTMLification once per message and producing your big lists would be nearly free.