How to see if string matches any regex keys in ruby hash - ruby

I have a JSON file full of regex keys with responses based on the message sent (e.g. "Hello, Dragnflier!"). The file contains values like this:
{
"/hello/i" : "Why hello there!",
"/how are you.*dragnflier/i" : "I'm good thank you! How are you?"
}
I load these into a hash at the start of my ruby program. Is there a more efficient way to see if the message matches any of the regular expressions in my hash than just running a loop over it with all of the keys? I want to get the value that the key returns, not a list of keys or a boolean value.

The solution ended up being, based on other answers for the opposite case:
val = myhash.keys.select {|key| message.to_s.match(key)}

Yes, there is a more efficient way:
hash = {
  /hello/i => "Why hello there!",
  /how are you.*dragnflier/i => "I'm good thank you! How are you?"
}
message =~ Regexp.union(hash.keys)
You stated that the goal is to check “if the message matches any of the regular expressions in my hash.” The above is way more efficient than the solution you came up with.
After this preliminary check is done, one can do whatever is needed to detect the respective key (this requirement appeared in the question after I had answered it). This approach will be more efficient than a brute-force detect over the keys.
Please note that the answer you have provided is not correct, since
Regexp.new '/foo/i'
becomes
#⇒ /\/foo\/i/
and not
#⇒ /foo/i
as you probably expected.
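To get the reply rather than a boolean, the string keys from the JSON file first have to become real Regexp objects. What follows is only a sketch under stated assumptions: `regexp_from_literal` and `reply_for` are hypothetical helpers, not part of Ruby or of the original post, and the JSON text is inlined instead of being read from a file. The union regex serves as the cheap preliminary check, and the per-key search runs only when something matched.

```ruby
require 'json'

# Hypothetical helper (an assumption, not a Ruby built-in): converts a
# "/pattern/flags" string, as stored in the JSON file, into a Regexp.
def regexp_from_literal(text)
  match = %r{\A/(.*)/([imx]*)\z}m.match(text)
  # Fall back to a literal match when the key is not in /pattern/flags form.
  return Regexp.new(Regexp.escape(text)) unless match

  options = 0
  options |= Regexp::IGNORECASE if match[2].include?('i')
  options |= Regexp::MULTILINE  if match[2].include?('m')
  options |= Regexp::EXTENDED   if match[2].include?('x')
  Regexp.new(match[1], options)
end

json = <<~'JSON'
  {
    "/hello/i": "Why hello there!",
    "/how are you.*dragnflier/i": "I'm good thank you! How are you?"
  }
JSON

# Build the Regexp => reply hash once, at program start.
responses = JSON.parse(json).map { |key, reply| [regexp_from_literal(key), reply] }.to_h
any_pattern = Regexp.union(responses.keys)

# Returns the reply for the first matching key, or nil when nothing matches.
def reply_for(message, responses, any_pattern)
  return nil unless any_pattern =~ message   # cheap preliminary check

  _pattern, reply = responses.find { |pattern, _reply| pattern =~ message }
  reply
end

puts reply_for("Hello, Dragnflier!", responses, any_pattern)  # prints "Why hello there!"
```

Whether the preliminary check pays off depends on how often messages match nothing at all; when most messages do match, the plain `find` over the keys may be just as good.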

Related

How do I alphabetize an array ignoring case?

I'm using Chris Pine's Learn to Program and am stumped on his relatively simple challenge to take user input in the form of a list of random words and then alphabetize them in an array. Questions about this challenge have come up before, but I haven't been able to find my specific question on SO, so I'm sorry if it's a duplicate.
puts "Here's a fun trick. Type as many words as you want (one per line) and
I'll sort them in...ALPHABETICAL ORDER! Hold on to your hats!"
wordlist = Array.new
while (userInput = gets.chomp) != ''
  wordlist.push(userInput)
end
puts wordlist.sort
While this does the trick, I'm trying to figure out how to alphabetize the array without case-sensitivity. This is hard to wrap my head around.
I learned about casecmp but that seems to be a method for comparing a specific string, as opposed to an array of strings.
So far I've been trying things like:
wordlist.to_s.downcase.to_a.sort!
which, in addition to looking bad, doesn't work for multiple reasons, including that Ruby 2.0 doesn't allow strings to be converted to arrays.
How about:
wordlist.sort_by { |word| word.downcase }
Or even shorter:
wordlist.sort_by(&:downcase)
In general, sort_by is not efficient for keys that are simple to compute. A more efficient approach is to use sort with a block and replace the default comparison operator <=> with casecmp:
wordlist.sort { |w1, w2| w1.casecmp(w2) }
For more information about efficiency gains, consult the official Ruby documentation for the sort_by method: http://www.ruby-doc.org/core-2.1.2/Enumerable.html#method-i-sort_by
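As a quick illustration of the two approaches above, here is a small sketch with a made-up word list (the sample words are an assumption for the example). Both calls return a new array and leave `wordlist` untouched. Ruby's sort is not guaranteed to be stable, so the relative order of words that differ only by case (such as "Apple" and "apple") may vary.

```ruby
wordlist = %w[banana Apple cherry apple Banana]

# Sort by a derived key: each word is downcased once, then the keys are compared.
by_key = wordlist.sort_by(&:downcase)

# Sort with a custom comparison: casecmp compares two strings ignoring case
# and returns -1, 0 or 1, just like <=>.
by_casecmp = wordlist.sort { |w1, w2| w1.casecmp(w2) }

puts by_key
puts by_casecmp
```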
I had the same question at my Ruby coding bootcamp. Here's what worked for me:
puts "Type in a sentence."
sentence = gets.chomp.downcase
puts sentence.split(" ").sort

better way to do assignment and check result

I have to use the String#scan method, which returns an empty array if there is no match.
I wanted to assign the result of scan to a variable and check whether there is a match in one step, but unfortunately I cannot do that because scan doesn't return nil or false when nothing matches.
I wanted to do this (1 line):
if ip = str.scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/)
  ...
  #use ip
end
but because it won't return nil on no match I must do:
ip_match = str.scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/)
unless ip_match.empty?
  #use ip
end
Is there some more elegant way to write this - to be able to do assignment and empty check at the same time or some other way to beautify the code?
Thanks
Since scan returns an array, you could simply iterate over it, even if you are sure there will be only one result:
str.scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/).each do |ip|
  #use ip
end
There's a difference between elegant and cryptic or "concise".
In Perl you'll often see people write something equivalent to:
if (!(ip = str.scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/)).empty?)
It's a bit more concise, terse, tight, whatever you want to call it. It also leads to maintenance issues because of the = (assignment) where a reader would normally expect an equality test. If the code is passed to someone who doesn't understand the logic, they might mistakenly "correct" that, and then break the code.
In Ruby it's idiomatic not to use assignment inside a conditional test, because of the maintenance issue, and instead to do the assignment first, followed by a test. It's clearer code.
Personally, I prefer to not use unless in that sort of situation. It's an ongoing discussion whether unless helps generate more understandable code; I prefer if (!ip_match.empty?) because it reads more like we'd normally talk -- I seldom start a statement with unless in conversation. Your mileage might vary.
I would prefer to do something like this using String#match:
ip_validator = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/
# match returns nil if there is no match
if str.match ip_validator
  # blah blah blah.....
end
This helps me keep the code DRY and clean.
Maybe this is not the most elegant way; I'm looking for others, if any :)
Your ip_validator regex seems to be weak; check out "Rails 3: Validate IP String".
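When only the first match is needed, one more option is `String#[]` with a regex argument, which returns the matched substring or nil. That makes the assign-and-test form from the question work as intended. This is a sketch, not something proposed in the answers above; `first_ip` and the sample strings are made up for illustration, and the pattern is the question's loose one (it does not validate octet ranges).

```ruby
IP_PATTERN = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

# Hypothetical helper: returns the first IP-looking substring, or nil.
def first_ip(str)
  str[IP_PATTERN]
end

# The parentheses around the assignment signal that = is intentional.
if (ip = first_ip("server at 10.0.0.1 is up"))
  puts "found #{ip}"
end
```

If all matches are needed, `scan` followed by an `empty?` check (or `each`) remains the straightforward choice.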

Rails 3 - Check if string/text includes a certain word/character via regex in controller

I am working on a quoting mechanism in my app, where it should be possible to simply type #26, for example, in the comment form in order to quote comment 26 of that topic.
To check if a user wants to quote one or more comments in the first place, I put an if condition after my current_user.comments.build and before @comment.save.
But, just to make my question a bit more general and easier to adapt:
if @comment.content.include?(/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i)
I want something like this. That example was for checking if the comment's content includes emails. But, logically, I get a "can't convert regexp to string" error.
How can you do what the include? method does in Rails, but with a regexp? That is, how do you check whether a text includes a string of a certain regex format?
Or is the controller the wrong place for such regex actions?
I do Ruby regexes this way:
stringObj.match(/regex/)
There's also
if @comment.content =~ /regex/
If you had an array of all previous comments, @prev_comments, and wanted to replace all the references in one shot, you could do:
pattern = /#(\d+)/
@comment.content.gsub(pattern) do
  cur_match = Regexp.last_match
  idx = cur_match[1].to_i - 1
  @prev_comments[idx]
end
Trick is using Regexp.last_match to get the current match, which made me wonder if it was thread safe. It is, apparently.
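A self-contained version of the same idea, runnable outside Rails, might look like the sketch below. The method name `expand_quotes`, the sample comments, and the choice to leave unknown references untouched are all assumptions made for the example.

```ruby
# Replaces each "#N" reference with the text of comment N (1-based).
# References to comments that do not exist are left unchanged.
def expand_quotes(content, prev_comments)
  content.gsub(/#(\d+)/) do
    match = Regexp.last_match            # MatchData for the current replacement
    idx = match[1].to_i - 1              # "#1" refers to prev_comments[0]
    comment = idx >= 0 ? prev_comments[idx] : nil
    comment ? "\"#{comment}\"" : match[0]
  end
end

prev_comments = ["First!", "I agree.", "Nice post."]
puts expand_quotes("Replying to #2 and #3", prev_comments)
# prints: Replying to "I agree." and "Nice post."
```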
adapted (stolen) from the below more general String extension
class String
  def js_replace(pattern, &block)
    gsub(pattern) do |_|
      md = Regexp.last_match
      args = [md.to_s, md.captures, md.begin(0), self].flatten
      block.call(*args)
    end
  end
end
Source: http://vemod.net/string-js_replace
To match the nature of .include?
stringObj.match(/regex/).present?
Would give similar true/false outcomes if you're using Rails (or ActiveSupport)
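In plain Ruby (2.4 or later), `String#match?` gives the same kind of true/false answer without ActiveSupport. The sketch below assumes the goal is to detect an email-like substring anywhere in the text, so the pattern is unanchored; the constant and method names are made up for the example.

```ruby
# Unanchored on purpose: \A and \z would require the whole string to be an email.
EMAIL_PATTERN = /[\w+\-.]+@[a-z\d\-.]+\.[a-z]+/i

# Hypothetical helper mirroring include?, but for a regex.
def includes_email?(text)
  text.match?(EMAIL_PATTERN)
end

puts includes_email?("write to bob@example.com please")  # prints true
puts includes_email?("no address here")                   # prints false
```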

Why is "#{String}" a common idiom in Ruby

A Ruby dev I know asked this; my answer is below... Are there other, better reasons?
Why do so many Ruby programmers do
"#{string}"
rather than
string
since the second form is simpler and more efficient?
Is this a common idiom for Ruby developers? I don't see it that much.
The best motivation I can find for that idiom is that it means smaller changes later, when you need to do more than simply get the value of the string and also want to prepend or append to it at the point of use.
There is only one case where this is a recommended idiom:
fname = 'john'
lname = 'doe'
name = "#{fname} #{lname}"
The code above is more efficient than:
name = fname + ' ' + lname
or
name = [fname, lname].join(' ')
What's the broader context of some of the usages? The only thing I can come up with beyond what's already been mentioned is as a loose attempt at type safety; that is, you may receive anything as an argument, and this could ensure that whatever you pass in walks like a duck..or, well, a string (though string.to_s would arguably be clearer).
In general though, this is probably a code smell that someone along the way thought was Best Practices.
I use this kind of code, so that I can pass nil as string and it still will work on a string, rather than seeing some exceptions flying:
def short(string = nil)
  "#{string}"[0..7]
end
And it's easier/faster to append some debug code, if it's already in quotes.
So in short: It's more convenient.
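For comparison, here is a sketch of the same nil-tolerant helper written both ways. Interpolation calls to_s on the value, so the two versions behave the same for nil, strings, and numbers; the method names are made up for the example.

```ruby
# Interpolation version, as in the answer above.
def short_interpolated(value = nil)
  "#{value}"[0..7]
end

# Explicit version: to_s states the intent (convert, then slice) directly.
def short_to_s(value = nil)
  value.to_s[0..7]
end

puts short_interpolated(nil).inspect          # prints ""
puts short_to_s("abcdefghijkl")               # prints abcdefgh
```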
Interesting answers, everyone. I'm the developer who asked the original question. To give some more context, I see this occasionally at my current job, and also sometimes in sample code on the Rails list, with variables that are known in advance to contain strings. I could sort of understand it as a substitute for to_s, but I don't think that's what's going on here; I think people just forget that you don't need the interpolation syntax if you're just passing a string variable.
If anyone tried to tell me this was a best practice, I'd run away at top speed.
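Maybe it is an easy way to convert anything to a string? It is the same as calling the to_s method. But it is quite a strange way :).
a = [1,2,3]
"#{a}"
#=> "123" (Ruby 1.8; Ruby 1.9 and later give "[1, 2, 3]")
a.to_s
#=> "123" (Ruby 1.8; Ruby 1.9 and later give "[1, 2, 3]")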
I could imagine this being useful in cases where the object being interpolated is not always a String, as the interpolation implicitly calls #to_s:
"#{'bla'}" # => "bla"
"#{%r([a-z])}" # => "(?-mix:[a-z])"
"#{{:bla => :blub}}" # => "blablub" (Ruby 1.8; later versions give the inspect form, e.g. "{:bla=>:blub}")
May make sense when logging something, where you don't care so much about the output format, but never want an error because of a wrong argument type.

Building a "Semi-Natural Language" DSL in Ruby

I'm interested in building a DSL in Ruby for use in parsing microblog updates. Specifically, I thought that I could translate text into a Ruby string in the same way as the Rails gem allows "4.days.ago". I already have regex code that will translate the text
#USER_A: give X points to #USER_B for accomplishing some task
#USER_B: take Y points from #USER_A for not giving me enough points
into something like
Scorekeeper.new.give(x).to("USER_B").for("accomplishing some task").giver("USER_A")
Scorekeeper.new.take(y).from("USER_A").for("not giving me enough points").giver("USER_B")
It's acceptable to me to formalize the syntax of the updates so that only standardized text is provided and parsed, allowing me to smartly process updates. Thus, it seems it's more a question of how to implement the DSL class. I have the following stub class (removed all error checking and replaced some with comments to minimize paste):
class Scorekeeper
  attr_accessor :score, :user, :reason, :sender

  def give(num)
    # Can 'give 4' or can 'give a -5'; ensure 'to' called
    self.score = num
    self
  end

  def take(num)
    # ensure negative and 'from' called
    self.score = num < 0 ? num : num * -1
    self
  end

  def plus
    self.score > 0
  end

  def to(str)
    self.user = str
    self
  end

  def from(str)
    self.user = str
    self
  end

  def for(str)
    self.reason = str
    self
  end

  def giver(str)
    self.sender = str
    self
  end

  def command
    str = plus ? "giving ##{user} #{score} points" : "taking #{score * -1} points from ##{user}"
    "##{sender} is #{str} for #{reason}"
  end
end
Running the following commands:
t = eval('Scorekeeper.new.take(4).from("USER_A").for("not giving me enough points").giver("USER_B")')
p t.command
p t.inspect
Yields the expected results:
"#USER_B is taking 4 points from #USER_A for not giving me enough points"
"#<Scorekeeper:0x100152010 @reason=\"not giving me enough points\", @user=\"USER_A\", @score=-4, @sender=\"USER_B\">"
So my question is mainly, am I doing anything to shoot myself in the foot by building upon this implementation? Does anyone have any examples for improvement in the DSL class itself or any warnings for me?
BTW, to get the eval string, I'm mostly using sub/gsub and regex. I figured that's the easiest way, but I could be wrong.
Am I understanding you correctly: you want to take a string from a user and cause it to trigger some behavior?
Based on the two examples you listed, you probably can get by with using regular expressions.
For example, to parse this example:
#USER_A: give X points to #USER_B for accomplishing some task
With Ruby:
input = "#abe: give 2 points to #bob for writing clean code"
PATTERN = /^#(.+?): give ([0-9]+) points to #(.+?) for (.+?)$/
input =~ PATTERN
user_a = $~[1] # => "abe"
x = $~[2] # => "2"
user_b = $~[3] # => "bob"
why = $~[4] # => "writing clean code"
But if there is more complexity, at some point you might find it easier and more maintainable to use a real parser. If you want a parser that works well with Ruby, I recommend Treetop: http://treetop.rubyforge.org/
The idea of taking a string and converting it to code to be evaled makes me nervous. Using eval is a big risk and should be avoided if possible. There are other ways to accomplish your goal. I'll be happy to give some ideas if you want.
A question about the DSL you suggest: are you going to use it natively in another part of your application? Or do just plan on using it as part of the process to convert the string into the behavior you want? I'm not sure what is best without knowing more, but you may not need the DSL if you are just parsing the strings.
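One way to avoid eval entirely is to let the regex extract the fields and build a plain data object from them. The sketch below is only an illustration under stated assumptions: `Award`, `GIVE`, `TAKE` and `parse_update` are made-up names, the update format is the strict one from the two examples in the question, and the user sigil is written exactly as it appears in the text above.

```ruby
# Hypothetical value object holding what the question's Scorekeeper tracks.
Award = Struct.new(:sender, :user, :score, :reason)

GIVE = /\A#(?<sender>\w+): give (?<points>\d+) points to #(?<user>\w+) for (?<reason>.+)\z/
TAKE = /\A#(?<sender>\w+): take (?<points>\d+) points from #(?<user>\w+) for (?<reason>.+)\z/

# Returns an Award, or nil when the update matches neither form.
def parse_update(text)
  if (m = GIVE.match(text))
    Award.new(m[:sender], m[:user], m[:points].to_i, m[:reason])
  elsif (m = TAKE.match(text))
    # Taking points is stored as a negative score, as in Scorekeeper#take.
    Award.new(m[:sender], m[:user], -(m[:points].to_i), m[:reason])
  end
end

award = parse_update('#abe: give 2 points to #bob for writing clean code')
puts "#{award.sender} -> #{award.user}: #{award.score} (#{award.reason})"
# prints: abe -> bob: 2 (writing clean code)
```

Because user input never reaches eval, a malformed or malicious update simply fails to match and yields nil.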
This echoes some of my thoughts on a tangential project (an old-style text MOO).
I'm not convinced that a compiler-style parser is going to be the best way for the program to deal with the vagaries of English text. My current thoughts have me splitting up the understanding of English into separate objects -- so a box understands "open box" but not "press button", etc. -- and then having the objects use some sort of DSL to call centralised code that actually makes things happen.
I'm not sure that you've got to the point where you understand how the DSL is actually going to help you. Maybe you need to look at how the English text gets turned into DSL first. I'm not saying that you don't need a DSL; you might very well be right.
As for hints as to how to do that? Well, I think if I were you I would be looking for specific verbs. Each verb would "know" what sort of thing it should expect from the text around it. So in your example "to" and "from" would expect a user immediately following.
This isn't especially divergent from the code you've posted here, IMO.
You might get some mileage out of looking at the answers to my question. One commenter pointed me to the Interpreter Pattern, which I found especially enlightening: there's a nice Ruby example here.
Building on @David_James' answer, I've come up with a regex-only solution to this, since I'm not actually using the DSL anywhere else to build scores and am merely parsing out points to users. I've got two patterns that I'll use to search:
SEARCH_STRING = "#Scorekeeper give a healthy 4 to the great #USER_A for doing something
really cool.Then give the friendly #USER_B a healthy five points for working on this.
Then take seven points from the jerk #USER_C."
PATTERN_A = /\b(give|take)[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)[\s\w]*\b(to|from)[\s\w]*#([a-zA-Z0-9_]*)\b/i
PATTERN_B = /\bgive[\s\w]*#([a-zA-Z0-9_]*)\b[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)/i
SEARCH_STRING.scan(PATTERN_A) # => [["give", "4", "to", "USER_A"],
# ["take", "seven", "from", "USER_C"]]
SEARCH_STRING.scan(PATTERN_B) # => [["USER_B", "five"]]
The regex might be cleaned up a bit, but this allows me to have syntax that allows a few fun adjectives while still pulling the core information using both "name->points" and "points->name" syntaxes. It does not allow me to grab the reason, but that's so complex that for now I'm going to just store the entire update, since the whole update will be related to the context of each score anyway in all but outlier cases. Getting the "giver" username can be done elsewhere as well.
I've written up a description of these expressions as well, in hopes that other people might find that useful (and so that I can go back to it and remember what that long string of gobbledygook means :)

Resources