Need a regular expression to match a dynamically-determined repeat count - ruby

I need a ruby regexp pattern that matches a string containing a letter (for simplicity say 'a') n times and then n at the end.
For example, it should match "aaa3", "aaaa4" etc but not "a2" or "aaa1", etc.

I can do it in Perl, but not in Ruby.
/^(a+)(??{length($1)})$/
Fun, eh?
Check it out: http://ideone.com/ShB6C

That is not possible in regex since it is not a regular language (that's easy to prove with the Pumping Lemma for Regular Languages). I'm not sure how much more powerful ruby regex is than a true Regular Expression, but I doubt it's powerful enough for this. You can set a finite limit on it and state each possibility like:
a1|aa2|aaa3|aaaa4|aaaaa5|aaaaaa6|aaaaaaa7|aaaaaaaa8|aaaaaaaaa9
This works because all finite languages are regular, but it would be much easier to use string operations: count how many times the letter appears, then parse the integer that follows the last occurrence and compare the two.
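A minimal sketch of the finite-limit idea, assuming an arbitrary upper bound (MAX_COUNT is a name made up for this example) and building the alternation programmatically instead of typing it out:

```ruby
# Arbitrary upper bound for illustration; raise it to cover longer runs.
MAX_COUNT = 9

# Build ["a1", "aa2", ..., "aaaaaaaaa9"] and join the entries into one alternation.
alternatives = (1..MAX_COUNT).map { |n| ("a" * n) + n.to_s }
pattern = /\A#{Regexp.union(alternatives)}\z/

pattern.match?("aaa3")  # => true
pattern.match?("aaa1")  # => false
pattern.match?("a2")    # => false
```

The pattern only recognises counts up to the chosen bound, which is exactly the limitation described above.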

I just woke up, so take this with a grain of salt, but instead of doing it with a single regex, an easy way to do it would be
def f(s)
  m = s.match(/\A(a+)(\d+)\z/)  # anchor the match and allow multi-digit counts
  !m.nil? && m[1].size == m[2].to_i
end
f 'aaa3' #=> true
f 'aa3' #=> false

Related

Perform subtraction within regular expression

I have the following RSpec output:
30 examples, 15 failures
I would like to subtract the second number from the first. I have this code:
def capture_passing_score(output)
  captures = output.match(/^(?<total>\d+)\s*examples,\s*(?<failed>\d+)\s*failures$/)
  captures[:total].to_i - captures[:failed].to_i
end
I am wondering if there is a way to do the calculation within a regular expression. Ideally, I'd avoid the second step in my code, and subtract the numbers within a regex. Performing mathematical operations may not be possible with Ruby's (or any) regex engine, but I couldn't find an answer either way. Is this possible?
Nope.
By every definition I have ever seen, Regular Expressions are about text processing. It is character based pattern matching. Numbers are a class of textual characters in Regex and do not represent their numerical values. While syntactic sugar may mask what is actually being done, you still need to convert the text to a numeric value to perform the subtraction.
WikiPedia
RubyDoc
If you know the format is going to remain consistent, you could do something like this:
output.scan(/\d+/).map(&:to_i).inject(:-)
It's not doing the subtraction via regex, but it does make it more concise.
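As a rough sketch of the point that captures are text and must be converted before any arithmetic, here is a variant of the question's method that also guards against output with no summary line. The name passing_count is made up for this example.

```ruby
# Hypothetical helper: returns nil when the summary line is missing.
def passing_count(output)
  m = output.match(/^(?<total>\d+)\s*examples,\s*(?<failed>\d+)\s*failures$/)
  return nil unless m

  # m[:total] and m[:failed] are strings such as "30"; convert before subtracting.
  m[:total].to_i - m[:failed].to_i
end

passing_count("30 examples, 15 failures")  # => 15
passing_count("no summary here")           # => nil
```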

Count Number of Sentence Ruby

I searched everywhere and did not manage to find a way to count the number of sentences in a String using Ruby. Does anyone know how to do it?
Example
string = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
This string should return number 4.
You can split the text into sentences and count them. Here:
string.scan(/[^\.!?]+[\.!?]/).map(&:strip).count # scan has regex to split string and strip will remove trailing spaces.
# => 4
Explaining regex:
[^\.!?]
A caret at the start of a character class [^ ] is the negation operator, which means we are looking for characters that are not in the list: ., ! and ?.
+
is a greedy quantifier that matches the preceding element one or more times. (It captures our sentences here and skips repetitions like ...)
[\.!?]
matching characters ., ! or ?.
In a nutshell, we capture all characters that are not ., ! or ? until we reach a ., ! or ?. Each such run can be treated as a sentence (in a broad sense).
I think it makes sense to consider a word char followed by a ?! or . the delimiter of a sentence:
string.strip.split(/\w[?!.]/).length
#=> 4
So I'm not considering the ... a delimiter when it hangs on its own like that:
"I waited a while ... and then I went home"
But then again, maybe I should...
It also occurs to me that maybe a better delimiter is a punctuation followed by some space and a capital letter:
string.split(/[?!.]\s+[A-Z]/).length
#=> 4
Sentences end with full stops, question marks, and exclamation marks. They can also be separated with dashes and other punctuation, but we won’t worry about these rare cases here.
The split is simple. Instead of asking Ruby to split the text on one type of character, you simply ask it to split on any of three types of characters, like so:
txt = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
sentence_count = txt.split(/\.|\?|!/).length
puts sentence_count
#=> 7
string.squeeze('.!?').count('.!?')
#=> 4
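The approaches above agree on the example text but can disagree elsewhere. A small sketch comparing them on a made-up sample (three sentences, with an ellipsis in the middle of the last one):

```ruby
# Made-up sample: three sentences, the last one containing a mid-sentence ellipsis.
sample = "Hello there. How are you? I waited ... and left!"

# Runs of non-terminators followed by one terminator.
by_scan = sample.scan(/[^.!?]+[.!?]/).count                  # => 4

# Collapse repeated terminators, then count what is left.
by_squeeze = sample.squeeze('.!?').count('.!?')              # => 4

# A word character directly followed by a terminator.
by_word_end = sample.strip.split(/\w[?!.]/).length           # => 3

# A terminator, whitespace, then a capital letter.
by_capital = sample.split(/[?!.]\s+[A-Z]/).length            # => 3
```

The first two count the free-standing ellipsis as a sentence end; the last two do not. Which behaviour is right depends on the text you expect.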

Is there a method to find the most specific pattern for a string?

I'm wondering whether there is a way to generate the most specific regular expression (if such a thing exists) that matches a given string. Here's an illustration of what I want the method to do:
str = "(17 + 31)"
find_pattern(str)
# => /^\(\d+ \+ \d+\)$/ (or something more specific)
My intuition was to use Regexp.new to accumulate the desired pattern by looping through str and checking for known patterns like \d, \s, and so on. I suspect there is an easy way of doing this.
This is in essence an algorithm compression problem. The simplest way to match a list of known strings is to use the Regexp.union factory method, but that just tries each string in turn; it does not do anything "clever":
combined_rx = Regexp.union( "(17 + 31)", "(17 + 45)" )
=> /\(17\ \+\ 31\)|\(17\ \+\ 45\)/
This can still be useful to construct multi-stage validators, without you needing to write loops to check them all.
However, a generic pattern matcher that could figure out what you mean to match from examples is not really possible. There are too many ways in which you could consider strings to be similar or not. The closest I could think of would be genetic programming where you supply a large list of should match/should not match strings and the code guesses at the best regex by constructing random Regexp objects (a challenge in itself) and seeing how accurately they match and don't match your examples. The best matchers could be combined and mutated and tried again until you got 100% accuracy. This might be a fun project, but ultimately much more effort for most purposes than writing the regular expressions yourself from a description of the problem.
If your problem is heavily constrained - e.g. any example integer could always be replaced by \d+, any example space by \s+ etc, then you could work through the string replacing "matchable units", in fact using the same regular expressions checked in turn. E.g. if you match \A\d+ then consume the match from the string, and add \d+ to your regex. Then take the remainder of the string and look for next matching pattern. Working this way will have its limitations (you must know the full set of patterns you want to match in advance, and all examples would have to be unambiguous). However, it is more tractable than a genetic program.
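A minimal sketch of that constrained approach. The token table (TOKENS) and its three entries are assumptions chosen for this example, and anything no token recognises is emitted as an escaped literal:

```ruby
# Hypothetical token table: each entry pairs a matcher anchored at the start
# of the remaining input with the pattern fragment to emit for it.
TOKENS = [
  [/\A\d+/,       '\d+'],
  [/\A\s+/,       '\s+'],
  [/\A[A-Za-z]+/, '[A-Za-z]+']
].freeze

def find_pattern(str)
  rest = str
  fragments = []
  until rest.empty?
    matcher, fragment = TOKENS.find { |rx, _| rest =~ rx }
    if matcher
      fragments << fragment
      rest = rest.sub(matcher, '')          # consume the matched prefix
    else
      fragments << Regexp.escape(rest[0])   # unknown character: keep it literal
      rest = rest[1..-1]
    end
  end
  Regexp.new("\\A#{fragments.join}\\z")
end

find_pattern("(17 + 31)")  # => /\A\(\d+\s+\+\s+\d+\)\z/
```

The order of the table decides which fragment wins when more than one token could match, which is where the ambiguity mentioned above shows up.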

Ruby evaluate an expression with regex, no eval

Sorry if this is duplicated. I thought I'd reword my question a little bit.
How could I use regex to evaluate a mathematical expression? Without using the eval function.
Example expressions:
math1 = "1+1"
math2 = "3+2-1"
I would like it to work for a variable number of numbers in the expression like I showed in the example.
This is just a bad idea. Regexp is not a parser, nor an evaluator.
Use a grammar to describe your expressions. Parse it with a formal parser like the lovely ruby gem Treetop. Then evaluate the abstract syntax tree (AST) produced by the parser.
Gosh, Treetop's arithmetic example practically gives you the solution for free.
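To illustrate the parse-then-evaluate idea without pulling in a gem, here is a small hand-written recursive-descent parser. It is a sketch only and is not Treetop: the class name TinyParser, the evaluate helper, and the grammar in the comment are all assumptions made for this example.

```ruby
require 'strscan'

# Grammar assumed for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := INTEGER | '(' expr ')'
class TinyParser
  def initialize(text)
    @scanner = StringScanner.new(text.delete(' '))
  end

  # Returns a tree: leaves are integers, inner nodes are [operator, left, right].
  def parse
    tree = expr
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    tree
  end

  private

  def expr
    node = term
    while (op = @scanner.scan(/[-+]/))
      node = [op.to_sym, node, term]
    end
    node
  end

  def term
    node = factor
    while (op = @scanner.scan(%r{[*/]}))
      node = [op.to_sym, node, factor]
    end
    node
  end

  def factor
    if (digits = @scanner.scan(/\d+/))
      digits.to_i
    elsif @scanner.scan(/\(/)
      node = expr
      @scanner.scan(/\)/) or raise ArgumentError, 'missing )'
      node
    else
      raise ArgumentError, "unexpected input at #{@scanner.pos}"
    end
  end
end

# Walk the tree depth-first; division here is Ruby's integer division.
def evaluate(node)
  return node if node.is_a?(Integer)

  op, left, right = node
  evaluate(left).public_send(op, evaluate(right))
end

evaluate(TinyParser.new("3+2-1").parse)      # => 4
evaluate(TinyParser.new("2*(3+4)-5").parse)  # => 9
```

Because the loops in expr and term build the tree left to right, subtraction and division associate the usual way, and the term/factor split gives multiplication precedence over addition.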
This is a little late, but I wrote a gem for evaluating arbitrary mathematical expressions (and it doesn't use eval internally): https://github.com/rubysolo/dentaku
For addition and subtraction, this should work
(?:(\d+)([-+]))+(\d+)
This means:
one or more digits, followed by exactly one plus or minus
the above can be repeated as many times as required (this is a non capturing group)
and then must end with one or more digits.
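Note that this assumes each individual number and sign ends up in its own capture, numbered 1..n. Ruby's regex engine does not work that way: a group that repeats keeps only its last repetition, so in Ruby you would collect the tokens with String#scan instead.
Assuming the captures are available, you take captures 1 and 3, applying the sign from capture 2. Then apply the sign from capture 4 (if it exists) to the previous result and the number from capture 5 (which must exist if capture 4 exists), and so on.
So to evaluate, in pseudocode: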
i=1
result=capture(i)
loop while i <= (n-2) (where n is the capture count):
If capture(i+1) == "-" // is subtraction
result = result - capture(i+2)
Else // is addition
result = result + capture(i+2)
End if
i = i + 2
End while
This is only going to work for simple addition and subtraction like in the examples you provided, as it relies on left to right associativity. As others have suggested, you'll probably need to properly parse anything more complex, eg by building a tree of nodes that can then be evaluated in the correct (depth-first?) order.
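The pseudocode above could be sketched in Ruby roughly as follows. It collects the tokens with scan rather than numbered captures, and the method name evaluate_sum is made up for this example. It handles only non-negative integers joined by + or -.

```ruby
# Hypothetical helper: left-to-right evaluation of e.g. "3+2-1".
def evaluate_sum(expr)
  unless expr =~ /\A\d+(?:[-+]\d+)*\z/
    raise ArgumentError, "unsupported expression: #{expr.inspect}"
  end

  tokens = expr.scan(/\d+|[-+]/)   # e.g. ["3", "+", "2", "-", "1"]
  result = tokens.first.to_i

  # After the first number, tokens come in (sign, number) pairs.
  tokens.drop(1).each_slice(2) do |sign, number|
    result = sign == "-" ? result - number.to_i : result + number.to_i
  end
  result
end

evaluate_sum("1+1")     # => 2
evaluate_sum("3+2-1")   # => 4
evaluate_sum("10-3-2")  # => 5
```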
This is really messy…
math2 = "12+3-4"
# Pair each number with the operator that precedes it (nil for the first number).
first, *rest = math2.scan(/([+\-*\/])?(\d+)/)
  .map { |op, digits| [op, digits.to_i] }
# Fold left to right so that "10-3-2" gives 5 rather than 9.
rest.inject(first.last) { |acc, (op, n)| acc.send(op, n) }
# => 11
You should really consider a parser though.

Regular Expression help in Ruby

Can anybody help me write a regular expression that finds all instances of the following in a long string:
type="array" count="x" total="y"
where x and y could be any numbers from 1 to 100.
The language is Ruby.
First, since we'll use the regex for a number twice, we'll save it as its own variable. Note that the number regex is comprised of three separate pieces: one-digit numbers, two-digit numbers, and three-digit numbers. This is a good rule of thumb to use when trying to make a regex to match a range of numbers. It's easy to get it wrong otherwise (allowing strings like "07").
Once you have the number regex, the rest is easy.
number = /[1-9]|[1-9][0-9]|100/
regex = /type="array" count="#{number}" total="#{number}"/
string.scan(regex)
This will return an array of matches
long_string.scan(/type="array" count="(?:[1-9]\d?|100)" total="(?:[1-9]\d?|100)"/)
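A quick check of the range rule on a made-up sample string (the sample text is an assumption for this example). Only the first entry has both numbers between 1 and 100 with no leading zero:

```ruby
number = /[1-9]|[1-9][0-9]|100/
regex  = /type="array" count="#{number}" total="#{number}"/

# Made-up input: one valid entry, one with a leading zero, one out of range.
sample = 'type="array" count="7" total="100" ' \
         'type="array" count="07" total="5" ' \
         'type="array" count="42" total="101"'

matches = sample.scan(regex)
matches.length  # => 1
matches.first   # => "type=\"array\" count=\"7\" total=\"100\""
```

Interpolating a Regexp object wraps it in its own non-capturing group, so the alternation inside number stays contained and scan returns whole matches rather than capture arrays.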

Resources