Ruby regular expressions - ruby

I understand how to check for a pattern in string with regexp in ruby. What I am confused about is how to save the pattern found in string as a separate string.
I thought I could say something like:
if string =~ /regexp/
pattern = string.grep(/regexp/)
and then I could be on with my life. However, this isn't working as expected and is returning the entire original string. Any advice?

You're looking for string.match() in ruby.
irb(main):003:0> a
=> "hi"
irb(main):004:0> a=~/(hi)/
=> 0
irb(main):005:0> a.match(/hi/)
=> #<MatchData:0x5b6e8>
irb(main):006:0> a.match(/hi/)[0]
=> "hi"
irb(main):007:0> a.match(/h(i)/)[1]
=> "i"
irb(main):008:0>
But also for working with what you just matched in the if condition you can use $& $1..$9 and $~ as such:
irb(main):009:0> if a =~ /h(i)/
irb(main):010:1> puts("%s %s %s %s"%[$&,$1,$~[0],$~[1]])
irb(main):011:1> end
hi i hi i
=> nil
irb(main):012:0>

You can also use the special variables $& and $1-$n, like so:
if "regex" =~ /reg(ex)/
puts $&
puts $1
end
Outputs:
regex
ex
$~ also contains the MatchData object. See also: http://www.regular-expressions.info/ruby.html.

I prefer some shortcuts like:
email = "Khaled Al Habache <khellls#gmail.com>"
email[/<(.*?)>/, 1] # => "khellls#gmail.com"

Related

Ruby - regex that matches string to pattern and detects unwanted occurrences [duplicate]

How do I a string against a regex such that it will return true if the whole string matches (not a substring)?
eg:
test( \ee\ , "street" ) #=> returns false
test( \ee\ , "ee" ) #=> returns true!
Thank you.
You can match the beginning of the string with \A and the end with \Z. In ruby ^ and $ match also the beginning and end of the line, respectively:
>> "a\na" =~ /^a$/
=> 0
>> "a\na" =~ /\Aa\Z/
=> nil
>> "a\na" =~ /\Aa\na\Z/
=> 0
This seems to work for me, although it does look ugly (probably a more attractive way it can be done):
!(string =~ /^ee$/).nil?
Of course everything inside // above can be any regex you want.
Example:
>> string = "street"
=> "street"
>> !(string =~ /^ee$/).nil?
=> false
>> string = "ee"
=> "ee"
>> !(string =~ /^ee$/).nil?
=> true
Note: Tested in Rails console with ruby (1.8.7) and rails (3.1.1)
So, what you are asking is how to test whether the two strings are equal, right? Just use string equality! This passes every single one of the examples that both you and Tomas cited:
'ee' == 'street' # => false
'ee' == 'ee' # => true
"a\na" == 'a' # => false
"a\na" == "a\na" # => true

Regular expression for only 2 letters

I need to create regular expression for 2 and only 2 letters. I understood it has to be the following /[a-z]{2}/i, but it matches any string with 2 or more letters. Here is what I get:
my_reg_exp = /[a-z]{2}/i
my_reg_exp.match('aa') # => #<MatchData "aa">
my_reg_exp.match('AA') # => #<MatchData "AA">
my_reg_exp.match('a') # => nil
my_reg_exp.match('aaa') # => #<MatchData "aa">
Any suggestion?
You can add the anchors like this:
my_reg_exp = /^[a-z]{2}$/i
Test:
my_reg_exp.match('aaa')
#=> nil
my_reg_exp.match('aa')
#=> #<MatchData "aa">
Hao's solution matches isn't locale sensitive. If this is important for your use case:
/\a[[:alpha:]]{2}\z/
2.0.0-p451 :005 > 'aba' =~ /\A[[:alpha:]]{2}\Z/
=> nil
2.0.0-p451 :006 > 'ab' =~ /\A[[:alpha:]]{2}\Z/
=> 0
2.0.0-p451 :007 > 'xy' =~ /\A[[:alpha:]]{2}\Z/
=> 0
2.0.0-p451 :008 > 'zxy' =~ /\A[[:alpha:]]{2}\Z/
=> nil
Per usual, if you need further assistance, leave a comment.
You can use /\b[a-z]{2}\b/i to match a two-letter string. /b Matches a word-break.
This means you can scan a string to find all occurrences:
'Foo is a bar'.scan(/\b[a-z]{2}\b/i) #=> ["is"]
Or find the first match in a string using:
'a bc def'[/\b[a-z]{2}\b/i] # => "bc"

Basic Matching in Ruby

I am working through a book and it gives this example
x = "This is a test".match(/(\w+) (\w+)/)
We are looking at the parentheses and being able to access what is passed separately.
When I put the expression above into my IRB I get:
MatchData "This is" 1:"This" 2:"is">
Why doesn't this also include a and Test?
Would I have to include .match(/(\w+) (\w+) (\w+) (\w+)/) ?
The 'match' method is not matching the regex globally. It is only returning the first match. You can use the 'scan' method rather than 'match' and it should return an array of all matches of the regex.
[~]$ irb
1.8.7-p371 :001 > x = "This is a test".match(/(\w+) (\w+)/)
=> #<MatchData "This is" 1:"This" 2:"is">
1.8.7-p371 :002 > x = "This is a test".scan(/(\w+) (\w+)/)
=> [["This", "is"], ["a", "test"]]

Ruby regular expression method !~

I don't remember where I learned the !~ method of the String class. However I know it compares a string to a regex and check whether the string not match the regex. See my below example.
C:\>irb
irb(main):001:0> "abba" =~ /(\w)(\w)\2\1/i
=> 0
irb(main):002:0> "xxxx" =~ /(\w)(\w)\2\1/i
=> 0
irb(main):003:0> "asdf" =~ /(\w)(\w)\2\1/i
=> nil
irb(main):004:0> "asdf" !~ /(\w)(\w)\2\1/i
=> true
irb(main):005:0> "asdf" !~ /asdf/i
=> false
irb(main):006:0>
I want to find more information of the method but I can't find it in the rdoc of both String and Regexp. Anyone can give some help?
Thanks.
Since this is the method you can find it here in the Methods filter.
I've found this description.
obj !~ other → true or false
Returns true if two objects do not match (using the =~ method), otherwise false.

How can I check a word is already all uppercase?

I want to be able to check if a word is already all uppercase. And it might also include numbers.
Example:
GO234 => yes
Go234 => no
You can compare the string with the same string but in uppercase:
'go234' == 'go234'.upcase #=> false
'GO234' == 'GO234'.upcase #=> true
a = "Go234"
a.match(/\p{Lower}/) # => #<MatchData "o">
b = "GO234"
b.match(/\p{Lower}/) # => nil
c = "123"
c.match(/\p{Lower}/) # => nil
d = "µ"
d.match(/\p{Lower}/) # => #<MatchData "µ">
So when the match result is nil, it is in uppercase already, else something is in lowercase.
Thank you #mu is too short mentioned that we should use /\p{Lower}/ instead to match non-English lower case letters.
I am using the solution by #PeterWong and it works great as long as the string you're checking against doesn't contain any special characters (as pointed out in the comments).
However if you want to use it for strings like "Überall", just add this slight modification:
utf_pattern = Regexp.new("\\p{Lower}".force_encoding("UTF-8"))
a = "Go234"
a.match(utf_pattern) # => #<MatchData "o">
b = "GO234"
b.match(utf_pattern) # => nil
b = "ÜÖ234"
b.match(utf_pattern) # => nil
b = "Über234"
b.match(utf_pattern) # => #<MatchData "b">
Have fun!
You could either compare the string and string.upcase for equality (as shown by JCorc..)
irb(main):007:0> str = "Go234"
=> "Go234"
irb(main):008:0> str == str.upcase
=> false
OR
you could call arg.upcase! and check for nil. (But this will modify the original argument, so you may have to create a copy)
irb(main):001:0> "GO234".upcase!
=> nil
irb(main):002:0> "Go234".upcase!
=> "GO234"
Update: If you want this to work for unicode.. (multi-byte), then string#upcase won't work, you'd need the unicode-util gem mentioned in this SO question

Resources