Basic Matching in Ruby - ruby

I am working through a book and it gives this example
x = "This is a test".match(/(\w+) (\w+)/)
We are looking at the parentheses and being able to access what is passed separately.
When I put the expression above into my IRB I get:
MatchData "This is" 1:"This" 2:"is">
Why doesn't this also include a and Test?
Would I have to include .match(/(\w+) (\w+) (\w+) (\w+)/) ?

The 'match' method is not matching the regex globally. It is only returning the first match. You can use the 'scan' method rather than 'match' and it should return an array of all matches of the regex.
[~]$ irb
1.8.7-p371 :001 > x = "This is a test".match(/(\w+) (\w+)/)
=> #<MatchData "This is" 1:"This" 2:"is">
1.8.7-p371 :002 > x = "This is a test".scan(/(\w+) (\w+)/)
=> [["This", "is"], ["a", "test"]]

Related

Regular expression for only 2 letters

I need to create regular expression for 2 and only 2 letters. I understood it has to be the following /[a-z]{2}/i, but it matches any string with 2 or more letters. Here is what I get:
my_reg_exp = /[a-z]{2}/i
my_reg_exp.match('aa') # => #<MatchData "aa">
my_reg_exp.match('AA') # => #<MatchData "AA">
my_reg_exp.match('a') # => nil
my_reg_exp.match('aaa') # => #<MatchData "aa">
Any suggestion?
You can add the anchors like this:
my_reg_exp = /^[a-z]{2}$/i
Test:
my_reg_exp.match('aaa')
#=> nil
my_reg_exp.match('aa')
#=> #<MatchData "aa">
Hao's solution matches isn't locale sensitive. If this is important for your use case:
/\a[[:alpha:]]{2}\z/
2.0.0-p451 :005 > 'aba' =~ /\A[[:alpha:]]{2}\Z/
=> nil
2.0.0-p451 :006 > 'ab' =~ /\A[[:alpha:]]{2}\Z/
=> 0
2.0.0-p451 :007 > 'xy' =~ /\A[[:alpha:]]{2}\Z/
=> 0
2.0.0-p451 :008 > 'zxy' =~ /\A[[:alpha:]]{2}\Z/
=> nil
Per usual, if you need further assistance, leave a comment.
You can use /\b[a-z]{2}\b/i to match a two-letter string. /b Matches a word-break.
This means you can scan a string to find all occurrences:
'Foo is a bar'.scan(/\b[a-z]{2}\b/i) #=> ["is"]
Or find the first match in a string using:
'a bc def'[/\b[a-z]{2}\b/i] # => "bc"

Ruby regular expression method !~

I don't remember where I learned the !~ method of the String class. However I know it compares a string to a regex and check whether the string not match the regex. See my below example.
C:\>irb
irb(main):001:0> "abba" =~ /(\w)(\w)\2\1/i
=> 0
irb(main):002:0> "xxxx" =~ /(\w)(\w)\2\1/i
=> 0
irb(main):003:0> "asdf" =~ /(\w)(\w)\2\1/i
=> nil
irb(main):004:0> "asdf" !~ /(\w)(\w)\2\1/i
=> true
irb(main):005:0> "asdf" !~ /asdf/i
=> false
irb(main):006:0>
I want to find more information of the method but I can't find it in the rdoc of both String and Regexp. Anyone can give some help?
Thanks.
Since this is the method you can find it here in the Methods filter.
I've found this description.
obj !~ other → true or false
Returns true if two objects do not match (using the =~ method), otherwise false.

putting enumeration with spaces in rails collection

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a '\', like t = %w{this\ is a test} if you dont want that space to be a splitter.
Escape the space using \:
%w{this\ is a test}
You can escape the space %w{this\ is a test} to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless then intention is to split on whitespace.
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
p CSV.parse_line(str,{:col_sep=>' '})
#=> ["this is", "a", "test"]

How do I use a ruby regex to get a substring?

I want to get the numbers out of strings such as:
person_3
person_34
person_356
city_4
city_15
etc...
It seems to me that the following should work:
string[/[0-9]*/]
but this always spits out an empty string.
[0-9]* successfully matches "0 or more" digits at the beginning of the string, so it returns "". [0-9]+ will match "1 or more" digits, and works as you expect:
irb(main):001:0> x = "test 92"
=> "test 92"
irb(main):003:0> x[/\d*/]
=> ""
irb(main):005:0> x.index(/\d*/)
=> 0
irb(main):004:0> x[/\d+/]
=> "92"

Ruby regular expressions

I understand how to check for a pattern in string with regexp in ruby. What I am confused about is how to save the pattern found in string as a separate string.
I thought I could say something like:
if string =~ /regexp/
pattern = string.grep(/regexp/)
and then I could be on with my life. However, this isn't working as expected and is returning the entire original string. Any advice?
You're looking for string.match() in ruby.
irb(main):003:0> a
=> "hi"
irb(main):004:0> a=~/(hi)/
=> 0
irb(main):005:0> a.match(/hi/)
=> #<MatchData:0x5b6e8>
irb(main):006:0> a.match(/hi/)[0]
=> "hi"
irb(main):007:0> a.match(/h(i)/)[1]
=> "i"
irb(main):008:0>
But also for working with what you just matched in the if condition you can use $& $1..$9 and $~ as such:
irb(main):009:0> if a =~ /h(i)/
irb(main):010:1> puts("%s %s %s %s"%[$&,$1,$~[0],$~[1]])
irb(main):011:1> end
hi i hi i
=> nil
irb(main):012:0>
You can also use the special variables $& and $1-$n, like so:
if "regex" =~ /reg(ex)/
puts $&
puts $1
end
Outputs:
regex
ex
$~ also contains the MatchData object. See also: http://www.regular-expressions.info/ruby.html.
I prefer some shortcuts like:
email = "Khaled Al Habache <khellls#gmail.com>"
email[/<(.*?)>/, 1] # => "khellls#gmail.com"

Resources