Match a string against multiple patterns - ruby

How can I match a string against multiple patterns using regular expressions in Ruby?
I am trying to see whether a string matches any pattern in an array of prefixes. This is not working, but I think it demonstrates what I am trying to do.
# example:
# prefixes.include?("Mrs. Kirsten Hess")
prefixes.include?(name) # should return true / false
prefixes = [
/Ms\.?/i,
/Miss/i,
/Mrs\.?/i,
/Mr\.?/i,
/Master/i,
/Rev\.?/i,
/Reverend/i,
/Fr\.?/i,
/Father/i,
/Dr\.?/i,
/Doctor/i,
/Atty\.?/i,
/Attorney/i,
/Prof\.?/i,
/Professor/i,
/Hon\.?/i,
/Honorable/i,
/Pres\.?/i,
/President/i,
/Gov\.?/i,
/Governor/i,
/Coach/i,
/Ofc\.?/i,
/Officer/i,
/Msgr\.?/i,
/Monsignor/i,
/Sr\.?/i,
/Sister\.?/i,
/Br\.?/i,
/Brother/i,
/Supt\.?/i,
/Superintendent/i,
/Rep\.?/i,
/Representative/i,
/Sen\.?/i,
/Senator/i,
/Amb\.?/i,
/Ambassador/i,
/Treas\.?/i,
/Treasurer/i,
/Sec\.?/i,
/Secretary/i,
/Pvt\.?/i,
/Private/i,
/Cpl\.?/i,
/Corporal/i,
/Sgt\.?/i,
/Sergeant/i,
/Adm\.?/i,
/Administrative/i,
/Maj\.?/i,
/Major/i,
/Capt\.?/i,
/Captain/i,
/Cmdr\.?/i,
/Commander/i,
/Lt\.?/i,
/Lieutenant/i,
/^Lt Col\.?$/i,
/^Lieutenant Col$/i,
/Col\.?/i,
/Colonel/i,
/Gen\.?/i,
/General/i
]

Use Regexp.union to combine them:
union(pats_ary) → new_regexp
Return a Regexp object that is the union of the given patterns, i.e., will match any of its parts.
So this will do:
re = Regexp.union(prefixes)
Then use re as your regex:
if name.match(re)
#...
end
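One caveat worth illustrating: the union is unanchored, so it matches a prefix anywhere in the string. A minimal sketch, assuming a shortened, hypothetical prefix list and a hypothetical helper named prefix_regex, shows how anchoring the union restricts the match to the start of the name:

```ruby
# Hypothetical helper: builds one regex that matches any of the given
# patterns, but only at the start of the string and followed by whitespace.
def prefix_regex(patterns)
  # Regexp.union joins the patterns with "|" and keeps each pattern's flags;
  # interpolating the result embeds its source together with those flags.
  /\A(?:#{Regexp.union(patterns)})\s/
end

# Shortened, hypothetical prefix list; the question's array is much longer.
prefixes = [/Mrs\.?/i, /Mr\.?/i, /Dr\.?/i, /Miss/i]

unanchored = Regexp.union(prefixes)
anchored   = prefix_regex(prefixes)

puts anchored.match?("Mrs. Kirsten Hess")   # true
puts unanchored.match?("Drew Barrymore")    # true: "Dr" is found inside "Drew"
puts anchored.match?("Drew Barrymore")      # false: "Dr" is not followed by whitespace
```

Whether the trailing \s is the right boundary depends on your data; it is only one reasonable choice.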

If you can use a single string, it might be faster to write a regex containing the possible values.
e.g.
/(Mr\.|Mrs\.| ... )/.match(name)
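A sketch of that single-regex approach, assuming a handful of titles stand in for the full list and that the title must start the string (both are assumptions, not part of the answer above):

```ruby
# Hypothetical, abbreviated list of titles folded into one alternation.
# (?:...) groups without capturing, \.? makes the period optional, and
# \s requires whitespace after the title.
title = /\A(?:Mrs|Mr|Ms|Miss|Dr|Prof)\.?\s/i

["Mrs. Kirsten Hess", "dr Who", "Kirsten Hess"].each do |name|
  puts "#{name}: #{title.match?(name)}"
end
```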

Related

How do I remove a common substring using Ruby?

I have read How do I remove substring after a certain character in a string using Ruby?. This is close, but different.
I have these emails with a mask:
email1 = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'
email2 = 'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'
email3 = 'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'
I want to remove the substrings that are after .br, .com and .net. The return must be:
email1 = 'giovanna.macedo#lojas100.com.br'
email2 = 'alvaro-neves#stockshop.com'
email3 = 'filiallojas123#filiallojas.net'
You can do that with the method String#[] with an argument that is a regular expression.
r = /.*?\.(?:rb|com|net|br)(?!\.br)/
'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'[r]
#=> "giovanna.macedo#lojas100.com.br"
'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br'[r]
#=> "alvaro-neves#stockshop.com"
'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'[r]
#=> "filiallojas123#filiallojas.net"
The regular expression reads as follows: "Match zero or more characters non-greedily (?), followed by a period, followed by 'rb' or 'com' or 'net' or 'br', which is not followed by '.br'". (?!\.br) is a negative lookahead.
Alternatively the regular expression can be written in free-spacing mode to make it self-documenting:
r = /
.*? # match zero or more characters non-greedily
\. # match '.'
(?: # begin a non-capture group
rb # match 'rb'
| # or
com # match 'com'
| # or
net # match 'net'
| # or
br # match 'br'
) # end non-capture group
(?! # begin a negative lookahead
\.br # match '.br'
) # end negative lookahead
/x # invoke free-spacing regex definition mode
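As a usage sketch, the same pattern can be applied to a whole list. String#[] returns nil when the pattern does not match, so the hypothetical unmask helper below falls back to the original string in that case. The helper name and the fallback are assumptions added for illustration.

```ruby
# Hypothetical helper: returns the part of the address up to its real
# top-level domain, or the string unchanged when the pattern finds nothing.
def unmask(email)
  pattern = /.*?\.(?:rb|com|net|br)(?!\.br)/ # same pattern as in the answer
  email[pattern] || email
end

masked = [
  'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br',
  'alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br',
  'filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br'
]

p masked.map { |email| unmask(email) }
#=> ["giovanna.macedo#lojas100.com.br", "alvaro-neves#stockshop.com", "filiallojas123#filiallojas.net"]
p unmask('localhost')
#=> "localhost"
```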
This should work for your scenario:
expr = /^(.+\.(?:br|com|net))-[^']+(')$/
str = "email = 'giovanna.macedo#lojas100.com.br-215000695716b.ct.domain.com.br'"
str.gsub(expr, '\1\2')
Use the String#delete_suffix Method
This was tested with Ruby 3.0.2. String#delete_suffix and its related bang method require Ruby 2.5 or later, and the numbered block parameter _1 used below requires Ruby 2.7 or later. Since you're trying to remove the exact same suffix from all your emails, you can simply invoke #delete_suffix! on each of your strings. For example:
common_suffix = "-215000695716b.ct.domain.com.br".freeze
emails = [email1, email2, email3]
emails.each { _1.delete_suffix! common_suffix }
You can then validate your results with:
emails
#=> ["giovanna.macedo#lojas100.com.br", "alvaro-neves#stockshop.com", "filiallojas123#filiallojas.net"]
email1
#=> "giovanna.macedo#lojas100.com.br"
email2
#=> "alvaro-neves#stockshop.com"
email3
#=> "filiallojas123#filiallojas.net"
You can see that the array now holds the trimmed values, and inspecting each variable individually confirms that the strings have actually been modified in place.
String Methods are Usually Faster, But Your Mileage May Vary
Since you're dealing with String methods instead of regular expressions, this solution is likely to be faster at scale, although I didn't bother to benchmark all solutions to compare. If you care about performance, you can measure larger samples using IRB's new measure command. It took only 0.000062s to process the strings this way on my system, and String methods generally work faster than regular expressions at large scales. You'll need to do more extensive benchmarking if performance is a core concern, though.
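For a rough comparison outside IRB, a minimal sketch along these lines could be used. The seconds_for helper, the sample size, and the synthetic addresses are all assumptions made up for the measurement, and the timings will differ from machine to machine.

```ruby
# Hypothetical timing helper built on the monotonic clock.
def seconds_for
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

suffix = "-215000695716b.ct.domain.com.br"
# Synthetic sample data, made up for the measurement.
samples = Array.new(50_000) { |i| "user#{i}#example.com#{suffix}" }
# Regexp.escape keeps the dots in the suffix literal; \z anchors at the end.
suffix_re = /#{Regexp.escape(suffix)}\z/

by_method = by_regex = nil
method_time = seconds_for { by_method = samples.map { |s| s.delete_suffix(suffix) } }
regex_time  = seconds_for { by_regex  = samples.map { |s| s.sub(suffix_re, "") } }

puts format("delete_suffix: %.4fs, sub with regex: %.4fs", method_time, regex_time)
```

A single run like this is noisy; repeating it several times gives a more trustworthy picture.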
Making the Call Shorter
You can even make the call shorter if you want. I left it a bit verbose above so you could see what the intent was at each step, but you can trim this to a single one-liner with the following block:
# one method chain, just wrapped to prevent scrolling
[email1, email2, email3].
map { _1.delete_suffix! "-215000695716b.ct.domain.com.br" }
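One thing to watch with the one-liner: delete_suffix! returns nil when the string does not end with the suffix, so map can yield nil for strings that were already clean. A small sketch with made-up sample addresses:

```ruby
suffix = "-215000695716b.ct.domain.com.br"

# Made-up sample data; String.new returns unfrozen copies that can be mutated.
masked = String.new("alvaro-neves#stockshop.com-215000695716b.ct.domain.com.br")
clean  = String.new("already#clean.com")

# The bang method returns nil when nothing was removed.
with_bang = [masked.dup, clean.dup].map { |s| s.delete_suffix!(suffix) }
p with_bang     #=> ["alvaro-neves#stockshop.com", nil]

# The non-bang method always returns a string.
without_bang = [masked, clean].map { |s| s.delete_suffix(suffix) }
p without_bang  #=> ["alvaro-neves#stockshop.com", "already#clean.com"]
```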
Caveats
You Need Fixed-String Suffixes
The main caveat here is that this solution will only work when you know the suffix (or set of suffixes) you want to remove. If you can't rely on the suffixes to be fixed, then you'll likely need to pursue a regex solution in one way or another, even if it's just to collect a set of suffixes.
Dealing with Frozen Strings
Another caveat is that if you've created your code with frozen string literals, you'll need to adjust your code to avoid attempting in-place changes to frozen strings. There's more than one way to do this, but a simple destructuring assignment is probably the easiest to follow given your small code sample. Consider the following:
# assume that the strings in email1 etc. are frozen, but the array
# itself is not; you can't change the strings in-place, but you can
# re-assign new strings to the same variables or the same array
emails = [email1, email2, email3]
email1, email2, email3 =
emails.map { _1.delete_suffix "-215000695716b.ct.domain.com.br" }
There are certainly other ways to work around frozen strings, but the point is that while the now-common use of the # frozen_string_literal: true magic comment can improve VM performance or memory usage in large programs, it isn't always the best option for string-mangling code. Just keep that in mind, as tools like RuboCop love to enforce frozen strings, and not everyone stops to consider the consequences of such generic advice to the given problem domain.
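To make the difference concrete, here is a minimal sketch in which .freeze stands in for the effect of the magic comment (the variable names are illustrative):

```ruby
suffix = "-215000695716b.ct.domain.com.br"
# .freeze stands in for what the frozen_string_literal magic comment does.
email = "filiallojas123#filiallojas.net-215000695716b.ct.domain.com.br".freeze

error = begin
  email.delete_suffix!(suffix) # in-place change on a frozen string
  nil
rescue FrozenError => e
  e.class
end
p error                         #=> FrozenError

trimmed = email.delete_suffix(suffix) # non-bang form returns a new string
p trimmed                       #=> "filiallojas123#filiallojas.net"
p email.frozen?                 #=> true (the original is untouched)
```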
I would just use the chomp(string) method like so:
mask = "-215000695716b.ct.domain.com.br"
email1.chomp(mask)
#=> "giovanna.macedo#lojas100.com.br"
email2.chomp(mask)
#=> "alvaro-neves#stockshop.com"
email3.chomp(mask)
#=> "filiallojas123#filiallojas.net"

best way to find substring in ruby using regular expression

I have a string https://stackoverflow.com. I want a new string that contains the domain from the given string, using regular expressions.
Example:
x = "https://stackoverflow.com"
newstring = "stackoverflow.com"
Example 2:
x = "https://www.stackoverflow.com"
newstring = "www.stackoverflow.com"
"https://stackverflow.com"[/(?<=:\/\/).*/]
#⇒ "stackverflow.com"
(?<=..) is a positive lookbehind: it requires :// immediately before the match without including it in the result.
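Note that .* keeps everything after ://, including any path. A hedged variation, assuming you want only the host part, swaps .* for [^/]+ (the sample URLs are illustrative):

```ruby
# %r{...} avoids having to escape the slashes.
# (?<=://) asserts that "://" precedes the match without consuming it;
# [^/]+ then takes characters up to the next slash (or the end of the string).
host = %r{(?<=://)[^/]+}

p "https://stackoverflow.com"[host]                  #=> "stackoverflow.com"
p "https://www.stackoverflow.com/questions/1"[host]  #=> "www.stackoverflow.com"
p "stackoverflow.com"[host]                          #=> nil (no "://" present)
```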
If string = "http://stackoverflow.com", a really easy way is string.split("http://")[1]. But this isn't a regex.
A regex solution would be as follows:
string.scan(/^http:\/\/(.+)$/).flatten.first
To explain:
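String#scan returns every match of the regex as an array. When the regex contains a capture group, each element is itself an array of that match's captures.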
The regex:
^ matches beginning of line
http: matches those characters
\/\/ matches //
(.+) sets a "match group" containing one or more of any character. This is the value returned by the scan.
$ matches end of line
.flatten.first extracts the results from String#scan, which in this case returns a nested array.
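A short sketch of what scan hands back here, next to String#[] with a capture index, which returns the group directly (shown as an alternative, not as part of the answer above):

```ruby
string  = "http://stackoverflow.com"
pattern = /^http:\/\/(.+)$/

# scan returns one inner array of captures per match.
p string.scan(pattern)                #=> [["stackoverflow.com"]]
p string.scan(pattern).flatten.first  #=> "stackoverflow.com"

# String#[] with a capture index returns the first match's group directly.
p string[pattern, 1]                  #=> "stackoverflow.com"
```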
You might want to try this:
#!/usr/bin/env ruby
str = "https://stackoverflow.com"
if mtch = str.match(/(?::\/\/)(\S+)/)
f1 = mtch.captures.first
end
There are two groups in the pattern passed to the match method: the first is a non-capturing group that matches the :// separator, and the second is a capturing group that matches the non-whitespace characters after it. The captures method returns the captured strings as an array, so captures.first assigns the desired result to f1.
I hope this solves your problem.

Combine Regexp and set of values (hash/array) to compare if a string matches in ruby

I have the following pattern to check:
"MODEL_NAME"-"ID"."FORMAT_TYPE"
where, for example:
MODEL_NAME = [:product, :brand]
FORMAT_TYPE = [:jpg, :png]
First I wanted to check that the string matches something like:
/^\w+-\d+.\w+$/
I also have to check that the parts of my string are included in my arrays. I want something more flexible than:
/^(product|brand)-\d+.(jpg|png)$/
which I could manage through my arrays. What is a good way to do this?
/^(#{MODEL_NAME.join '|'})-\d+\.(#{FORMAT_TYPE.join '|'})$/
# => /^(product|brand)-\d+\.(jpg|png)$/
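Plain interpolation assumes the array values contain no regex metacharacters. A sketch of a slightly more defensive variant, assuming a hypothetical helper named filename_regex: Regexp.union escapes each string, and \A and \z anchor the whole string rather than a single line.

```ruby
# Hypothetical helper: builds the filename pattern from the two arrays.
def filename_regex(model_names, format_types)
  # Regexp.union escapes string arguments, so a value such as "c++" would be
  # matched literally. Symbols are converted to strings first.
  models  = Regexp.union(model_names.map(&:to_s))
  formats = Regexp.union(format_types.map(&:to_s))
  /\A(#{models})-(\d+)\.(#{formats})\z/
end

pattern = filename_regex([:product, :brand], [:jpg, :png])

p pattern.match("product-42.jpg")&.captures  #=> ["product", "42", "jpg"]
p pattern.match?("brand-7xpng")              #=> false (the dot is literal)
p pattern.match?("widget-1.jpg")             #=> false (unknown model name)
```

Capturing the id as well is a design choice made here for convenience; drop the parentheses around \d+ if you do not need it.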

regular expression in ruby Regexp

I'm using ruby 1.9.2
string = "asufasu isaubfusabiu safbsua fbisaufb sa {{hello}} uasdhfa s asuibfisubibas {{manish}} erieroi"
Now I have to find every {{anyword}} in it: how many times such a token occurs, and each name together with its curly braces.
After reading the Regexp documentation, I am using
/{{[a-z]}}/.match(string)
but it returns nil every time.
You need to append a * to the [a-z] pattern to tell it to match any number of letters inside the braces, and then use scan to get all occurrences of the match in the string:
string.scan(/{{[a-z]*}}/)
=> ["{{hello}}", "{{manish}}"]
To get the number of times matches occur, just take the size of the resulting array:
string.scan(/{{[a-z]*}}/).size
=> 2
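If only the names inside the braces are needed, a capture group can be added. This sketch also escapes the braces and uses + instead of *, both of which are assumptions about what is wanted rather than part of the answer above:

```ruby
string = "asufasu isaubfusabiu safbsua fbisaufb sa {{hello}} uasdhfa s asuibfisubibas {{manish}} erieroi"

# With a capture group, scan returns one array of captures per match;
# flatten turns [["hello"], ["manish"]] into a flat list of names.
# + (instead of *) requires at least one letter, so "{{}}" is skipped.
names = string.scan(/\{\{([a-z]+)\}\}/).flatten

p names       #=> ["hello", "manish"]
p names.size  #=> 2
```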
The web application Rubular can be an incredibly helpful tool for testing regular expressions in real time.

Regex to leave desired string remaining and others removed

In Ruby, what regex will strip out all but a desired string if present in the containing string? I know about /[^abc]/ for characters, but what about strings?
Say I have the string "group=4&type_ids[]=2&type_ids[]=7&saved=1" and want to retain the pattern group=\d, if it is present in the string using only a regex?
Currently, I am splitting on & and then doing a select with matching condition =~ /group=\d/ on the resulting enumerable collection. It works fine, but I'd like to know the regex to do this more directly.
Simply:
part = str[/group=\d+/]
If you want only the numbers, then:
group_str = str[/group=(\d+)/,1]
If you want only the numbers as an integer, then:
group_num = str[/group=(\d+)/,1].to_i
Warning: String#[] will return nil if no match occurs, and blindly calling nil.to_i always returns 0.
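One way to avoid silently getting 0 is the safe navigation operator, sketched here with a hypothetical helper named group_id that returns nil when there is no match:

```ruby
# Hypothetical helper: returns the group id as an Integer, or nil when the
# query string has no group=<digits> part.
def group_id(query)
  # &. skips the to_i call when String#[] returns nil.
  query[/group=(\d+)/, 1]&.to_i
end

p group_id("group=4&type_ids[]=2&type_ids[]=7&saved=1")  #=> 4
p group_id("type_ids[]=2&saved=1")                       #=> nil
p nil.to_i                                               #=> 0
```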
You can try:
str.sub(/.*(group=\d+).*/, '\1')
Typically I wouldn't really worry too much about a complex regex. Simply break the string down into smaller parts and it becomes easier:
asdf = "group=4&type_ids[]=2&type_ids[]=7&saved=1"
asdf.split('&').select{ |q| q['group'] } # => ["group=4"]
Otherwise, you can use regex in a bunch of different ways. Here are two ways I tend to use:
asdf.scan(/group=\d+/) # => ["group=4"]
asdf[/(group=\d+)/, 1] # => "group=4"
Try:
str.match(/group=\d+/)[0]
