Simple regex - ignoring certain characters - ruby

I'm trying to use the match method with an argument of a regex to select a valid phone number, by definition, any string with nine digits.
For example:
9347584987 is valid,
(456)322-3456 is valid,
(324)5688890 is valid.
But
(340)HelloWorld is NOT valid and
456748 is NOT valid.
So far, I'm able to use \d{9} to select the example string of 9 digit characters in a row, but I'm not sure how to specifically ignore any character, such as '-' or '(' or ')' in the middle of the sequence.
What kind of Regex could I use here?

Given:
nums=['9347584987','(456)322-3456','(324)5688890','(340)HelloWorld', '456748 is NOT valid']
You can split on a NON digit and rejoin to remove non digits:
> nums.map {|s| s.split(/\D/).join}
["9347584987", "4563223456", "3245688890", "340", "456748"]
Then filter on the length:
> nums.map {|s| s.split(/\D/).join}.select {|s| s.length==10}
["9347584987", "4563223456", "3245688890"]
Or, you can grab a group of numbers that look 'phony numbery' by using a regex to grab digits and common delimiters:
> nums.map {|s| s[/[\d\-()]+/]}
["9347584987", "(456)322-3456", "(324)5688890", "(340)", "456748"]
And then process that list as above.
That would distinguish between:
> '123 is NOT a valid area code for 456-7890'[/[\d\-()]+/]
=> "123" # no match
vs
> '123 is NOT a valid area code for 456-7890'.split(/\D/).join
=> "1234567890" # match

I suggest using one regular expression for each valid pattern rather than constructing a single regex. It would be easier to test and debug, and easier to maintain the code. If, for example, "123-456-7890" or "123-456-7890 x231" were in future deemed valid numbers, one need only add a single, simple regex for each to the array VALID_PATTERNS below.
VALID_PATTERNS = [/\A\d{10}\z/, /\A\(\d{3}\)\d{3}-\d{4}\z/, /\A\(\d{3}\)\d{7}\z/]
def valid?(str)
  VALID_PATTERNS.any? { |r| str.match?(r) }
end
ph_nbrs = %w| 9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 |
ph_nbrs.each { |s| puts "#{s.ljust(15)} \#=> #{valid?(s)}" }
9347584987 #=> true
(456)322-3456 #=> true
(324)5688890 #=> true
(340)HelloWorld #=> false
456748 #=> false
String#match? made its debut in Ruby v2.4. There are many alternatives, including str.match(r) and str =~ r.

"9347584987" =~ /(?:\d.*){9}/ #=> 0
"(456)322-3456" =~ /(?:\d.*){9}/ #=> 1
"(324)5688890" =~ /(?:\d.*){9}/ #=> 1
"(340)HelloWorld" =~ /(?:\d.*){9}/ #=> nil
"456748" =~ /(?:\d.*){9}/ #=> nil

Pattern: (Rubular Demo)
^\(?\d{3}\)?\d{3}-?\d{4}$ # this makes the expected symbols optional
The following pattern ensures that an opening ( at the start of the string is followed by 3 digits and then a closing ):
^(\(\d{3}\)|\d{3})\d{3}-?\d{4}$
On principle, though, I agree with melpomene in advising that you remove all non-digit characters, test for 9 character length, then store/handle the phone numbers in a single, reliable, basic format.
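A rough sketch of that normalize-then-check approach. The helper name normalize_phone and the default of 10 digits are assumptions made for this illustration (the question says nine digits, but its valid examples all contain ten), so adjust expected_digits as needed:

```ruby
# Hypothetical helper: returns the digits-only form of a phone number,
# or nil when the number of digits differs from expected_digits.
def normalize_phone(str, expected_digits = 10)
  digits = str.delete('^0-9') # remove every character that is not an ASCII digit
  digits.length == expected_digits ? digits : nil
end

normalize_phone('(456)322-3456')   #=> "4563223456"
normalize_phone('(340)HelloWorld') #=> nil
normalize_phone('456748')          #=> nil
```

Storing only the digits-only form means later code never has to deal with parentheses or dashes again.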

Related

How to return the whole array instead of a single string

I am trying to return all words which have more than four letters in the below exercise.
def timed_reading(max_length, text)
  var_b = text.split(" ")
  var_b.map do |i|
    if i.length >= max_length
      return i
    end
  end
end
print timed_reading(4,"The Fox asked the stork, 'How is the soup?'")
# >> asked
I seem to get only one word.
If you want to filter a list and select only certain kinds of entries, use the select method:
var_b.select do |i|
  i.length >= max_length
end
That's all you need.
The return i in the middle is confusing things, as that breaks out of the loop and returns a single value from the method itself. Remember that in Ruby, unlike languages such as JavaScript, a method implicitly returns the value of its last expression, so return rarely needs to be spelled out explicitly.
Blocks don't normally have return in them for this reason unless they need to interrupt the flow and break out of the method itself.
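Putting that together, the method from the question could be rewritten along these lines (a sketch that keeps the question's method name and its >= comparison):

```ruby
def timed_reading(max_length, text)
  # select returns a new array holding every word for which the block is truthy;
  # that array is the last expression, so it becomes the method's return value
  text.split(" ").select { |word| word.length >= max_length }
end

p timed_reading(4, "The Fox asked the stork, 'How is the soup?'")
# prints ["asked", "stork,", "'How", "soup?'"]
```

Note that punctuation attached to a word counts toward its length here, which is why "'How" is included.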
You don't need to first extract all words from the string and then select those having at least four letters. Instead you can just extract the desired words using String#scan with a regular expression.
str = "The Fox asked the stork, 'How is the soup?'? Très bon?"
str.scan /\p{Alpha}{4,}/
#=> ["asked", "stork", "soup", "Très"]
The regular expression reads, "Match strings containing 4 or more letters". I've used \p{Alpha} (for most purposes equivalent to \p{L} and [[:alpha:]]) to match Unicode letters. (These are documented in Regexp. Search for these expressions there.) You could replace \p{Alpha} with [a-zA-Z], but in that case "Très" would not be matched.
If you wish to also match digits, use \p{Alnum} or [[:alnum:]] instead. While \w also matches letters (English only) and digits, it also matches underscores, which you probably don't want in this situation.
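A small illustration of that difference; the sample string is made up for this sketch:

```ruby
sample = "foo_bar résumé 42"

# \p{Alnum} matches Unicode letters and digits, but not the underscore
sample.scan(/\p{Alnum}+/) #=> ["foo", "bar", "résumé", "42"]

# \w matches only ASCII letters, digits and the underscore
sample.scan(/\w+/)        #=> ["foo_bar", "r", "sum", "42"]
```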
Punctuation can be a problem when words are extracted from the string by splitting on whitespace.
arr = "That is a cow.".split
#=> ["That", "is", "a", "cow."]
arr.select { |word| word.size >= 4 }
#=> ["That", "cow."]
but "cow" has only three letters. If you instead use String#scan to extract words from the string, you obtain the desired result.
arr = "That is a cow?".scan /\p{Alpha}+/
#=> ["That", "is", "a", "cow"]
arr.select { |word| word.size >= 4 }
#=> ["That"]
However, if you use scan you may as well use a regular expression to retrieve only words having at least 4 characters, and skip the extra step.

Regex pattern to see if a string contains a number in a range

I'm trying to match a number between 400 and 499 anywhere in a string. For example, both:
string = "[401] One too three"
string2 = "Yes 450 sir okay"
should match. Both:
string3 = "[123] Test"
string4 = "This is another string"
should fail.
What's the best way to write the regex? I wrote:
string =~ /\d{3}/
to see if the string contains a three digit integer. How can I see if that's within the range?
If you don't actually need the number afterwards, and just need to determine whether the string contains a number in the range 400-499, you can:
1. Check that you are at the beginning of a line, or have a non-digit character, followed by
2. the digit '4', followed by
3. any 2 digits, followed by
4. the end of a line or a non-digit character
so you end up with a regex looking something like
regex = /(?:^|\D)4\d{2}(?:\D|$)/
or, by using a negative lookbehind and a negative lookahead:
regex = /(?<!\d)4\d{2}(?!\d)/
You need steps 1 and 4 above to rule out numbers such as 1400-1499 and 4000-4999 (and other numbers with more than 3 digits that have 400-499 buried somewhere in them). You can then make use of String#match? in newer Ruby versions (2.4+) to get just a simple boolean:
string.match?(regex) # => true
string2.match?(regex) # => true
string3.match?(regex) # => false
string4.match?(regex) # => false
"1400".match?(regex) # => false
"400".match?(regex) # => true
"4000".match?(regex) # => false
"[1400]".match?(regex) # => false
"[400]".match?(regex) # => true
"[4000]".match?(regex) # => false
It's a fairly simple regex, and there's no need to pull out the match and convert it to an integer if you just need a simple yes or no.
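One practical difference between the two regexes shows up if you later do want the matched text: the first one consumes the neighbouring non-digit characters, while the lookaround version matches only the digits. A quick sketch using String#[] (the variable names are just for this illustration):

```ruby
boundary  = /(?:^|\D)4\d{2}(?:\D|$)/
lookround = /(?<!\d)4\d{2}(?!\d)/

"[401] One too three"[boundary]  #=> "[401]"
"[401] One too three"[lookround] #=> "401"
"Yes 450 sir okay"[lookround]    #=> "450"
"[1400] Test"[lookround]         #=> nil
```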
def doit(str, rng)
  str.gsub(/-?\d+/).find { |s| rng.cover?(s.to_i) }
end
doit "[401] One too three", 400..499 #=> "401"
doit "Yes 450 sir okay", 400..499 #=> "450"
doit "Yes -450 sir okay", -499..400 #=> "-450"
doit "[123] Test", 400..499 #=> nil
doit "This is another string", 400..499 #=> nil
Recall that String#gsub returns an enumerator, when used with a single argument and no block. The enumerator merely generates matches and performs no substitutions. I've found a number of situations, as here, where this form of the method can be used to advantage.
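To see that behaviour in isolation, here is a tiny sketch (the sample string is arbitrary):

```ruby
str  = "a1b22c333"
enum = str.gsub(/\d+/)   # one argument, no block: nothing is substituted

enum.class  #=> Enumerator
enum.to_a   #=> ["1", "22", "333"]
str         #=> "a1b22c333" (unchanged)

# For a pattern without capture groups, String#scan gives the same list,
# but as an array rather than a lazily consumed enumerator.
str.scan(/\d+/) #=> ["1", "22", "333"]
```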
If str may contain the representations of multiple integers within the specified range, and all such are desired, simply replace Enumerable#find with Enumerable#select:
"401, 532 and -126".gsub(/-?\d+/).select { |s| (-127..451).cover?(s.to_i) }
#=> ["401", "-126"]
I recommend using a general regex to first extract the number from each line. Then, use a regular script to check the range:
s = "[404] Yes sir okay"
data = s.match(/\[(\d+)\]/)
data.captures
num = data[1].to_i
if (num >= 400 && num < 500)
  print "Match"
else
  print "No Match"
end
The pattern I wrote should actually work to match any number in brackets, anywhere in the string.
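One caveat: when the string contains no bracketed number, match returns nil and data[1] raises NoMethodError. A guarded variant might look like this; the method name and the default range are assumptions for the sake of the example:

```ruby
# Hypothetical helper: true when str contains a number in square brackets
# and that number lies inside range.
def bracketed_number_in_range?(str, range = 400..499)
  m = str.match(/\[(\d+)\]/)
  return false if m.nil?   # no "[digits]" anywhere in the string
  range.cover?(m[1].to_i)
end

bracketed_number_in_range?("[404] Yes sir okay")     #=> true
bracketed_number_in_range?("[123] Test")             #=> false
bracketed_number_in_range?("This is another string") #=> false
```

Like the original, this only looks at the first bracketed number, and it ignores numbers that are not in brackets, such as the 450 in "Yes 450 sir okay".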
Extract the digits with a regex, convert the capture group to integer and ask Ruby if they're between your bounds:
s = "[499] One too three"
lo = 400
hi = 499
puts s =~ /(\d{3})/ && $1.to_i.between?(lo, hi)

Regex to Capture certain words at start of string Ruby

Looking for help in writing a regex for capturing whether a particular string starts with certain strings and capture the start and remaining string. E.g
Let's say the possible starts of strings are 'P', 'RO', 'RPX' and the sample string is 'PIXR' or 'ROXP' or 'RPX'.
I am looking to write a regex which captures the start and trailing part of string if it starts with the given possible strings e.g
'PIXRT' =~ // outputs 'P' and 'IXRT'
Not very conversant with regexes so any help is really appreciated.
You may use a regex with 2 capturing groups, one capturing the known values at the start and the rest will capture the rest of the string:
rx = /\A(RPX|RO|P)(.*)/m
"PIXRT".scan(rx)
# => [["P", "IXRT"]]
Details:
\A - start of string
(RPX|RO|P) - one of the values that must be at the start of the string (mind the order of these alternatives: the longer ones come first!)
(.*) - any 0+ chars up to the end of the string (m modifier will make . match line breaks, too).
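If the list of prefixes is only known at run time, the same regex can be built from an array. The sketch below is one way to do it; split_prefix is a made-up name, and sorting longest-first is there to respect the ordering caveat above:

```ruby
# Hypothetical helper: returns [prefix, rest] when str starts with one of
# the given prefixes, otherwise nil.
def split_prefix(str, prefixes)
  longest_first = prefixes.sort_by { |prefix| -prefix.length }
  # Regexp.union escapes each string and joins them with |
  rx = /\A(#{Regexp.union(longest_first)})(.*)/m
  m = rx.match(str)
  m && m.captures
end

split_prefix('PIXRT', %w[P RO RPX]) #=> ["P", "IXRT"]
split_prefix('RPX',   %w[P RO RPX]) #=> ["RPX", ""]
split_prefix('IPXR',  %w[P RO RPX]) #=> nil
```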
def split_after_start_string(str, *start_strings)
  # Anchor every alternative separately: in /\AP|RO|RPX/ the \A applies to P only,
  # so RO or RPX would also be matched in the middle of the string.
  alternatives = start_strings.map { |s| "\\A#{Regexp.escape(s)}" }.join('|')
  a = str.split(/(?<=#{alternatives})/)
  if a.size == 2
    a
  elsif start_strings.include?(str)
    a << ''
  else
    nil
  end
end
start_strings = %w| P RO RPX | #=> ["P", "RO", "RPX"]
split_after_start_string('PIXR', *start_strings) #=> ["P", "IXR"]
split_after_start_string('IPXR', *start_strings) #=> nil
split_after_start_string('ROXP', *start_strings) #=> ["RO", "XP"]
split_after_start_string('RPX', *start_strings) #=> ["RPX", ""]
The regex reads, "match one element of start_strings at the beginning of the string in a positive lookbehind". For start_strings in the examples, the regex is:
/(?<=\AP|\ARO|\ARPX)/

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now; it prints "no duplicates" 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
  if string =~ /(.)\1/
    puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
  else
    puts "no duplicates"
  end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first character that is immediately followed by a copy of itself:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
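The method below might be useful for finding a repeated word in a string. Note that it returns the word that occurs most often (the earliest such word when counts tie), which is not necessarily the first word to repeat:
def firstRepeatedWord(string)
  h_data = Hash.new(0)
  string.split(" ").each{|x| h_data[x] +=1}
  h_data.key(h_data.values.max)
end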
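If what is wanted is the first word that shows up a second time, a Set can keep track of the words seen so far. This is only a sketch, and first_repeated_word is a name invented for it:

```ruby
require 'set'

# Hypothetical helper: returns the first word that has already been seen
# earlier in the text, or nil when no word repeats.
def first_repeated_word(text)
  seen = Set.new
  # Set#add? returns nil when the element is already present
  text.split.find { |word| !seen.add?(word) }
end

first_repeated_word("the cat saw the other cat") #=> "the"
first_repeated_word("one two two one")           #=> "two"
first_repeated_word("no repeats here")           #=> nil
```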
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
  str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
  nil
end
first_repeat_char("abcdebf") #=> "b"
first_repeat_char("abcdcbe") #=> "c"
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> "b"
"abccdeb"[r] #=> "b"
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I would use a positive lookahead with the String#[] method (this finds the first character that is immediately repeated):
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

How to find string which is started with ".."?

I was trying to find the strings that start with exactly "..", but couldn't get that:
["..ab","...cc","..ps","....kkls"].each do |x|
  puts x if /../.match(x)
end
..ab
...cc
..ps
....kkls
=> ["..ab", "...cc", "..ps", "....kkls"]
["..ab","...cc","..ps","....kkls"].each do |x|
  puts x if /(.)(.)/.match(x)
end
..ab
...cc
..ps
....kkls
=> ["..ab", "...cc", "..ps", "....kkls"]
Expected output:
["..ab","..ps"]
What you want is
/^\.\.(?!\.)/
The caret ^ at the beginning matches the beginning of a line (in Ruby, use \A to anchor at the beginning of the whole string); periods must be escaped by a backslash as \. because in regular expressions a plain period . matches any character except a newline; the (?!\.) is a negative look-ahead meaning the next character is not a period. So the expression means, "at the beginning of a line, match two periods that are not followed by a third period."
>> /^\.\.(?!\.)/.match "..ab"
=> #<MatchData "..">
>> /^\.\.(?!\.)/.match "...cc"
=> nil
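To filter the whole array, the pattern can be passed to Enumerable#grep. The sketch below also shows how the look-ahead differs from a negated character class such as [^.] when the string is exactly ".." (that extra element is added here purely for illustration):

```ruby
ss = ["..ab", "...cc", "..ps", "....kkls", ".."]

# The negative look-ahead only forbids a third dot, so ".." itself matches
ss.grep(/^\.\.(?!\.)/) #=> ["..ab", "..ps", ".."]

# [^.] demands an actual third character, so ".." is rejected
ss.grep(/^\.\.[^.]/)   #=> ["..ab", "..ps"]
```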
Try selecting on /^\.\.[^\.]/ (starts with two dots and then not a dot).
ss = ["..ab","...cc","..ps","....kkls"]
ss.select { |x| x =~ /^\.\.[^\.]/ } # => ["..ab", "..ps"]
Try using /^\.{2}\w/ as the regular expression.
A quick explanation:
^ means the start of a line (in Ruby, \A anchors to the start of the whole string). Without this, it can match dots that are found in the middle of the string.
\. translates to . -- if you use the dot on its own, it will match any non-newline character
{2} means that you're looking for two of the dots. (you could rewrite /\.{2}/ as /\.\./)
Finally, the \w matches any word character (letter, number, underscore).
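One Ruby-specific detail about the anchor: ^ matches at the start of every line, while \A matches only at the very start of the string. For single-line input they behave the same; with embedded newlines they do not, as this small sketch with a made-up string shows:

```ruby
text = "intro\n..ab"

text =~ /^\.{2}\w/    #=> 6   (the second line starts with "..a")
text =~ /\A\.{2}\w/   #=> nil (the string itself starts with "intro")
"..ab" =~ /\A\.{2}\w/ #=> 0
```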
A really good place to test Ruby regular expressions is http://rubular.com/ -- it lets you play with the regex and test it right in your browser.
You don't need regex for this at all, you can just extract the appropriate leading chunks using String#[] or String#slice and do simple string comparisons:
>> a = ["..ab", "...cc", "..ps", "....kkls", ".", "..", "..."]
>> a.select { |s| s[0, 2] == '..' && s[0, 3] != '...' }
=> ["..ab", "..ps", ".."]
Maybe this:
["..ab","...cc","..ps","....kkls"].each {|x| puts x if /^\.{2}\w/.match(x) }
Or if you want to make sure the . doesn't match:
["..ab","...cc","..ps","....kkls"].each {|x| puts x if /^\.{2}[^\.]/.match(x) }
