writing ruby regular expression

I would like to know how to write a Ruby regular expression that requires a string to start with an alphanumeric character, followed only by alphanumeric characters and hyphens (-), in any sequence.
So, to begin with an alphanumeric character, I know it's:
/\A[A-Za-z0-9]/
How do I say that only alphanumeric characters and - are allowed after this? I am new to Ruby and regular expressions. Any suggestions?
Are there any links I can look into for learning about regular expressions and Ruby in much more depth? I found http://rubylearning.com/satishtalim/ruby_regular_expressions.html to be useful.

You already have the initial alphanumeric character class [A-Za-z0-9]. For the following characters, just add - to it: [A-Za-z0-9-] (a hyphen placed last in a character class is treated literally). Hence the final regex is:
[A-Za-z0-9][A-Za-z0-9-]*
Note that X* means "X 0 or more times". If you want "X 1 or more times", use X+.

r1 = /^[A-Za-z0-9][A-Za-z0-9-]*$/ #=> /^[A-Za-z0-9][A-Za-z0-9-]*$/
r2 = /^[A-Za-z0-9-]*$/ #=> /^[A-Za-z0-9-]*$/
str = "3birdsweresittingonawire-nowtherearebuttwo"
str =~ r1 #=> 0 (truthy)
or
(str =~ r2) and str[0] != '-' #=> true
str = " % 3birdsweresittingonawire-nowtherearebuttwo"
str =~ r1 #=> nil
(str =~ r2) and str[0] != '-' #=> nil
The second example shows why you need the anchors ^ and $.
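In Ruby, ^ and $ match at the start and end of each line, not of the whole string. A multi-line input can therefore slip past them. The minimal sketch below compares them with \A and \z. The constant names STRICT and LOOSE are only illustrative, and String#match? needs Ruby 2.4 or later.

```ruby
# \A and \z anchor to the start and end of the whole string.
STRICT = /\A[A-Za-z0-9][A-Za-z0-9-]*\z/
# ^ and $ anchor to the start and end of any line in the string.
LOOSE = /^[A-Za-z0-9][A-Za-z0-9-]*$/

["abc-123", "-abc", "bad input\nok-line"].each do |s|
  puts "#{s.inspect}: strict=#{s.match?(STRICT)} loose=#{s.match?(LOOSE)}"
end
# "abc-123": strict=true loose=true
# "-abc": strict=false loose=false
# "bad input\nok-line": strict=false loose=true
```

The last input passes LOOSE only because its second line, "ok-line", is valid on its own.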

Related

ruby: Grab numbers only within quotes

I would like the following sub-string
1100110011110000
from
foo = "bar9-9 '11001100 11110000 A'A\n"
I have so far used the below, which yields
puts foo.split(',').map!(&:strip)[0].gsub(/\D/, '')
>> 991100110011110000
Getting rid of the two leading 9s is not too difficult in this scenario, but I would like a general solution that grabs only the digits within the single quotes (' ').

You can find the quoted part first with scan and then remove non-digits:
> results = "bar9-9 '11001100 11110000 A'A\n".scan(/'[^']*'/).map{|m| m.gsub(/\D/, '')}
# => ["1100110011110000"]
> results[0]
# => "1100110011110000"

The zeros and ones within the quoted string can be extracted using String#gsub with a regular expression, as opposed to methods that convert the string to an array of strings, modify the array and convert it back to a string. Here are three ways of doing that.
str ="bar9-9 '11001100 11110000 A'A\n"
#1: Extract the substring of interest and then remove characters other than zero and one
def extract(str)
# exclusive range: from just after the first quote up to, but not including, the last quote
str[str.index("'")+1...str.rindex("'")].gsub(/[^01]/,'')
end
extract str
#=> "1100110011110000"
#2: Use a flag to indicate when zeros and ones are to be kept
def extract(str)
found = false
str.gsub(/./m) do |c|
found = !found if c == "'"
(found && (c =~ /[01]/)) ? c : ''
end
end
extract str
#=> "1100110011110000"
Here the regular expression needs the m modifier (in Ruby, m lets . match a newline) so that the newline character is also converted to an empty string. (Alternatively, one could write str.chomp.gsub(/./) ...)
Notice that this second method works when there are multiple single-quoted substrings.
extract "bar9-9 '11001100 11110000 A'A'10x1y'\n"
#=> "1100110011110000101"
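Here is what the m modifier changes in practice: how . treats a newline with and without it, on an illustrative two-line string.

```ruby
s = "a\nb"
# Without /m, . does not match the newline, so it survives the substitution.
p s.gsub(/./, "x")   # prints "x\nx"
# With /m, . matches the newline too.
p s.gsub(/./m, "x")  # prints "xxx"
```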
#3: Use the flip-flop operator (a variant of #2)
def extract(str)
str.gsub(/./m) do |c|
# three-dot flip-flop: true from an opening quote through its closing quote
next '' unless (c=="'")...(c=="'")
c =~ /[01]/ ? c : ''
end
end
extract str
#=> "1100110011110000"
extract "bar9-9 '11001100 11110000 A'A'10x1y'\n"
#=> "1100110011110000101"
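Two-dot and three-dot flip-flops behave differently when the start and end conditions are the same test, as they are for a quote character. The small sketch below shows this on an illustrative string. The flip-flop only works in a conditional context, hence the `true if`.

```ruby
chars = "x'ab'y".chars

# Two dots: the end condition is checked on the same element that turned the
# flip-flop on, so each quote switches it on and straight back off.
two = chars.select { |c| true if (c == "'")..(c == "'") }

# Three dots: the end condition is only checked from the next element onward,
# so everything from an opening quote to its closing quote is selected.
three = chars.select { |c| true if (c == "'")...(c == "'") }

p two    # prints ["'", "'"]
p three  # prints ["'", "a", "b", "'"]
```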

foo.slice(/'.*?'/).scan(/\d+/).join
#=> "1100110011110000"
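The scan and slice one-liners above look at the first quoted section. The sketch below (digits_in_quotes is a hypothetical name) collects the digits from every single-quoted section. String#delete with "^0-9" removes every character that is not a digit.

```ruby
# Collect the text between each pair of single quotes, then keep only digits.
def digits_in_quotes(str)
  str.scan(/'([^']*)'/).flatten.join.delete("^0-9")
end

puts digits_in_quotes("bar9-9 '11001100 11110000 A'A\n")         # prints 1100110011110000
puts digits_in_quotes("bar9-9 '11001100 11110000 A'A'10x1y'\n")  # prints 1100110011110000101
```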

Simple regex - ignoring certain characters

I'm trying to use the match method with a regex argument to select a valid phone number, defined as any string with ten digits.
For example:
9347584987 is valid,
(456)322-3456 is valid,
(324)5688890 is valid.
But
(340)HelloWorld is NOT valid and
456748 is NOT valid.
So far, I'm able to use \d{10} to select a string of 10 consecutive digits, but I'm not sure how to ignore characters such as '-', '(' or ')' in the middle of the sequence.
What kind of regex could I use here?

Given:
nums=['9347584987','(456)322-3456','(324)5688890','(340)HelloWorld', '456748 is NOT valid']
You can split on non-digits and rejoin to remove them:
> nums.map {|s| s.split(/\D/).join}
=> ["9347584987", "4563223456", "3245688890", "340", "456748"]
Then filter on the length:
> nums.map {|s| s.split(/\D/).join}.select {|s| s.length==10}
=> ["9347584987", "4563223456", "3245688890"]
Or, you can grab the first run of characters that looks like a phone number, using a regex that matches digits and common delimiters:
> nums.map {|s| s[/[\d\-()]+/]}
=> ["9347584987", "(456)322-3456", "(324)5688890", "(340)", "456748"]
And then process that list as above.
The two approaches differ on a string like this:
> '123 is NOT a valid area code for 456-7890'[/[\d\-()]+/]
=> "123" # only the first run is kept, so the length check fails
vs
> '123 is NOT a valid area code for 456-7890'.split(/\D/).join
=> "1234567890" # passes the length check

I suggest using one regular expression for each valid pattern rather than constructing a single regex. That is easier to test, debug and maintain. If, for example, "123-456-7890" or "123-456-7890 x231" were deemed valid numbers in the future, one need only add a single, simple regex for each to the array VALID_PATTERNS below.
VALID_PATTERNS = [/\A\d{10}\z/, /\A\(\d{3}\)\d{3}-\d{4}\z/, /\A\(\d{3}\)\d{7}\z/]
def valid?(str)
VALID_PATTERNS.any? { |r| str.match?(r) }
end
ph_nbrs = %w| 9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 |
ph_nbrs.each { |s| puts "#{s.ljust(15)} \#=> #{valid?(s)}" }
9347584987 #=> true
(456)322-3456 #=> true
(324)5688890 #=> true
(340)HelloWorld #=> false
456748 #=> false
String#match? made its debut in Ruby v2.4. There are many alternatives, including str.match(r) and str =~ r.
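To illustrate the maintenance point, accepting the 123-456-7890 format mentioned above takes one extra entry. The sketch below treats that format as a hypothetical new requirement.

```ruby
VALID_PATTERNS = [
  /\A\d{10}\z/,               # 9347584987
  /\A\(\d{3}\)\d{3}-\d{4}\z/, # (456)322-3456
  /\A\(\d{3}\)\d{7}\z/,       # (324)5688890
  /\A\d{3}-\d{3}-\d{4}\z/     # 123-456-7890 (hypothetical additional format)
].freeze

def valid?(str)
  VALID_PATTERNS.any? { |r| str.match?(r) }
end

%w[9347584987 123-456-7890 (340)HelloWorld].each { |s| puts "#{s}: #{valid?(s)}" }
```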

"9347584987" =~ /(?:\d.*){9}/ #=> 0
"(456)322-3456" =~ /(?:\d.*){9}/ #=> 1
"(324)5688890" =~ /(?:\d.*){9}/ #=> 1
"(340)HelloWorld" =~ /(?:\d.*){9}/ #=> nil
"456748" =~ /(?:\d.*){9}/ #=> nil
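The regex above sets a minimum number of digits and allows anything in between. If the rule is instead exactly ten digits anywhere in the string (an assumption, and letters between digits are still accepted), String#count does the job without a regex. ten_digits? is a hypothetical name.

```ruby
# String#count with "0-9" counts the characters in that range.
def ten_digits?(str)
  str.count("0-9") == 10
end

["9347584987", "(456)322-3456", "(324)5688890", "(340)HelloWorld", "456748"].each do |s|
  puts "#{s}: #{ten_digits?(s)}"
end
```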

Pattern:
^\(?\d{3}\)?\d{3}-?\d{4}$ # this makes the expected symbols optional
The following pattern ensures that an opening ( at the start of the string is always followed by three digits and then a closing ), which the first pattern does not enforce:
^(\(\d{3}\)|\d{3})\d{3}-?\d{4}$
On principle, though, I agree with melpomene in advising that you remove all non-digit characters, test for a length of 10, then store/handle the phone numbers in a single, reliable, basic format.
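A sketch of that normalize-then-store approach follows. It assumes ten-digit numbers, as in the question's examples. The display format and the method names normalize_phone and format_phone are illustrative choices.

```ruby
# Keep only the digits; accept the result only if exactly ten remain.
def normalize_phone(str)
  digits = str.delete("^0-9")
  digits.length == 10 ? digits : nil
end

# Render a normalized number in one consistent display format.
def format_phone(digits)
  "(#{digits[0, 3]}) #{digits[3, 3]}-#{digits[6, 4]}"
end

%w[9347584987 (456)322-3456 (340)HelloWorld].each do |s|
  n = normalize_phone(s)
  puts n ? format_phone(n) : "#{s}: invalid"
end
```

Storing only the ten digits keeps comparisons simple, and formatting becomes a presentation concern.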

Ruby, looping through a string deleting groups of characters until a desired output is achieved

I have a coding problem I solved and want to refactor. I know there has to be a cleaner way of doing what I did.
The goal is to write a method that takes a string of "!" and "?" and reduces the string by eliminating all odd groupings of each symbol.
Example - a string "????!!!" would have an odd grouping of "!!!" because there are three in a row. These would be deleted from the string.
If there is only one "!" or "?", it's left, because it is not in a group.
Ex -
remove("!????!!!?") answer == "!"
# => ("!????!!!?" --> "!?????" --> "!")
In the first string, the only odd grouping is "!!!", once removed, it leaves a new string with an odd grouping "?????". You remove the next odd grouping so you're left with "!". This fits the desired output.
Another example
remove("!???!!") == ""
# => ("!???!!" --> "!!!" --> "")
Current code:
def remove(s)
arr = [s]
i = 0
until i == arr[0].length
s = s.chars.chunk{|c|c}.map{ |n,a| a.join }.select{|x| x if x.length.even? || x.length <= 1}.join
arr << s
i += 1
end
return arr[-1]
end
My code solves this problem and all test cases. I have a suspicion that my until loop can be removed/refactored so that I could solve this problem in one line and have spent hours trying to figure it out with no luck.

Suppose
str = "???!!!???!"
If we first remove the two groups "???" we are left with "!!!!", which cannot be reduced further.
If we first remove the group "!!!" we are left with "??????!", which cannot be reduced further.
If we are permitted to remove all odd groups of either character without reference to the effect that either has on the other, we obtain !, which cannot be reduced further.
It's not clear what rule is to be used. Here are three possibilities and code to implement each.
I will use the following two regular expressions, and in the first two cases a helper method.
Rq = /
(?<!\?) # not preceded by a question mark (negative lookbehind)
\? # match a question mark
(\?{2})+ # match two question marks, one or more times
(?!\?) # not followed by a question mark (negative lookahead)
/x # free-spacing regex definition mode
which is commonly written /(?<!\?)\?(\?{2})+(?!\?)/.
Similarly,
Rx = /(?<!!)!(!{2})+(?!!)/
def sequential(str, first_regex, second_regex)
s = str.dup
loop do
size = s.size
s = s.gsub(first_regex,'').gsub(second_regex,'')
return s if s.size == size
end
end
I apply each of the three methods below to two example strings:
str1 = "???!!!???!"
str2 = 50.times.map { ['?', '!'].sample }.join
#=> "?!!!?!!!?!??????!!!?!!??!!???!?!????!?!!!?!?!???!?"
Replace all odd groups of "?" then odd groups of "!" then repeat until no further removals are possible
def question_before_exclamation(str)
sequential(str, Rq, Rx)
end
question_before_exclamation str1 #=> "!!!!"
question_before_exclamation str2 #=> "!!!!????!??!?!!?"
Replace all odd groups of "!" then odd groups of "?" then repeat until no further removals are possible
def exclamation_before_question(str)
sequential(str, Rx, Rq)
end
exclamation_before_question str1 #=> "??????!"
exclamation_before_question str2 #=> "!????!??!?!!?"
Replace all odd groups of both "?" and "!" then repeat until no further removals are possible
Rqx = /#{Rq}|#{Rx}/
#=> /(?-mix:(?<!\?)\?(\?{2})+(?!\?))|(?-mix:(?<!!)!(!{2})+(?!!))/
def question_and_exclamation(str)
s = str.dup
loop do
size = s.size
s = s.gsub(Rqx,'')
return s if s.size == size
end
end
question_and_exclamation str1 #=> "!"
question_and_exclamation str2 #=> "!????!??!?!!?"

I don't know the exact Ruby syntax for this, but you could simplify your solution by using regular expressions:
1. Gather all matches of consecutive identical characters.
2. If all matches are of even length or of length 1, exit.
3. For each match of odd length (longer than one character), replace it with the empty string; leave the others alone.
4. Go to step 1.
A solution in Perl would be:
#!perl
use strict;
use warnings;
use feature qw(say);
my $string = '!????!!!?';
sub reduce {
my ($s) = @_;
while ( my @matches = $s =~ m/((.)\2+)/g ) {
last if ! grep { length($_) > 1 && length($_) % 2 == 1 } @matches;
foreach my $match ( @matches ) {
$s =~ s/\Q$match// if length($match) > 1 && length($match) % 2 == 1;
}
}
return $s;
}
say reduce($string);
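For comparison, here is a Ruby sketch of the same steps (reduce is just a name chosen to mirror the Perl sub). In each pass it removes every run of identical characters whose length is odd and greater than one. It stops when a pass changes nothing.

```ruby
# Delete every run of 3, 5, 7, ... identical characters in one pass,
# and repeat until a pass changes nothing.
def reduce(s)
  loop do
    # /(.)\1+/ matches runs of two or more identical characters;
    # single characters are never matched, so they are kept.
    reduced = s.gsub(/(.)\1+/) { |run| run.length.odd? ? "" : run }
    return reduced if reduced == s
    s = reduced
  end
end

puts reduce("!????!!!?").inspect  # prints "!"
puts reduce("!???!!").inspect     # prints ""
```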

I could be wrong (this is Ruby, after all), but I don't think you'll find a one-liner for this, because Ruby's utility functions generally aren't recursive. But you can use regular expressions to simplify your logic, at the very least:
def remove(s)
while s =~ /(?<!\!)\!([\!]{2})+(?!\!)/ || s =~ /(?<!\?)\?([\?]{2})+(?!\?)/
s.gsub!(/(?<!\!)\!([\!]{2})+(?!\!)/, "") # remove odd groups of !
s.gsub!(/(?<!\?)\?([\?]{2})+(?!\?)/, "") # remove odd groups of ?
end
return s
end
To make the regex less mind-boggling, it helps to look at it with 'a' instead of '?' and '!':
/(?<!a)a([a]{2})+(?!a)/ #regex for 'a'
(?<!a) #negative lookbehind: the match cannot be preceded by an 'a'
a([a]{2})+ #the match should be an 'a' followed by one or more pairs of 'a'
(?!a) #negative lookahead: the match cannot be followed by an 'a'
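A quick check of which runs of 'a' that regex accepts, on a few illustrative strings:

```ruby
# Matches a run of 3, 5, 7, ... 'a' characters that is not part of a longer run.
ODD_RUN = /(?<!a)a(a{2})+(?!a)/

%w[a aa aaa aaaa aaaaa baaab].each do |s|
  puts "#{s}: #{s.match?(ODD_RUN)}"
end
# a: false, aa: false, aaa: true, aaaa: false, aaaaa: true, baaab: true
```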

It should be simple enough with a regular expression replacement
def remove(string)
begin
original = string.dup # a copy, because gsub! modifies string in place
string.gsub!(/(\!{3,})|(\?{3,})/) { |s| s.length.even? ? s : '' }
end until original == string
string
end
puts remove("!????!!!?").inspect # answer == "!"
puts remove("!???!!").inspect # answer == ""
puts remove("!????!!").inspect # answer == "!????!!"
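The until test in that loop relies on original being a separate copy. Because gsub! modifies its receiver in place, a plain assignment only creates a second reference to the same object, as this small sketch shows:

```ruby
s = +"a!!!b"      # unary + gives an unfrozen, mutable string
same = s          # another reference to the same object
copy = s.dup      # an independent copy
s.gsub!(/!{3}/, "")

p same.equal?(s)  # prints true: same object, so same == s always holds
p same            # prints "ab"
p copy            # prints "a!!!b"
```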

Regex to capture certain words at start of string Ruby

Looking for help writing a regex that checks whether a string starts with one of certain strings, and captures both that start and the rest of the string. E.g.
Let's say the possible starts of strings are 'P', 'RO', 'RPX' and the sample string is 'PIXR' or 'ROXP' or 'RPX'.
I am looking to write a regex which captures the start and trailing part of string if it starts with the given possible strings e.g
'PIXRT' =~ // outputs 'P' and 'IXRT'
Not very conversant with regexes so any help is really appreciated.

You may use a regex with two capturing groups: one captures the known values at the start, and the other captures the rest of the string:
rx = /\A(RPX|RO|P)(.*)/m
"PIXRT".scan(rx)
# => [["P", "IXRT"]]
Details:
\A - start of string
(RPX|RO|P) - one of the values that must be at the start of the string (mind the order of these alternatives: the longer ones come first!)
(.*) - zero or more characters up to the end of the string (the m modifier makes . match line breaks, too).
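If the prefixes come from a list, the alternation can be built rather than typed by hand. The sketch below uses Regexp.union and named captures. STARTS, PREFIX_RX and split_start are illustrative names.

```ruby
STARTS = %w[P RO RPX]  # the allowed prefixes from the question

# Regexp.union escapes each string; sorting longest first means a longer
# prefix is tried before any shorter one that might also match.
PREFIX_RX = /\A(?<start>#{Regexp.union(STARTS.sort_by { |s| -s.length })})(?<rest>.*)\z/m

def split_start(str)
  m = PREFIX_RX.match(str)
  m && [m[:start], m[:rest]]
end

p split_start("PIXRT")  # prints ["P", "IXRT"]
p split_start("RPX")    # prints ["RPX", ""]
p split_start("IPXR")   # prints nil
```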

def split_after_start_string(str, *start_strings)
# give every alternative its own \A (only top-level alternatives may differ in length inside a lookbehind)
a = str.split(/(?<=#{start_strings.map { |s| "\\A#{Regexp.escape(s)}" }.join('|')})/)
if a.size == 2
a
elsif start_strings.include?(str)
a << ''
else
nil
end
end
start_strings = %w| P RO RPX | #=> ["P", "RO", "RPX"]
split_after_start_string('PIXR', *start_strings) #=> ["P", "IXR"]
split_after_start_string('IPXR', *start_strings) #=> nil
split_after_start_string('ROXP', *start_strings) #=> ["RO", "XP"]
split_after_start_string('RPX', *start_strings) #=> ["RPX", ""]
The regex reads, "in a positive lookbehind, match one element of start_strings at the beginning of the string". For start_strings in the examples, the regex is:
/(?<=\AP|\ARO|\ARPX)/
\A has to be attached to every alternative, because in /(?<=\AP|RO|RPX)/ it would apply to P alone.
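In a lookbehind, \A written once in front of an alternation binds only to the first alternative. An unrelated string can therefore be split too. A quick check on an illustrative string:

```ruby
starts = %w[P RO RPX]
# \A written once binds only to the first alternative: (?<=\AP|RO|RPX)
once = /(?<=\A#{starts.join('|')})/
# \A written on every alternative: (?<=\AP|\ARO|\ARPX)
anchored = /(?<=#{starts.map { |s| "\\A#{s}" }.join('|')})/

p "XROY".split(once)      # prints ["XRO", "Y"]
p "XROY".split(anchored)  # prints ["XROY"]
```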

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now, but the output is "no duplicates" 26 times. I think my if statement is written wrongly.
string = "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works, but with the for loop and the if statement, it prints "no duplicates" 26 times whatever the string is.

The following returns the index of the first character that is immediately repeated (the first pair of adjacent identical characters), or nil if there is none:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
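String#[] can also return the captured character directly, without going through $1. In the sketch below, first_adjacent_duplicate is a hypothetical name. The asker's "abcade" has a repeated "a", but the two are not adjacent, so this regex finds nothing in it.

```ruby
# String#[] with a regex and a group number returns that group, or nil.
def first_adjacent_duplicate(str)
  str[/(.)\1/, 1]
end

p first_adjacent_duplicate("1234556")  # prints "5"
p first_adjacent_duplicate("abcade")   # prints nil
```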

s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland:
s.each_char.find { |c| s.count(c) > 1 }
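s.count(c) rescans the whole string for every character, so both one-liners above take quadratic time. The linear sketch below counts once up front. It assumes Ruby 2.7+ for Enumerable#tally, and first_char_occurring_twice is a hypothetical name.

```ruby
# Count every character once, then return the first character, in string
# order, whose count exceeds one.
def first_char_occurring_twice(s)
  counts = s.chars.tally
  s.each_char.find { |c| counts[c] > 1 }
end

p first_char_occurring_twice("abcdebf")  # prints "b"
p first_char_occurring_twice("abcdefg")  # prints nil
```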

The method below returns the most frequent word in a string (if several words tie, the first one to appear):
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
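If the goal is instead the first word that repeats, rather than the most frequent one, a Set works on words just as it does on characters. first_repeated_word is a hypothetical name.

```ruby
require 'set'

# Return the first word that has already appeared earlier in the string.
def first_repeated_word(string)
  seen = Set.new
  string.split.find { |word| !seen.add?(word) }
end

p first_repeated_word("the cat saw the dog saw")  # prints "the"
p first_repeated_word("no repeats here")          # prints nil
```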

I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that has already occurred earlier in the string
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> "b"
first_repeat_char("abcdcbe") #=> "c"
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group 1
.*? # match any characters zero or more times, lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> "b"
"abccdeb"[r] #=> "b"
"abcdefg"[r] #=> nil
In free-spacing mode, .*? must be written together. If the ? is placed on its own line, Ruby parses it as a second quantifier applied to .* rather than as the lazy modifier, and warns "regular expression has redundant nested repeat operator '*'". The returned character is the same either way, since only \1 is kept after \K. Written compactly, the regex is /(.).*?\K\1/.
Note that a positive lookbehind cannot be used here, as lookbehinds cannot contain variable-length matches (such as .*).
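Because only \1 survives the \K, the lazy and greedy versions return the same character. They differ only in which later occurrence they land on. A quick check with MatchData#begin on an illustrative string:

```ruby
str = "abcbdb"
lazy   = str.match(/(.).*?\K\1/)  # .*? stops at the nearest repeat
greedy = str.match(/(.).*\K\1/)   # .* backtracks from the end, so it finds the last one

p [lazy[0], lazy.begin(0)]      # prints ["b", 3]
p [greedy[0], greedy.begin(0)]  # prints ["b", 5]
```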

You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}

I'd use a positive lookahead with the String#[] method (this finds the first character that is immediately followed by itself):
"abcccddde"[/(.)(?=\1)/] #=> "c"

As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"
