Use regular expression to fetch 3 groups from string - ruby

This is my expected result.
Input a string and get three returned string.
I have no idea how to finish it with Regex in Ruby.
this is my roughly idea.
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2244, OOP, 2003

If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]

Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ? right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.

As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "00"
puts my_match[4] #=> "2003"
A MatchData object contains an array of each match group starting at index [1]. As you can see, index [0] returns the entire string. If you don't want the capture the "_" you can leave it's parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
this basically says one or more of any single character followed by zero or one of any single character.

Related

Convert string to camel case in Ruby

Working on a Ruby challenge to convert dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case).
My solution so far..:
def to_camel_case(str)
str.split('_,-').collect.camelize(:lower).join
end
However .camelize(:lower) is a rails method I believe and doesn't work with Ruby. Is there an alternative method, equally as simplistic? I can't seem to find one. Or do I need to approach the challenge from a completely different angle?
main.rb:4:in `to_camel_case': undefined method `camelize' for #<Enumerator: []:collect> (NoMethodError)
from main.rb:7:in `<main>'
I assume that:
Each "word" is made up of one or more "parts".
Each part is made of up characters other than spaces, hypens and underscores.
The first character of each part is a letter.
Each successive pair of parts is separated by a hyphen or underscore.
It is desired to return a string obtained by modifying each part and removing the hypen or underscore that separates each successive pair of parts.
For each part all letters but the first are to be converted to lowercase.
All characters in each part of a word that are not letters are to remain unchanged.
The first letter of the first part is to remain unchanged.
The first letter of each part other than the first is to be capitalized (if not already capitalized).
Words are separated by spaces.
It this describes the problem correctly the following method could be used.
R = /(?:(?<=^| )|[_-])[A-Za-z][^ _-]*/
def to_camel_case(str)
str.gsub(R) do |s|
c1 = s[0]
case c1
when /[A-Za-z]/
c1 + s[1..-1].downcase
else
s[1].upcase + s[2..-1].downcase
end
end
end
to_camel_case "Little Miss-muffet sat_on_HE$R Tuffett eating-her_cURDS And_whey"
# => "Little MissMuffet satOnHe$r Tuffett eatingHerCurds AndWhey"
The regular expression is can be written in free-spacing mode to make it self-documenting.
R = /
(?: # begin non-capture group
(?<=^| ) # use a positive lookbehind to assert that the next character
# is preceded by the beginning of the string or a space
| # or
[_-] # match '_' or '-'
) # end non-capture group
[A-Za-z] # match a letter
[^ _-]* # match 0+ characters other than ' ', '_' and '-'
/x # free-spacing regex definition mode
Most Rails methods can be added into basic Ruby projects without having to pull in the whole Rails source.
The trick is to figure out the minimum amount of files to require in order to define the method you need. If we go to APIDock, we can see that camelize is defined in active_support/inflector/methods.rb.
Therefore active_support/inflector seems like a good candidate to try. Let's test it:
irb(main)> require 'active_support/inflector'
=> true
irb(main)> 'foo_bar'.camelize
=> "FooBar"
Seems to work. Note that this assumes you already ran gem install activesupport earlier. If not, then do it first (or add it to your Gemfile).
In pure Ruby, no Rails, given str = 'my-var_name' you could do:
delimiters = Regexp.union(['-', '_'])
str.split(delimiters).then { |first, *rest| [first, rest.map(&:capitalize)].join }
#=> "myVarName"
Where str = 'My-var_name' the result is "MyVarName", since the first element of the splitting result is untouched, while the rest is mapped to be capitalized.
It works only with "dash/underscore delimited words", no spaces, or you need to split by spaces, then map with the presented method.
This method is using string splitting by delimiters, as explained here Split string by multiple delimiters,
chained with Object#then.

Positive Lookahead and Non-capturing group difference

When you want to match either of two patterns but not capture it, you would use a noncapturing group ?::
/(?:https?|ftp)://(.+)/
But what if I want to capture '_1' in the string 'john_1'. It could be '2' or '' followed by anything else. First I tried a non-capturing group:
'john_1'.gsub(/(?:.+)(_.+)/, "")
=> ""
It does not work. I am telling it to not capture one or more characters but to capture _ and all characters after it.
Instead the following works:
'john_1'.gsub(/(?=.+)(_.+)/, "")
=> "john"
I used a positive lookahead. The definition I found for positive lookahead was as follows:
q(?=u) matches a q that is
followed by a u, without making the u part of the match. The positive
lookahead construct is a pair of parentheses, with the opening
parenthesis followed by a question mark and an equals sign.
But that definition doesn't really fit my example. What makes the Positive Lookahead work but not the Non-capturing group work in the example I provide?
Capturing and matching are two different things. (?:expr) doesn't capture expr, but it's still included in the matched string. Zero-width assertions, e.g. (?=expr), don't capture or include expr in the matched string.
Perhaps some examples will help illustrate the difference:
> "abcdef"[/abc(def)/] # => abcdef
> $1 # => def
> "abcdef"[/abc(?:def)/] # => abcdef
> $1 # => nil
> "abcdef"[/abc(?=def)/] # => abc
> $1 # => nil
When you use a non-capturing group in your String#gsub call, it's still part of the match, and gets replaced by the replacement string.
Your first example doesn't work because a non-capturing group is still part of the overall capture, whereas the lookbehind is only used for matching but isn't part of the overall capture.
This is easier to understand if you get the actual match data:
# Non-capturing group
/(?:.+)(_.+)/.match 'john_1'
=> #<MatchData "john_1" 1:"_1">
# Positive Lookbehind
/(?=.+)(_.+)/.match 'john_1'
=> #<MatchData "_1" 1:"_1">
EDIT: I should also mention that sub and gsub work on the entire capture, not individual capture groups (although those can be used in the replacement).
'john_1'.gsub(/(?:.+)(_.+)/, 'phil\1')
=> "phil_1"
Let's consider a couple of situations.
The string preceding the underscore must be "john" and the underscore is followed by one or more characters
str = "john_1"
You have two choices.
Use a positive lookbehind
str[/(?<=john)_.+/]
#=> "_1"
The positive lookbehind requires that "john" must appear immediately before the underscore, but it is not part of the match that is returned.
Use a capture group:
str[/john(_.+)/, 1]
#=> "_1"
This regular expression matches "john_1", but "_.+" is captured in capture group 1. By examining the doc for the method String#[] you will see that one form of the method is str[regexp, capture], which returns the contents of the capture group capture. Here capture equals 1, meaning the first capture group.
Note that the string following the underscore may contain underscores: "john_1_a"[/(?<=john)_.+/] #=> "_1_a".
If the underscore can be at the end of the string replace + with * in the above regular expressions (meaning match zero or more characters after the underscore).
The string preceding the underscore can be anything and and the underscore is followed by one or more characters
str = "john_mary_tom_julie"
We may consider two cases.
The string returned is to begin with the first underscore
In this case we could write:
str[/_.+/]
#=> "_mary_tom_julie"
This works because the regex is by default greedy, meaning it will begin at the first underscore encountered.
The string returned is to begin with the last underscore
Here we could write:
str[/_[^_]+\z/]
#=> "_julie"
This regex matches an underscore followed by one or more characters that are not underscores, followed by the end-of-string anchor (\z).
Aside: the method String#[]
[] may seem an odd name for a method but it is a method nevertheless, so it can be invoked in the conventional way:
str.[](/john(_.+)/, 1)
#=> "_1"
The expression str[/john(_.+)/, 1] is an example (of which there are many in Ruby) of syntactic sugar. When written str[...] Ruby converts it to the conventional expression for methods before evaluating it.

How do I specify in Ruby that I want to match a character provided that a sequence following that character does not match a pattern?

I'm using Ruby on Rails 5.1. In Ruby, how do I say taht I want to match a string if the first character matches something but the sequence that follows does NOT match a pattern? That is, I want to match a number provided that the sequence taht follows is not a character from an array I have followed by two other numbers. Here's my character array ...
2.4.0 :010 > TOKENS
=> [":", ".", "'"]
So this string would NOT match
3:00
since ":00" matches the pattern of a character from my array followed by two numbers. But this string
3400
would match. This string would also match
3:0
and this would match
3
since nothing follows the above. How do I write the appropriate regex in Ruby?
string =~ /\A\d+(?!:\d{2})/
This regular expression means:
\A anchors the match to the start of the string.
\d+ means "one or more digits".
(?!...) is a negative look-ahead. It checks that the pattern contained in the brackets does not match., looking ahead from the current position.
:\d{2} means : followed by two digits.
Consideration should be given to testing the first character and the remaining characters separately.
def match_it?(str, first_char_regex, no_match_regex)
str[0].match?(first_char_regex) && !str[1..-1].match?(no_match_regex)
end
match_it?("0:00", /0/, /\A[:. ]cat\z/) #=> true
match_it?("0:00", /\d/, /\A[:. ]\d+\z/) #=> false
match_it?("0:00", /[[:alpha:]]/, /\A[:. ]\d+\z/) #=> false
I believe this reads well and it simplifies testing when compared to methods that employ a single regular expression.

Regex to grab full firstname and first letter of last name

I have a list of users grabbed by the Etc Ruby library:
Thomas_J_Perkins
Jennifer_Scanner
Amanda_K_Loso
Aaron_Cole
Mark_L_Lamb
What I need to do is grab the full first name, skip the middle name (if given), and grab the first character of the last name. The output should look like this:
Thomas P
Jennifer S
Amanda L
Aaron C
Mark L
I'm not sure how to do this, I've tried grabbing all of the characters: /\w+/ but that will grab everything.
You don't always need regular expressions.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. Jamie Zawinski
You can do it with some simple Ruby code
string = "Mark_L_Lamb"
string.split('_').first + ' ' + string.split('_').last[0]
=> "Mark L"
I think its simpler without regex:
array = "Thomas_J_Perkins".split("_") # split at _
array.first + " " + array.last[0] # .first prints first name .last[0] prints first char of last name
#=> "Thomas P"
You can use
^([^\W_]+)(?:_[^\W_]+)*_([^\W_])[^\W_]*$
And replace with \1_\2. See the regex demo
The [^\W_] matches a letter or a digit. If you want to only match letters, replace [^\W_] with \p{L}.
^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$
See updated demo
The point is to match and capture the first chunk of letters up to the first _ (with (\p{L}+)), then match 0+ sequences of _ + letters inside (with (?:_\p{L}+)*_) and then match and capture the last word first letter (with (\p{L})) and then match the rest of the string (with \p{L}*).
NOTE: replace ^ with \A and $ with \z if you have independent strings (as in Ruby ^ matches the start of a line and $ matches the end of the line).
Ruby code:
s.sub(/^(\p{L}+)(?:_\p{L}+)*_(\p{L})\p{L}*$/, "\\1_\\2")
I'm in the don't-use-a-regex-for-this camp.
str1 = "Alexander_Graham_Bell"
str2 = "Sylvester_Grisby"
"#{str1[0...str1.index('_')]} #{str1[str1.rindex('_')+1]}"
#=> "Alexander B"
"#{str2[0...str2.index('_')]} #{str2[str2.rindex('_')+1]}"
#=> "Sylvester G"
or
first, last = str1.split(/_.+_|_/)
#=> ["Alexander", "Bell"]
first+' '+last[0]
#=> "Alexander B"
first, last = str2.split(/_.+_|_/)
#=> ["Sylvester", "Grisby"]
first+' '+last[0]
#=> "Sylvester G"
but if you insist...
r = /
(.+?) # match any characters non-greedily in capture group 1
(?=_) # match an underscore in a positive lookahead
(?:.*) # match any characters greedily in a non-capture group
(?:_) # match an underscore in a non-capture group
(.) # match any character in capture group 2
/x # free-spacing regex definition mode
str1 =~ r
$1+' '+$2
#=> "Alexander B"
str2 =~ r
$1+' '+$2
#=> "Sylvester G"
You can of course write
r = /(.+?)(?=_)(?:.*)(?:_)(.)/
This is my attempt:
/([a-zA-Z]+)_([a-zA-Z]+_)?([a-zA-Z])/
See demo
Let's see if this works:
/^([^_]+)(?:_\w)?_(\w)/
And then you'll have to combine the first and second matches into the format you want. I don't know Ruby, so I can't help you there.
And another attempt using a replacement method:
result = subject.gsub(/^([^_]+)(?:_[^_])?_([^_])[^_]+$/, '\1 \2')
We capture the entire string, with the relevant parts in capturing groups. Then just return the two captured groups
using the split method is much better
full_names.map do |full_name|
parts = full_name.split('_').values_at(0,-1)
parts.last.slice!(1..-1)
parts.join(' ')
end
/^[A-Za-z]{5,15}\s[A-Za-z]{1}]$/i
This will have the following criteria:
5-15 characters for first name then a whitespace and finally a single character for last name.

Split by multiple delimiters

I'm receiving a string that contains two numbers in a handful of different formats:
"344, 345", "334,433", "345x532" and "432 345"
I need to split them into two separate numbers in an array using split, and then convert them using Integer(num).
What I've tried so far:
nums.split(/[\s+,x]/) # split on one or more spaces, a comma or x
However, it doesn't seem to match multiple spaces when testing. Also, it doesn't allow a space in the comma version shown above ("344, 345").
How can I match multiple delimiters?
You are using a character class in your pattern, and it matches only one character. [\s+,x] matches 1 whitespace, or a +, , or x. You meant to use (?:\s+|x).
However, perhaps, a mere \D+ (1 or more non-digit characters) should suffice:
"345, 456".split(/\D+/).map(&:to_i)
R1 = Regexp.union([", ", ",", "x", " "])
#=> /,\ |,|x|\ /
R2 = /\A\d+#{R1}\d+\z/
#=> /\A\d+(?-mix:,\ |,|x|\ )\d+\z/
def split_it(s)
return nil unless s =~ R2
s.split(R1).map(&:to_i)
end
split_it("344, 345") #=> [344, 345]
split_it("334,433") #=> [334, 433]
split_it("345x532") #=> [345, 532]
split_it("432 345") #=> [432, 345]
split_it("432&345") #=> nil
split_it("x32 345") #=> nil
Your original regex would work with a minor adjustment to move the '+' symbol outside the character class:
"344 ,x 345".split(/[\s,x]+/).map(&:to_i) #==> [344,345]
If the examples are actually the only formats that you'll encounter, this will work well. However, if you have to be more flexible and accommodate unknown separators between the numbers, you're better off with the answer given by Wiktor:
"344 ,x 345".split(/\D+/).map(&:to_i) #==> [344,345]
Both cases will return an array of Integers from the inputs given, however the second example is both more robust and easier to understand at a glance.
it doesn't seem to match multiple spaces when testing
Yeah, character class (square brackets) doesn't work like this. You apply quantifiers on the class itself, not on its characters. You could use | operator instead. Something like this:
.split(%r[\s+|,\s*|x])

Resources