ruby regex to extract two parts: digits, then whatever comes after - ruby

My user input is a string I need to split into two parts, (1) a partial phone number [any sequence of digits - . space, parens so I assume that is represented by /[\d\. \-\(\)]/ ] and (2) whatever follows (if anything).
For example
"88 comment" -> "88" & "comment"
"415-915 second part" --> "415-915" & "second part"
"(415) 915 part 2" --> "(415) 915" & "part 2"
"a note" --> "" & "a note"
"part 2" --> "" & "part 2"
As a relative newbie to ruby and regex, I have no idea how to extract multiple parts, and how to define the second part as being whatever comes after the first part (which basically means whatever comes after anything that doesn't match the first part)

Here's the regex (I'll explain below):
/^([-\d. ()]*)(.*)$/
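^ means "start at the beginning of the string". (Strictly, in Ruby ^ matches at the start of any line and \A anchors to the start of the whole string; for single-line input the two behave the same.)
In ([-\d. ()]*), the square brackets define a character class (a hyphen, a digit, a period, a space, or a parenthesis), the * means "match any number of the previous item", and the parens create a match group (this is how you will get the value later). So this is the first sequence.
In (.*), . means "match any single character", so .* means "match any number of any characters"; it's basically a catch-all. The parens create a second match group.
$ means "finish at the end of the string" (in Ruby, the end of a line; \z is the end of the whole string).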
So in ruby:
string =~ /^([-\d. ()]*)(.*)$/
puts $1.strip # is the phone number (with excess whitespace removed)
puts $2.strip # is the rest (with excess whitespace removed)
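As a minimal sketch, the same pattern can be wrapped in a small method that returns both parts at once. The method name split_phone is hypothetical (it is not from the question), and the sketch swaps ^ and $ for \A and \z and adds the m flag so that input containing a newline still matches as a whole.

```ruby
# Hypothetical helper: split user input into a leading phone-like part
# (digits, hyphens, periods, spaces, parentheses) and whatever follows.
def split_phone(input)
  # Both groups may be empty, and with /m the dot also matches newlines,
  # so this pattern matches every string and match is never nil.
  match = input.match(/\A([-\d. ()]*)(.*)\z/m)
  [match[1].strip, match[2].strip]
end

phone, rest = split_phone("(415) 915 part 2")
puts phone # the phone-like part
puts rest  # the remainder
```

Using match and destructuring the result avoids relying on the global variables $1 and $2, which are overwritten by the next successful match.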

Try /([\d.\s()/-]*)(.+)/. The first group will capture the number, the second one the "other" part. I don't know Ruby, so you will have to implement that pattern yourself.

Related

How to extract substring between two characters/substrings

I have a string:
string1 = "my name is fname.lname and i live in xyz. my lname is not common"
I want to extract a substring from string1 that is anything between the first empty space " " and ".lname". In the case above, the answer should be "fname.lname".
string1[/(?<= ).*?(?=\.lname\b)/]
#=> "name is fname"
(?<= ) is a positive lookbehind that requires the first character matched be immediately preceded by a space, but that space is not part of the match.
(?=\.lname\b) is a positive lookahead that requires that the last character matched be immediately followed by the string ".lname" (see note 1), which is itself followed by a word break (\b), but that string is not part of the match. That ensures, for example, that ".lnamespace" is not matched. If that should be matched, remove \b.
.*? matches zero or more characters (.*), non-greedily (?). (Matches are by default greedy.) The non-greedy qualifier has the following effect:
"my name is fname.lname and fname.lname"[/(?<= ).*(?=\.lname\b)/]
#=> "name is fname.lname and fname"
"my name is fname.lname and fname.lname"[/(?<= ).*?(?=\.lname\b)/]
#=> "name is fname"
In other words, the non-greedy (greedy) match matches the first (last) occurrence of ".lname" in the string.
This could alternatively be written with a capture group and no lookarounds:
string1[/ (.*?)\.lname\b/, 1]
#=> "name is fname"
This regular expression reads, "match a space followed by zero or more characters, saved in capture group 1, followed by the string ".lname" followed by a word break". This uses the form of String#[] that takes two arguments: the regular expression and a reference to a capture group.
Yet another way follows.
string1[(string1 =~ / /)+1..(string1 =~ /\.lname\b/)-1]
#=> "name is fname"
1 The period in ".lname" must be escaped because an unescaped period in a regular expression (except in a character class) matches any character.
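The capture-group form generalizes to arbitrary delimiters if they are escaped first. The following is only a sketch: the method name between is hypothetical, and unlike the patterns above it does not add a \b after the right-hand delimiter.

```ruby
# Hypothetical helper: return the text between the first occurrence of
# `left` and the next occurrence of `right`, or nil if there is no match.
def between(text, left, right)
  # Regexp.escape makes characters such as "." match literally.
  pattern = /#{Regexp.escape(left)}(.*?)#{Regexp.escape(right)}/m
  text[pattern, 1]
end

string1 = "my name is fname.lname and i live in xyz. my lname is not common"
between(string1, " ", ".lname") #=> "name is fname"
```

Escaping the delimiters means the caller can pass plain strings without worrying about regular expression metacharacters.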

regular expression in Ruby with parentheses and match

In Ruby,
x = "this is a test".match(/(\w+) (\w+)/)
puts x[0], x[1], x[2]
why is the output
this is
this
is
Nothing special is going on here. You have the pattern
(\w+) (\w+)
namely two words separated by a space. That would be "this is" in your example (since we start looking for matches from the beginning of the string). The full match goes into the zeroth element of the return value, in your case x[0].
Now parentheses capture matches. The first left parenthesis starts at the first word, namely "this" so that value goes into x[1]. The second left parenthesis starts a group that matches the word "is", which will be captured into x[2].
Again, nothing special. This is how regular expression matching and grouping work in many, many languages.
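To make the numbering less mysterious, the groups can be given names. This is a small sketch using Ruby's named captures; the method name first_two_words and the group names first and second are made up for illustration.

```ruby
# Hypothetical helper: describe the first "word space word" match in text.
def first_two_words(text)
  m = text.match(/(?<first>\w+) (?<second>\w+)/)
  return nil unless m # match returns nil when nothing matches

  {
    whole: m[0],        # the full match, same as x[0] above
    first: m[:first],   # first group, same as x[1]
    second: m[:second], # second group, same as x[2]
    rest: m.post_match  # whatever follows the match
  }
end

p first_two_words("this is a test")
```

The rest entry shows that the remainder of the string (" a test") is simply not part of the match, which is why it never appears in x[0].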

Ruby regex to capture any string with a number

I am looking for a regular expression in Ruby to capture a sentence that has any sort of number in it.
For instance, I need to capture all of the following:
"5 different ways to do it"
"2 x 2 is certainly 4"
"there are 15 different things"
"Try to get to 10"
I only want to capture sentences with a number within, but that has nothing else before or after the number. I don't want to include things like:
"$2 billion dollars"
"The 5x effect"
It has to be just a sequence of 1 or more digits at the beginning, middle, or end of a sentence.
Thanks.
You probably want something like:
/^.*(?<!\S)\d+(?!\S).*$/
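This matches a run of digits that is neither immediately preceded nor immediately followed by a non-whitespace character; the negative lookbehind (?<!\S) and negative lookahead (?!\S) do the "look-around".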
This
(s =~ /(^|\s)\d+(\s|$)/) ? s : nil
will return the string s if it contains at least one non-negative integer that:
is the entire string,
is at the beginning of the string and followed by a whitespace character,
is at the end of the string and preceded by a whitespace character, or
is both preceded and followed by a whitespace character.
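As a quick check, the look-around pattern can be run over the sample sentences from the question. This is a sketch only; the constant and method names are hypothetical, and String#match? assumes Ruby 2.4 or later.

```ruby
# A run of digits with no non-whitespace character directly before or after it.
STANDALONE_NUMBER = /(?<!\S)\d+(?!\S)/

# Hypothetical helper: keep only the sentences containing a standalone number.
def with_standalone_number(sentences)
  sentences.select { |s| s.match?(STANDALONE_NUMBER) }
end

samples = [
  "5 different ways to do it",
  "2 x 2 is certainly 4",
  "there are 15 different things",
  "Try to get to 10",
  "$2 billion dollars", # rejected: "2" is preceded by "$"
  "The 5x effect"       # rejected: "5" is followed by "x"
]

puts with_standalone_number(samples)
```

Only the first four sentences are printed; the last two are filtered out because their digits touch a non-whitespace character.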

get groups of characters n at a time

I'm trying to group a string by three (but could be any number) characters at a time. Using this code:
"this gets three at a time".scan(/\w\w\w/)
I get:
["thi","get","thr","tim"]
But what I'm trying to get is:
["thi","sge","tst","hre","eat","ati","me"]
\w matches letters, digits, and underscores (i.e. it's shorthand for [a-zA-Z0-9_]), not spaces. It does not magically skip spaces, though, as you seem to expect.
So you'll first have to remove the spaces:
"this gets three at a time".gsub(/\s+/, "").scan(/.../)
or non-word characters:
"this gets three at a time".gsub(/\W+/, "").scan(/.../)
before you match the three characters.
Although you should rather use
"this gets three at a time".gsub(/\W+/, "").scan(/.{1,3}/)
to also obtain the last 1 or 2, if the length is not divisible by 3.
"this gets three at a time".tr(" \t\n\r", "").scan(/.{1,3}/)
You can try these as well:
sentence = "this gets three at a time"
sentence = sentence.delete(" ") # removes every space; sentence[" "] = "" would remove only the first one
sentence.scan(/\w\w\w/) # no change in regex
Or:
sentence = "this gets three at a time"
sentence = sentence.delete(" ")
sentence.scan(/.{1,3}/)
Or:
sentence = "this gets three at a time"
sentence = sentence.delete(" ")
sentence.scan(/[a-zA-Z]{1,3}/)
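Since the question says the group size "could be any number", here is a sketch that takes the size as a parameter and uses each_slice instead of a regular expression for the grouping. The method name chunk_chars is hypothetical.

```ruby
# Hypothetical helper: drop all whitespace, then group the remaining
# characters `size` at a time. The last group may be shorter.
def chunk_chars(text, size)
  text.gsub(/\s+/, "").chars.each_slice(size).map(&:join)
end

chunk_chars("this gets three at a time", 3)
#=> ["thi", "sge", "tst", "hre", "eat", "ati", "me"]
```

An equivalent regex-only form is text.gsub(/\s+/, "").scan(/.{1,#{size}}/), which interpolates the size into the quantifier.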

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To leave the original untouched and only return the result, drop the exclamation point, i.e. use sub. In the general case you may have a regular expression that can match more than once and want every match replaced; in that case use gsub! or gsub. Without the g, only the first match is replaced (which is what you want here; for a single-line string, ^ can match only once, at the start).
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of the string.
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
" " A space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub(/^h3\. /, '') #=> "My Title Goes Here"
gsub means "globally substitute"; it replaces every match of a pattern with a string, in this case an empty string.
The regular expression is enclosed in / and consists of:
^ means beginning of the string
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
" " - a space is matched literally
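For the fixed prefix in the question, a regular expression is not strictly required. The sketch below assumes Ruby 2.5 or later, where String#delete_prefix is available, and also shows a regex variant anchored with \A; both method names are hypothetical.

```ruby
# Hypothetical helper: remove a literal "h3. " prefix if present.
# delete_prefix returns a new string and leaves other strings unchanged.
def strip_h3(line)
  line.delete_prefix("h3. ")
end

# Hypothetical helper: remove any "hN. " heading prefix (h1., h2., h10., ...).
# \A anchors to the start of the whole string, not just the start of a line.
def strip_heading(line)
  line.sub(/\Ah\d+\. /, "")
end

titles = ["h3. My Title Goes Here", "No h3. at the start of this line"]
puts titles.map { |t| strip_h3(t) }
```

Only the first title changes, because the second one does not begin with the prefix.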

Resources