How can I get the last occurring positive integer from a string? - ruby

I want to extract the last occurring positive integer from a string using regex. For example:
get-last-01-number-9.test should return 9
get-last-01-number7 should return 7
How can I accomplish this with regex?

You could try
(\d+)\D*$
Explanation:
(\d+) # a number
\D* # any amount of non-numbers
$ # end of string
This will capture the number in the first capture group.
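A minimal Ruby sketch of this pattern, assuming you want an Integer back (or nil when the string has no digits). The helper name last_integer is hypothetical, and it uses \z rather than $ because in Ruby $ matches at the end of every line, not only at the end of the string.

```ruby
# Hypothetical helper built on the capture-group pattern (\d+)\D*\z.
def last_integer(str)
  digits = str[/(\d+)\D*\z/, 1]  # capture group 1, or nil if there is no match
  digits && digits.to_i          # convert to Integer; stays nil without digits
end

p last_integer("get-last-01-number-9.test")  # => 9
p last_integer("get-last-01-number7")        # => 7
p last_integer("get-last-number")            # => nil
```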

Use Negative Lookahead
Find a positive integer that isn't followed by another positive integer using a greedy match like:
/\d+(?!.*\d+)/
For example:
'get-last-01-number-9.test'.match /\d+(?!.*\d+)/
#=> #<MatchData "9">
'get-last-01-number7'.match /\d+(?!.*\d+)/
#=> #<MatchData "7">
'get-last-01-number-202.test'.match /\d+(?!.*\d+)/
#=> #<MatchData "202">
'get-last-number'.match /\d+(?!.*\d+)/
#=> nil
This is probably slower than scanning if you have a large text blob, but some people might still find the lookahead assertion useful, especially for shorter strings.
Use Scan
A more straightforward method would be just to extract all integers (if any) with String#scan and then pop the last one. For example:
'get-last-01-number-9.test'.scan(/\d+/).pop
#=> "9"
'get-last-01-number7'.scan(/\d+/).pop
#=> "7"
'get-last-01-number-202.test'.scan(/\d+/).pop
#=> "202"
'get-last-number'.scan(/\d+/).pop
#=> nil
Scope of Answer
Negative integers weren't part of the question as originally posted, and will therefore not be addressed here. If negative integers are an issue for future visitors, and if it hasn't already been asked on Stack Overflow, please ask a separate question about them.

Use this expression to find 1+ digits with only non-digits following it till the end of the string (i.e. the last set of digits):
\d+(?=\D*$)
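To check that the patterns from these answers agree, here is a small comparison on the question's inputs plus a multi-digit and a digit-free case. The lambda names and sample list are illustrative, scan(...).last stands in for scan(...).pop (same result here), and \z is used instead of $ so that only the end of the whole string counts.

```ruby
# Each lambda applies one of the approaches from the answers above.
approaches = {
  capture:   ->(s) { s[/(\d+)\D*\z/, 1] },  # (\d+)\D*$ with \z
  neg_ahead: ->(s) { s[/\d+(?!.*\d+)/] },   # negative lookahead
  scan:      ->(s) { s.scan(/\d+/).last },  # scan, then take the last run
  pos_ahead: ->(s) { s[/\d+(?=\D*\z)/] }    # positive lookahead with \z
}

# Input => expected last run of digits (nil when there are none).
samples = {
  "get-last-01-number-9.test"   => "9",
  "get-last-01-number7"         => "7",
  "get-last-01-number-202.test" => "202",
  "get-last-number"             => nil
}

samples.each do |input, expected|
  results = approaches.transform_values { |fn| fn.call(input) }
  puts "#{input.inspect} expected #{expected.inspect}: #{results.inspect}"
end
```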

This approach assumes the number you want always follows the literal text -number:
["get-last-01-number-9.test", "get-last-01-number7"].each do |e|
  e.match(%r{\-number([\-\d]+)}) do |m|
    last_no = m[1].gsub(%r{\-}, "")
    puts "last_no:#{last_no} ---- #{File.basename __FILE__}:#{__LINE__}"
  end
end
# last_no:9 ---- ex.rb:4
# last_no:7 ---- ex.rb:4

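This pattern is probably the most efficient, but as written, .*(\d+) captures only the final digit (for get-last-01-number-202.test it captures 2), because the greedy .* gives back only as much as \d+ needs. A negative lookbehind makes the capture start at the beginning of the last run of digits:
.*(?<!\d)(\d+)
How fast it is depends on the number of characters after the last digit, since the greedy .* first runs to the end of the string and then backtracks.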
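A quick demonstration of how .*(\d+) behaves on a multi-digit number, next to a variant with a negative lookbehind (the variable names are illustrative):

```ruby
str = "get-last-01-number-202.test"

# Greedy .* gives back only as much as \d+ needs, so only the final digit is captured.
greedy = str[/.*(\d+)/, 1]

# (?<!\d) refuses a start position right after a digit,
# so the capture begins at the start of the last run of digits.
fixed = str[/.*(?<!\d)(\d+)/, 1]

p greedy  # => "2"
p fixed   # => "202"
```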

Related

How to split a string in half, into two variables, in one statement?

I want to split str in half and assign the halves to first and second, like this pseudocode:
first,second = str.split( middle )
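class String
  def halves
    chars.each_slice(size / 2).map(&:join)
  end
end
This will work, but you will need to decide how to handle odd-sized strings: for "abcde", size / 2 is 2, so each_slice returns three pieces ("ab", "cd", "e"). Strings shorter than two characters fail outright, because each_slice(0) raises an ArgumentError.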
Or in-line:
first, second = str.chars.each_slice(str.length / 2).map(&:join)
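One way to settle the odd-length case up front is to compute the midpoint yourself. This sketch assumes the extra character should go to the first half (a choice, not something the question specifies); it always returns exactly two strings and, unlike each_slice(size / 2), does not raise for strings shorter than two characters.

```ruby
# Hypothetical helper: the midpoint is rounded up, so for odd lengths
# the first half gets the extra character. Always returns two strings.
def halves(str)
  mid = (str.length + 1) / 2
  [str[0, mid], str[mid..-1]]  # str[mid..-1] is "" when mid == str.length
end

first, second = halves("abcde")
p [first, second]  # => ["abc", "de"]
p halves("a")      # => ["a", ""]
p halves("")       # => ["", ""]
```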
first,second = str.partition(/.{#{str.size/2}}/)[1,2]
Explanation
You can use partition with a regex pattern that matches a fixed number of characters (in this case str.size / 2).
partition returns three elements: the head, the match, and the tail. Because the pattern matches any character, the match starts at the beginning of the string, so the head is a blank string. We only care about the match and the tail, hence [1,2] (two elements starting at index 1).
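One caveat on the blank head: in Ruby, . does not match a newline unless the /m flag is set, so a string containing \n can produce a non-empty head. A small sketch with an illustrative input:

```ruby
str = "a\nbcd"    # illustrative input: five characters including a newline
n = str.size / 2  # 2

# Without /m, "." cannot match "\n", so the first two-character match is "bc".
p str.partition(/.{#{n}}/)   # => ["a\n", "bc", "d"]

# With /m, "." matches "\n" too, so the match starts at index 0 and the head is empty.
p str.partition(/.{#{n}}/m)  # => ["", "a\n", "bcd"]

first, second = str.partition(/.{#{n}}/m)[1, 2]
```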
Here are two ways to do that. The first builds the regex inside the method, because a local variable defined outside a def is not visible inside it:
def halves(str)
  rgx = /
    (?<=               # begin a positive lookbehind
      \A               # match the beginning of the string
      .{#{str.size/2}} # match any character, half the string's length times
    )                  # end positive lookbehind
  /x                   # invoke free-spacing regex definition mode
  str.split(rgx)
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
The regular expression is conventionally written
/(?<=\A.{#{str.size/2}})/
Note that the regular expression matches a zero-width location between two successive characters, so split removes no characters.
def halves(str)
  [str[0, str.size/2], str[str.size/2..-1]]
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
Note: For odd-length strings the halves are uneven; the second half gets the extra character.
Along the lines of your pseudocode, where string is the original string:
first, second = string[0...string.length/2], string[string.length/2...string.length]

Simple regex - ignoring certain characters

I'm trying to use the match method with an argument of a regex to select a valid phone number, by definition, any string with nine digits.
For example:
9347584987 is valid,
(456)322-3456 is valid,
(324)5688890 is valid.
But
(340)HelloWorld is NOT valid and
456748 is NOT valid.
So far, I'm able to use \d{9} to select the example string of 9 digit characters in a row, but I'm not sure how to specifically ignore any character, such as '-' or '(' or ')' in the middle of the sequence.
What kind of Regex could I use here?
Given:
nums=['9347584987','(456)322-3456','(324)5688890','(340)HelloWorld', '456748 is NOT valid']
You can split on non-digits and rejoin to remove them:
> nums.map {|s| s.split(/\D/).join}
["9347584987", "4563223456", "3245688890", "340", "456748"]
Then filter on the length:
> nums.map {|s| s.split(/\D/).join}.select {|s| s.length==10}
["9347584987", "4563223456", "3245688890"]
Or, you can grab a group of numbers that look 'phony numbery' by using a regex to grab digits and common delimiters:
> nums.map {|s| s[/[\d\-()]+/]}
["9347584987", "(456)322-3456", "(324)5688890", "(340)", "456748"]
And then process that list as above.
The two approaches differ on input such as:
> '123 is NOT a valid area code for 456-7890'[/[\d\-()]+/]
=> "123" # not a phone number
vs
> '123 is NOT a valid area code for 456-7890'.split(/\D/).join
=> "1234567890" # looks like a valid phone number
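A sketch of both approaches as predicate methods, to make that difference concrete. The method names are hypothetical, gsub(/\D/, "") stands in for split(/\D/).join (they produce the same string), and the default of 10 digits is an assumption taken from the valid examples in the question.

```ruby
# Approach 1: drop every non-digit, then count what is left.
def valid_by_stripping?(str, digits: 10)
  str.gsub(/\D/, "").length == digits
end

# Approach 2: take the first run of digits and delimiters, then count its digits.
def valid_by_chunk?(str, digits: 10)
  chunk = str[/[\d\-()]+/]  # nil when the string has no such run
  !chunk.nil? && chunk.gsub(/\D/, "").length == digits
end

tricky = "123 is NOT a valid area code for 456-7890"
p valid_by_stripping?(tricky)  # => true
p valid_by_chunk?(tricky)      # => false
```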
I suggest using one regular expression for each valid pattern rather than constructing a single regex. That is easier to test and debug, and easier to maintain. If, for example, "123-456-7890" or "123-456-7890 x231" were in future deemed valid numbers, one need only add a single, simple regex for each to the array VALID_PATTERNS below.
VALID_PATTERNS = [/\A\d{10}\z/, /\A\(\d{3}\)\d{3}-\d{4}\z/, /\A\(\d{3}\)\d{7}\z/]
def valid?(str)
  VALID_PATTERNS.any? { |r| str.match?(r) }
end
ph_nbrs = %w| 9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 |
ph_nbrs.each { |s| puts "#{s.ljust(15)} \#=> #{valid?(s)}" }
9347584987      #=> true
(456)322-3456   #=> true
(324)5688890    #=> true
(340)HelloWorld #=> false
456748          #=> false
String#match? made its debut in Ruby v2.4. There are many alternatives, including str.match(r) and str =~ r.
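To illustrate how the list would grow, here is the same approach with one extra pattern for the 123-456-7890 format mentioned above; that pattern is a hypothetical addition, not part of the question's rules. This sketch calls Regexp#match?, which takes the string as its argument.

```ruby
VALID_PATTERNS = [
  /\A\d{10}\z/,               # 9347584987
  /\A\(\d{3}\)\d{3}-\d{4}\z/, # (456)322-3456
  /\A\(\d{3}\)\d{7}\z/,       # (324)5688890
  /\A\d{3}-\d{3}-\d{4}\z/     # hypothetical addition: 123-456-7890
].freeze

def valid?(str)
  VALID_PATTERNS.any? { |r| r.match?(str) }
end

%w[9347584987 (456)322-3456 (324)5688890 123-456-7890 (340)HelloWorld 456748].each do |s|
  puts "#{s.ljust(15)} #=> #{valid?(s)}"
end
```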
Alternatively, one regex can check for nine digits with anything between them:
"9347584987" =~ /(?:\d.*){9}/ #=> 0
"(456)322-3456" =~ /(?:\d.*){9}/ #=> 1
"(324)5688890" =~ /(?:\d.*){9}/ #=> 1
"(340)HelloWorld" =~ /(?:\d.*){9}/ #=> nil
"456748" =~ /(?:\d.*){9}/ #=> nil
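Read literally, /(?:\d.*){9}/ succeeds whenever the string contains at least nine digits anywhere, so for single-line input it agrees with simply counting digits. The sketch below checks that on the question's examples plus an illustrative string with letters between the digits, which the regex also accepts.

```ruby
re = /(?:\d.*){9}/
samples = ["9347584987", "(456)322-3456", "(340)HelloWorld", "456748", "1a2b3c4d5e6f7g8h9"]

samples.each do |s|
  by_regex = !(s =~ re).nil?         # does the regex match anywhere?
  by_count = s.count("0-9") >= 9     # are there at least nine digits?
  puts "#{s.inspect}: regex=#{by_regex} count=#{by_count}"
end
```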
Pattern:
^\(?\d{3}\)?\d{3}-?\d{4}$ # this makes the expected symbols optional
This pattern ensures that an opening ( at the start of the string is followed by three digits and then a closing ):
^(\(\d{3}\)|\d{3})\d{3}-?\d{4}$
On principle, though, I agree with melpomene in advising that you remove all non-digit characters, test for the required length (nine digits according to the question, although its valid examples contain ten), then store and handle the phone numbers in a single, reliable, basic format.
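A small check of the two patterns from this answer, written here with \A and \z instead of ^ and $ (in Ruby, ^ and $ match at line boundaries) and with a non-capturing group; the unbalanced input is illustrative:

```ruby
optional = /\A\(?\d{3}\)?\d{3}-?\d{4}\z/         # each symbol independently optional
balanced = /\A(?:\(\d{3}\)|\d{3})\d{3}-?\d{4}\z/ # parentheses come as a pair or not at all

p optional.match?("(4563223456")   # => true  (unbalanced, but accepted)
p balanced.match?("(4563223456")   # => false
p balanced.match?("(456)322-3456") # => true
p balanced.match?("9347584987")    # => true
```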

Stuck implementing i18n-style abbreviations in a Ruby string

I want to convert all the (alphabetic) words in the string to their abbreviations, like i18n does. In other words, I want to change "extraordinary" into "e11y" because there are 11 characters between the first and the last letter in "extraordinary". It works with a single word in the string, but how can I do the same for a multi-word string? And of course, if a word has four or fewer letters there is no point abbreviating it.
class Abbreviator
  def self.abbreviate(x)
    x.gsub(/\w+/, "#{x[0]}#{(x.length-2)}#{x[-1]}")
  end
end
Test.assert_equals( Abbreviator.abbreviate("banana"), "b4a", Abbreviator.abbreviate("banana") )
Test.assert_equals( Abbreviator.abbreviate("double-barrel"), "d4e-b4l", Abbreviator.abbreviate("double-barrel") )
Test.assert_equals( Abbreviator.abbreviate("You, and I, should speak."), "You, and I, s4d s3k.", Abbreviator.abbreviate("You, and I, should speak.") )
Your mistake is that your second parameter is a substitution string operating on x (the original entire string) as a whole.
Instead of using the form of gsub where the second parameter is a substitution string, use the form where you pass a block (see the String#gsub documentation). The block then receives each matched substring, so you can operate on each one individually.
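A minimal illustration of the difference between the two forms, on an illustrative string:

```ruby
text = "banana split"

# String form: the replacement is built once, before gsub runs, from the whole string.
p text.gsub(/\w+/, "#{text[0]}")        # => "b b"

# Block form: the block runs once per match and receives that match.
p text.gsub(/\w+/) { |word| word[0] }   # => "b s"
```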
def short_form(str)
  str.gsub(/[[:alpha:]]{4,}/) { |s| "%s%d%s" % [s[0], s.size-2, s[-1]] }
end
The regex reads, "match four or more alphabetic characters". (To leave words of four or fewer letters unchanged, as the question asks, use {5,} instead.)
short_form "abc" #=> "abc"
short_form "a-b-c" #=> "a-b-c"
short_form "cats" #=> "c2s"
short_form "two-ponies-c" #=> "two-p4s-c"
short_form "Humpty-Dumpty, who sat on a wall, fell over"
#=> "H4y-D4y, who sat on a w2l, f2l o2r"
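If you want short_form to follow the question's rule and leave words of four or fewer letters alone, one option is to raise the minimum to five letters. This sketch checks that variant against the question's test strings:

```ruby
# Variant of short_form that only abbreviates runs of five or more letters.
def short_form(str)
  str.gsub(/[[:alpha:]]{5,}/) { |s| "%s%d%s" % [s[0], s.size - 2, s[-1]] }
end

p short_form("banana")                     # => "b4a"
p short_form("double-barrel")              # => "d4e-b4l"
p short_form("You, and I, should speak.")  # => "You, and I, s4d s3k."
p short_form("cats")                       # => "cats"
```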
I would recommend something along the lines of this:
class Abbreviator
  def self.abbreviate(x)
    x.gsub(/\w+/) do |word|
      # Skip the word unless it's long enough
      next word unless word.length > 4
      # Do the same i18n conversion you did before, but on this word only
      "#{word[0]}#{(word.length-2)}#{word[-1]}"
    end
  end
end
The accepted answer isn't bad, but it can be made a lot simpler by not matching words that are too short in the first place:
def abbreviate(str)
  str.gsub(/([[:alpha:]])([[:alpha:]]{3,})([[:alpha:]])/) { "#{$1}#{$2.size}#{$3}" }
end
abbreviate("You, and I, should speak.")
# => "You, and I, s4d s3k."
Alternatively, we can use lookbehind and lookahead, which makes the Regexp more complex but the substitution simpler:
def abbreviate(str)
  str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/, &:size)
end
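A usage sketch of the lookaround version, checked against the question's test strings. It drops the /i flag (as [[:alpha:]] already covers both cases) and relies on gsub converting the Integer returned by the block to a String.

```ruby
def abbreviate(str)
  # Only the interior letters are matched; the lookarounds keep the first and last letters.
  # &:size makes the block return an Integer, which gsub converts to a String.
  str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/, &:size)
end

p abbreviate("You, and I, should speak.")  # => "You, and I, s4d s3k."
p abbreviate("double-barrel")              # => "d4e-b4l"
```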

Can I use positive lookbehind to return a match in Ruby?

Suppose that I want to find all words in a given string that start with b and end with ing. However, I only want to return the portion of the word that precedes the ing. Thus, if the word is bailing, I should only match and return bail.
The below Ruby regex will certainly match:
\bb[a-zA-Z]*ing\b
but it doesn't return just the "bail" portion. Can I use some kind of lookahead or lookbehind assertion? If not, what is a good way to do this in Ruby?
words = "booster bailings balling failing"
words.scan /(?<=\b)b\w*?(?=ing\b)/
#⇒ ["ball"]
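A short sketch showing that the lookbehind in this answer is not doing any work: (?<=\b) asserts a zero-width boundary, which is exactly what \b alone does. The lookahead is what keeps ing out of the result, and a capture group can do the same job.

```ruby
words = "booster bailings balling failing"

# (?<=\b) and \b are the same zero-width word-boundary check.
p words.scan(/(?<=\b)b\w*?(?=ing\b)/)   # => ["ball"]
p words.scan(/\bb\w*?(?=ing\b)/)        # => ["ball"]

# A capture group also excludes "ing"; scan returns nested arrays, hence flatten.
p words.scan(/\b(b\w*?)ing\b/).flatten  # => ["ball"]
```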
Here are two ways to extract the desired information.
str = "blathering fumbling blinging bérgering blings"
str.scan(/\bb[[:alpha:]]*(?=ing\b)/)
#=> ["blather", "bling", "bérger"]
str.scan(/\b(b[[:alpha:]]*)ing\b/).flatten
#=> ["blather", "bling", "bérger"]
whereas the ASCII-only character class from the question misses the accented word:
str.scan(/\bb[a-zA-Z]*(?=ing\b)/)
#=> ["blather", "bling"]

Use regular expression to fetch 3 groups from string

This is my expected result: input a string and get three strings back.
I have no idea how to do it with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
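Because each input holds a single code, String#match followed by MatchData#captures is an alternative to scan(...).flatten; a minimal sketch with the same regex:

```ruby
r = /(.+)_(.+)(\d{4})/

# String#match returns a MatchData (or nil); #captures lists the groups as a flat array.
md = "R2241_OOP2003".match(r)
first, second, third = md.captures
p [first, second, third]  # => ["R2241", "OOP", "2003"]
```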
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the predefined globals that line up with your groups, or you could use named captures.
To use the match globals, just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses up to and including the one that opens the specific group.
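A minimal sketch of the globals approach, using \A and \z in place of ^ and $ so that the anchors apply to the whole string rather than to a single line:

```ruby
x = "R2241_OOP2003"

if x =~ /\A(R\d+)_(\S+)(\d{4})\z/
  # $1, $2 and $3 hold the groups, numbered by their opening parentheses.
  parts = [$1, $2, $3]
  p parts  # => ["R2241", "OOP", "2003"]
end
```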
To use named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
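Ruby can also assign named captures straight to local variables, but only when a regex literal appears on the left-hand side of =~ (not with String#match, and not with a regex stored in a variable). A sketch with the same pattern, anchored with \A and \z:

```ruby
x = "R2241_OOP2003"

# The regex literal is on the left of =~, so first, second and third become locals.
if /\A(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})\z/ =~ x
  p [first, second, third]  # => ["R2241", "OOP", "2003"]
end
```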
As long as your pattern covers all the possibilities, you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
my_match[0] #=> "R224_OO2003"
my_match[1] #=> "R224"
my_match[2] #=> "_"
my_match[3] #=> "OO"
my_match[4] #=> "2003"
A MatchData object holds each capture group starting at index [1]; as you can see, index [0] returns the entire matched text. If you don't want to capture the "_", leave out its parentheses.
Also, I'm not sure you are getting what you want with the part:
(.*?)
This matches zero or more of any character, lazily (as few as possible). The ? after * makes it non-greedy; it does not mean "zero or one" here.
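A small illustration of lazy, greedy, and optional matching on an illustrative string:

```ruby
s = "abc_de_f"  # illustrative input with two underscores

p s[/(.*?)_/, 1]  # => "abc"    lazy: as few characters as possible before an "_"
p s[/(.*)_/, 1]   # => "abc_de" greedy: as many characters as possible before an "_"
p s[/(.?)_/, 1]   # => "c"      plain ?: zero or one character right before an "_"
```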
