Regex - First Integer in String (skip float) - ruby

Updated: right now I'm doing
a = gets
count = a.match(/\d+/).to_s.to_i
sample input: 2000 of 3.00
actual output: 2000
sample input: 3.00 of 2000
actual output: 3
objective output: 2000 in both cases (skip float)

"3.00 of 2000"[/(?<![.\d])\d+(?![.\d])/].to_i # => 2000
"2000 of 3.00"[/(?<![.\d])\d+(?![.\d])/].to_i # => 2000

This is one of those cases where you have to know your data. If you know that your input will always have exactly one integer, then the following approach will work:
'3.00 of 2000'.split.select { |e| e =~ /^\d+$/ }.last.to_i
#=> 2000
'2000 of 3.00'.split.select { |e| e =~ /^\d+$/ }.last.to_i
#=> 2000
The idea is to split each line of input into an array, and then select just the array elements that contain nothing but digits. Finally, the last (and hopefully only) element of the array is converted to an integer.
There are many ways this can blow up or fail to achieve the results you want given arbitrary input. However, it certainly works for the specific corpus you provided.
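One concrete way the split-and-select approach can fail: when no token is purely digits, `select` returns an empty array, `last` is `nil`, and `nil.to_i` quietly becomes 0. A minimal sketch of a guarded variant follows; the helper name `last_integer_token` is made up for illustration and is not part of the answer above.

```ruby
# Hypothetical helper: returns the last all-digit token as an Integer,
# or nil when the line has no such token (instead of a misleading 0).
def last_integer_token(line)
  token = line.split.select { |e| e =~ /\A\d+\z/ }.last
  token && token.to_i
end

p last_integer_token('3.00 of 2000')   # the token "2000" qualifies
p last_integer_token('2000, of 3.00')  # "2000," contains a comma, so nothing qualifies
```

Returning `nil` lets the caller tell "no integer found" apart from a genuine 0 in the input.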

Use the code:
a = gets
a.split(/[\sa-z]+/).select {| v | v !~ /\./ }.last.to_i
# => 2000

No regex but...
'2000 to 3.00'.split.find { |s| s.to_i.to_s == s }.to_i
=> 2000
'3.00 to 2000'.split.find { |s| s.to_i.to_s == s }.to_i
=> 2000

The regex [^0-9.]([0-9]+)[^0-9.] will match only numbers adjacent to characters which are neither digits nor dots, and capture the number in the single capture group.
If the numbers can also appear at the beginning or end of the string, the fix is to allow those positions as alternatives:
(?:^|[^0-9.])([0-9]+)(?:[^0-9.]|$)
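In Ruby, the pattern above might be used like this. It is only a sketch: `INTEGER_RE` and `first_plain_integer` are illustrative names, and the important detail is reading capture group 1, since the whole match also includes the neighbouring characters.

```ruby
# Illustrative names; the pattern itself is the one quoted above.
INTEGER_RE = /(?:^|[^0-9.])([0-9]+)(?:[^0-9.]|$)/

def first_plain_integer(str)
  match = INTEGER_RE.match(str)
  match && match[1].to_i   # group 1 holds the digits without their neighbours
end

p first_plain_integer('3.00 of 2000')
p first_plain_integer('2000 of 3.00')
p first_plain_integer('about 3.00')    # no standalone integer here
```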

str = '3 of 20.00, +42,31 of 455, -6 of -23.7 .'
str.scan(/(?<![\.\d])(-?\d+)(?!\d*\.)/).flatten.map(&:to_i)
=> [3, 42, 31, 455, -6]
The capture group (-?\d+) consists of one or more digits 0-9, optionally preceded by a minus sign.
(?<![\.\d]) is a negative lookbehind group, meaning the capture group cannot be preceded by a decimal point or digit.
(?!\d*\.) is a negative lookahead group, meaning the capture group cannot be followed by zero or more digits followed by a decimal point.
str.scan(/(?<![\.\d])(-?\d+)(?!\d*\.)/) #=> [["3"], ["42"], ["31"], ["455"], ["-6"]], which is why flatten must be applied before conversion to integers.
Initially I tried (?<!\.\d*) as the negative lookbehind group, but that generated an error. The reason: negative lookbehinds cannot be variable-length. I understand the same restriction applies in Perl.
Edit: I somehow overlooked the title of the question. To retrieve just the first integer, either tack .first on the end of str.scan or replace that statement with:
str.match(/(?<![\.\d])(-?\d+)(?!\d*\.)/)[0].to_i
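Note that `match` returns `nil` when the string contains no integer, so calling `[0]` on the result would raise. A small sketch of a nil-safe variant, assuming Ruby 2.3 or later for the `&.` operator; `SIGNED_INT_RE` and `first_integer` are names chosen here for illustration.

```ruby
# Same pattern as above, stored under an illustrative constant name.
SIGNED_INT_RE = /(?<![\.\d])(-?\d+)(?!\d*\.)/

def first_integer(str)
  # String#[] with a regex returns the matched text or nil,
  # and &. skips to_i when there was no match.
  str[SIGNED_INT_RE]&.to_i
end

p first_integer('3.00 of 2000')
p first_integer('-7 items, 1.5 kg')
p first_integer('1.5 and 2.5')       # only floats, so no match
```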

words_containing_non_digits = -> x {x[/\D/]}
p '3.00 of 2000'.split.reject &words_containing_non_digits #=> ["2000"]

Related

How to split a string in half, into two variables, in one statement?

I want to split str in half and assign each half to first and second
Like this pseudo code example:
first,second = str.split( middle )
class String
  def halves
    chars.each_slice(size / 2).map(&:join)
  end
end
This will work, but you will need to adjust it depending on how you want to handle odd-sized strings.
Or in-line:
first, second = str.chars.each_slice(str.length / 2).map(&:join)
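To make the odd-length caveat concrete: with `size / 2` as the slice size, a five-character string yields three slices, and an empty or one-character string makes `each_slice(0)` raise. One possible adjustment, sketched under the assumption that the first half should get the extra character (`halves_by_slices` is a made-up name):

```ruby
def halves_by_slices(str)
  return ['', ''] if str.empty?      # each_slice(0) would raise ArgumentError
  slice_size = (str.size + 1) / 2    # round up so at most two slices result
  first, second = str.chars.each_slice(slice_size).map(&:join)
  [first, second.to_s]               # second is nil for a one-character string
end

p halves_by_slices('abcdef')
p halves_by_slices('abcde')
p 'abcde'.chars.each_slice(2).map(&:join)  # the unadjusted version: three pieces
```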
first,second = str.partition(/.{#{str.size/2}}/)[1,2]
Explanation
You can use partition with a regex pattern that matches a fixed number of characters (in this case str.size / 2).
partition returns three elements: head, match, and tail. Because the pattern matches any character (other than a newline), the match starts at the beginning and the head is an empty string. So we only care about the match and the tail, hence [1,2].
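A quick look at what `partition` actually returns may help. The second half of this sketch shows an edge case I am adding as an assumption about possible input: `.` does not match a newline by default, so for multi-line strings the head is no longer empty unless the `m` flag is used.

```ruby
str = 'abcdef'
head, match, tail = str.partition(/.{#{str.size / 2}}/)
p [head, match, tail]   # head is empty, match and tail are the two halves

multiline = "a\nbcd"
# Without /m the match has to start after the newline, so head is "a\n".
p multiline.partition(/.{#{multiline.size / 2}}/)
# With /m the dot also matches the newline and head is empty again.
p multiline.partition(/.{#{multiline.size / 2}}/m)
```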
Here are two ways to do that
def halves(str)
  # Build the regex inside the method so that str is in scope.
  rgx = /
    (?<=                # begin a positive lookbehind
      \A                # match the beginning of the string
      .{#{str.size/2}}  # match half of the string's characters
    )                   # end positive lookbehind
  /x                    # invoke free-spacing regex definition mode
  str.split(rgx)
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
The regular expression is conventionally written
/(?<=\A.{#{str.size/2}})/
Note that the regular expression matches a location between two successive characters.
def halves(str)
  [str[0, str.size/2], str[str.size/2..-1]]
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
Note: The two halves are equal only for even-length strings; for an odd length, the second half gets the extra character.
Along the lines of your pseudocode, if string is the original string:
first, second = string[0...string.length/2], string[string.length/2...string.length]

Use regular expression to fetch 3 groups from string

I want to input a string and get three strings back.
I have no idea how to do this with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
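For comparison, the numbered globals described above could be used as follows. This is a minimal sketch with the same sample string; `MatchData#captures` is shown as an alternative that avoids globals altogether.

```ruby
x = 'R2241_OOP2003'

# After a successful =~, $1, $2 and $3 hold the three groups.
if x =~ /^(R\d+)_(\S+)(\d{4})$/
  first, second, third = $1, $2, $3
end
puts first, second, third

# MatchData#captures returns the same groups as an array.
parts = x.match(/^(R\d+)_(\S+)(\d{4})$/).captures
p parts
```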
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds each capture group starting at index [1]. As you can see, index [0] returns the entire match. If you don't want to capture the "_", you can leave its parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
This is a lazy (non-greedy) quantifier: it matches zero or more of any character, but as few as possible.
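To see what `(.*?)` does in this pattern, it can help to compare it with the greedy `(.*)` on the sample input. A small sketch, with the separator left uncaptured:

```ruby
input = 'R224_OO2003'

# Lazy groups take as little as possible, leaving all digits to (\d+).
lazy = input.match(/(.*?)_(.*?)(\d+)/).captures
p lazy

# Greedy groups take as much as possible, leaving (\d+) a single digit.
greedy = input.match(/(.*)_(.*)(\d+)/).captures
p greedy
```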

splitting a string into an array of 3 character from behind

I'm looking to split a random numeric string like "12345567" into the array ["12","345","567"] as simply as possible, basically turning a number into a human-readable array with splits at thousands, millions, billions, etc.
My previous solution cuts it from the front rather than the back:
"12345567".to_s.scan(/.{1,#{3}}/)
#=> ["123","455","67"]
If you are on Rails, you can use the number_with_delimiter helper. Outside Rails, you can require action_view and include the helper module:
require 'action_view'
require 'action_view/helpers'
include ActionView::Helpers::NumberHelper
number_with_delimiter("12345567", :delimiter => ',')
# => "12,345,567"
You can then split on the comma to get an Array.
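If pulling in ActionView is not an option, the same delimit-then-split idea can be sketched in plain Ruby. This is an assumption-laden stand-in, not the Rails helper: `thousands_groups` is a made-up name and it expects a string of digits only.

```ruby
# Hypothetical plain-Ruby stand-in for the delimiter-then-split idea.
def thousands_groups(digits)
  # Insert a comma after every digit that is followed by a multiple of
  # three digits up to the end of the string, then split on the commas.
  delimited = digits.gsub(/(\d)(?=(\d{3})+\z)/, '\1,')
  delimited.split(',')
end

p thousands_groups('12345567')
p thousands_groups('1234')
p thousands_groups('123')
```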
You could try the below.
> "12345567".scan(/\d+?(?=(?:\d{3})*$)/)
=> ["12", "345", "567"]
The pattern does a non-greedy match of one or more digits, which must be followed by groups of exactly three digits, zero or more times, and then by the end of the line. In detail:
\d+? does a non-greedy match of one or more digits.
(?=..) is a positive lookahead assertion, which asserts that the match must be followed by
(?:\d{3})* exactly three digits, zero or more times. So this matches an empty string, or 111, or 111111, and so on in multiples of 3.
$ is the end-of-line anchor, which matches the boundary at the end of the line.
OR
> "12345567".scan(/.{1,3}(?=(?:.{3})*$)/)
=> ["12", "345", "567"]
Here's one non-regex solution:
s = "12345567"
sz = s.size
n_first = sz % 3
((n_first>0) ? [s[0,n_first]] : []) + (n_first...sz).step(3).map { |i| s[i,3] }
#=> ["12", "345", "567"]
Another:
s.reverse.chars.each_slice(3).map { |a| a.join.reverse }.reverse
#=> ["12", "345", "567"]
A recursive approach:
def split(str)
  str.size <= 3 ? [str] : (split(str[0..-4]) + [str[-3..-1]])
end
Hardly readable, though. Perhaps a more explicit code layout:
def split(str)
  if str.size <= 3
    [str] # Too short, keep it all.
  else
    split(str[0..-4]) + [str[-3..-1]] # Append the last 3, and recurse on the head.
  end
end
Disclaimer: No test whatsoever on performance (or attempt to go for a clear tail recursion)! Just an alternative to explore.
It's hard to tell what you want, but maybe:
"12345567".scan(/^..|.{1,3}/)
=> ["12", "345", "567"]

How can I get the last occurring positive integer from a string?

I want to extract the last occurring positive integer from a string using regex. For example:
get-last-01-number-9.test should return 9
get-last-01-number7 should return 7
How can I accomplish this with regex?
You could try
(\d+)\D*$
Explanation:
(\d+) # a number
\D* # any amount of non-numbers
$ # end of string
This will capture the number in the first capture group.
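In Ruby, the first capture group of this pattern can be read directly with `String#[]`. A minimal sketch; `LAST_NUMBER_RE` and `last_number` are illustrative names, and `&.` assumes Ruby 2.3 or later.

```ruby
LAST_NUMBER_RE = /(\d+)\D*$/

def last_number(str)
  # The second argument selects capture group 1; the result is nil
  # when the string contains no digits at all.
  str[LAST_NUMBER_RE, 1]&.to_i
end

p last_number('get-last-01-number-9.test')
p last_number('get-last-01-number7')
p last_number('get-last-number')
```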
Use Negative Lookahead
Find a positive integer that isn't followed by another positive integer using a greedy match like:
/\d+(?!.*\d+)/
For example:
'get-last-01-number-9.test'.match /\d+(?!.*\d+)/
#=> #<MatchData "9">
'get-last-01-number7'.match /\d+(?!.*\d+)/
#=> #<MatchData "7">
'get-last-01-number-202.test'.match /\d+(?!.*\d+)/
#=> #<MatchData "202">
'get-last-number'.match /\d+(?!.*\d+)/
#=> nil
This is probably slower than scanning if you have a large text blob, but some people might still find the lookahead assertion useful, especially for shorter strings.
Use Scan
A more straightforward method would be just to extract all integers (if any) with String#scan and then pop the last one. For example:
'get-last-01-number-9.test'.scan(/\d+/).pop
#=> "9"
'get-last-01-number7'.scan(/\d+/).pop
#=> "7"
'get-last-01-number-202.test'.scan(/\d+/).pop
#=> "202"
'get-last-number'.scan(/\d+/).pop
#=> nil
Scope of Answer
Negative integers weren't part of the question as originally posted, and will therefore not be addressed here. If negative integers are an issue for future visitors, and if it hasn't already been asked on Stack Overflow, please ask a separate question about them.
Use this expression to find 1+ digits with only non-digits following it till the end of the string (i.e. the last set of digits):
\d+(?=\D*$)
["get-last-01-number-9.test", "get-last-01-number7"].each do |e|
  e.match(%r{\-number([\-\d]+)}) do |m|
    last_no = m[1].gsub(%r{\-}, "")
    puts "last_no:#{last_no} ---- #{File.basename __FILE__}:#{__LINE__}"
  end
end
# last_no:9 ---- ex.rb:4
# last_no:7 ---- ex.rb:4
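This pattern is probably the most efficient, depending on the number of characters between the last digit and the end of the string:
.*(\d+)
Note that because .* is greedy, the group captures only the final digit. That is enough for the single-digit examples in the question, while .*(?<!\d)(\d+) captures a whole multi-digit number.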
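A quick check of how the greedy `.*` interacts with a multi-digit number, using a sample string from another answer on this page. The lookbehind variant is one possible adjustment and is my assumption, not part of the answer above.

```ruby
sample = 'get-last-01-number-202.test'

# The greedy .* swallows as much as it can, so the group keeps one digit.
greedy_capture = sample[/.*(\d+)/, 1]
p greedy_capture

# Requiring a non-digit (or the string start) before the group keeps
# the whole number together.
guarded_capture = sample[/.*(?<!\d)(\d+)/, 1]
p guarded_capture
```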

Checking string with minimum 8 digits using regex

I have regex as follows:
/^(\d|-|\(|\)|\+|\s){12,}$/
This allows digits, -, (, ), + and whitespace. But I want to ensure the string contains at least 8 digits.
Some allowed strings are as follows:
(1323 ++24)233
24243434 43
++++43435++4554345 434
It should not allow strings like:
((((((1213)))
++++232+++
Use a lookahead at the start of your regex:
/^(?=(.*\d){8,})[\d\(\)\s+-]{8,}$/
  --------------
        |
        |-> this checks for 8 or more digits
(?=(.*\d){8,}) is a zero-width lookahead that checks for zero or more characters (i.e. .*) followed by a digit (i.e. \d), 8 or more times (i.e. {8,}).
(?=) is called zero-width because it doesn't consume characters; it only checks.
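As a sanity check, the lookahead pattern can be run against the sample strings from the question. A sketch only: the constant name is made up, and `match?` assumes Ruby 2.4 or later. Keep in mind that in Ruby `^` and `$` anchor at line boundaries, so `\A` and `\z` may be preferable if the input can contain newlines.

```ruby
AT_LEAST_8_DIGITS = /^(?=(.*\d){8,})[\d\(\)\s+-]{8,}$/

allowed  = ['(1323 ++24)233', '24243434 43', '++++43435++4554345 434']
rejected = ['((((((1213)))', '++++232+++']

# Every allowed sample should match and no rejected sample should.
allowed_ok  = allowed.all?   { |s| s.match?(AT_LEAST_8_DIGITS) }
rejected_ok = rejected.none? { |s| s.match?(AT_LEAST_8_DIGITS) }
p allowed_ok, rejected_ok
```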
To restrict it to between 8 and 14 digits you can do
/^(?=([^\d]*\d){8,14}[^\d]*$)[\d\(\)\s+-]{8,}$/
Here's a non regular expression solution
numbers = ["(1323 ++24)233", "24243434 43" , "++++43435++4554345 434", "123 456_7"]
numbers.each do |number|
  count = 0
  number.each_char do |char|
    count += 1 if char.to_i.to_s == char
    break if count > 7
  end
  puts "#{count > 7}"
end
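The counting loop above can also be condensed with `String#count`, which accepts a character range such as `'0-9'`. A short sketch using the same sample array; like the loop, it checks only the number of digits and not which other characters are present.

```ruby
numbers = ["(1323 ++24)233", "24243434 43", "++++43435++4554345 434", "123 456_7"]

# count('0-9') returns how many characters fall in the range 0 through 9.
results = numbers.map { |number| number.count('0-9') >= 8 }
p results
```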
There is no need for $, or for the "or more" part of {8,}, and it is unclear where {12,} comes from. A start anchor is still needed, because otherwise the match could begin after a disallowed leading character.
The following makes the intention transparent.
r = /
\A                 # Anchor at the start of the string
(?=(?:.*\d){8})    # First condition: Eight digits
(?!.*[^-\d()+\s])  # Second condition: Characters other than `[-\d()+\s]` should not be included.
/x
resulting in:
"(1323 ++24)233" =~ r #=> 0
"24243434 43" =~ r #=> 0
"++++43435++4554345 434" =~ r #=> 0
"((((((1213)))" =~ r #=> nil
"++++232+++" =~ r #=> nil
