recognize formatted numbers using regex - ruby

1 #valid
1,5 #valid
1,5, #invalid
,1,5 #invalid
1,,5 #invalid
#'nothing' is also invalid
The number of numbers separated by commas can be arbitrary.
I'm trying to use regex to do this. This is what I have tried so far, but none of it worked:
"1,2,," =~ /^[[\d]+[\,]?]+$/ #returned 0
"1,2,," =~ /^[\d\,]+$/ #returned 0
"1,2,," =~ /^[[\d]+[\,]{,1}]+$/ #returned 0
"1,2,," =~ /^[[\d]+\,]+$/ #returned 0
Obviously, I needed the expression to recognize that 1,2,, is invalid, but they all returned 0 :(

Your patterns do not work because:
^[[\d]+[\,]?]+$ - matches a line that consists of one or more characters, each a digit, +, a comma or ? (so it matches all the strings above except the last, empty one)
^[\d\,]+$ - matches a line that consists of 1+ digits or , symbols
^[[\d]+[\,]{,1}]+$ - matches a line that consists of one or more characters, each a digit, +, a comma, { or }
^[[\d]+\,]+$ - matches a line that consists of one or more characters, each a digit, + or a comma.
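Basically, the issue is that you rely on character classes, [...], which match a single character from a set (Ruby even allows nested classes, so [[\d]+[\,]?] is still one class), while you need a grouping construct, (...), which lets you repeat a whole sequence such as a comma followed by digits.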
Comma-separated whole numbers can be validated with
/\A\d+(?:,\d+)*\z/
See the Rubular demo.
Details:
\A - start of string
\d+ - 1+ digits
(?:,\d+)* - zero or more occurrences of:
, - a comma
\d+ - 1+ digits
\z - end of string.
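A minimal sketch of the pattern in use. It checks the question's samples (plus 1,2,,) and also shows two points from above: a quantified character class accepts any mix of digits and commas, and ^/$ are line anchors in Ruby, unlike \A/\z. The constant name NUMBER_LIST and the multi-line test string are only for illustration; String#match? assumes Ruby 2.4 or later.

```ruby
# The answer's pattern: digits, then zero or more ",digits" groups,
# anchored to the whole string with \A and \z.
NUMBER_LIST = /\A\d+(?:,\d+)*\z/

samples = ["1", "1,5", "1,5,", ",1,5", "1,,5", "", "1,2,,"]
samples.each do |s|
  puts "#{s.inspect} => #{s.match?(NUMBER_LIST)}"
end
# "1" and "1,5" print true; every other sample prints false.

# A character class matches one character from a set, so a quantified
# class accepts any mix of digits and commas, including "1,2,,".
"1,2,,".match?(/^[\d\,]+$/)           # => true

# ^ and $ match at line boundaries, so one valid line is enough for them;
# \A and \z check the whole string.
"1,5\nabc".match?(/^\d+(?:,\d+)*$/)   # => true
"1,5\nabc".match?(NUMBER_LIST)        # => false
```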

Related

ignore a specific \n character while still enabling the m flag

I want to match characters across multiple lines so I enabled the m flag. However, I do not want to match a specific \n. Instead I want to match a space \s only. But it seems like the newline is matching spaces too:
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+\s.+,.+,.+\d+)/m
=> 0
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+[ ].+,.+,.+\d+)/m
=> 3
Even when I try to explicitly exclude the newline:
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+[^\n].+,.+,.+\d+)/m
=> 0
Why is the newline matching a space character? And what can I do to ensure that it does not and still matches characters across multiple lines everywhere else?
The newline matches because \s matches any whitespace character, and that includes \n, so \s can match across the line break. The /\s(\d+[^\n].+,.+,.+\d+)/m pattern still matches " 41\n6332 Hardin Rd, Bensalem, PA\n 19020" because of backtracking: after \d+ matches 41, [^\n] meets the \n and fails, so the regex engine steps back into \d+, lets it match only 4, and then [^\n] matches 1, which is not a newline, so matching continues.
You may anchor the search at the start of the string and prevent backtracking with a possessive quantifier, also implementing the negative check with a lookahead:
/\A\s*(\d++(?!\n).+,.+,.+\d)/m
See the regex demo
Details
\A - start of string
\s* - 0+ whitespaces
(\d++(?!\n).+,.+,.+\d) - Capturing group 1:
\d++(?!\n) - 1+ digits (matched possessively with ++ quantifier) not followed with a newline (as (?!\n) is a negative lookahead that fails the match if there is a newline immediately to the right of the current location)
.+,.+, - 2 occurrences of any 1+ chars as many as possible, followed with ,
.+\d - any 1+ chars as many as possible followed with a digit.
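The backtracking explanation and the role of \A can be checked directly. The sketch below runs the question's pattern, the answer's pattern, and an intermediate variant that is possessive but unanchored; that variant and the second input string are hypothetical, added only to show why the anchor matters.

```ruby
text  = " 41\n6332 Hardin Rd, Bensalem, PA\n 19020" # from the question
other = " 6332 Hardin Rd, Bensalem, PA\n 19020"     # hypothetical input

greedy     = /\s(\d+[^\n].+,.+,.+\d+)/m      # the question's attempt
possessive = /\s(\d++(?!\n).+,.+,.+\d)/m     # hypothetical: possessive, but no \A
anchored   = /\A\s*(\d++(?!\n).+,.+,.+\d)/m  # the answer's pattern

p(text =~ greedy)      # => 0   (\d+ backs off to "4", so [^\n] matches "1")
p(text =~ possessive)  # => 3   (fails at 0, then a match starts at the "\n" before 6332)
p(text =~ anchored)    # => nil (only position 0 can match, and "41" is followed by "\n")
p(other[anchored, 1])  # => "6332 Hardin Rd, Bensalem, PA\n 19020"
```

Possessiveness alone does not stop the match: the unanchored variant simply finds a later start position. The \A anchor is what restricts the search to the beginning of the string.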

Split a string by ":" or a space after a numerical digit

I have a string like:
string = "roll:34 name:joshi ikera"
I want to split this string by the delimiting : and the space between the roll value and the name key. The output should look like this:
["roll", "34", "name", "joshi ikera"]
I tried using:
string.split(/:|\d\s/)
but the output that I get is:
["roll", "3", "name", "joshi ikera"]
How do I include the missing digit and just split by the space after the digit?
The \d\s alternative matches and consumes the digit before a whitespace, and String#split drops everything the delimiter pattern consumes. You need a lookaround, a lookbehind in this case, to make the digit check non-consuming: /:|(?<=\d)\s/ (see valtlai's comment). However, a more common approach in this scenario is to split on 1 or more whitespace chars that are followed by 1+ word chars (if keys can only contain digits, letters and underscores) and then : (see Sagar's comment).
I suggest
string.split(/\s+(?=\w+:)|:/)
# => ["roll", "34", "name", "joshi ikera"]
Here,
\s+ - consumes 1+ whitespace chars
(?=\w+:) - that are followed with 1+ word chars and :
| - or
: - match and consume :.
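To compare the delimiters, here is a sketch that runs all three split patterns on the question's string. The second input, tricky, is a made-up string whose value contains a digit followed by a space; it shows where the lookbehind version and the key-lookahead version differ.

```ruby
string = "roll:34 name:joshi ikera"

string.split(/:|\d\s/)         # => ["roll", "3", "name", "joshi ikera"]  (digit consumed)
string.split(/:|(?<=\d)\s/)    # => ["roll", "34", "name", "joshi ikera"] (lookbehind keeps it)
string.split(/\s+(?=\w+:)|:/)  # => ["roll", "34", "name", "joshi ikera"]

# Hypothetical input: the value "room 12 b" contains a digit before a space.
tricky = "name:room 12 b roll:5"
tricky.split(/:|(?<=\d)\s/)    # => ["name", "room 12", "b roll", "5"]
tricky.split(/\s+(?=\w+:)|:/)  # => ["name", "room 12 b", "roll", "5"]
```

The lookahead version splits only where a new key starts, so it does not depend on what the previous value ends with.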
Or, if the keys are unique, you can scan the key/value pairs into a hash:
string.scan(/(\w+):(.*?)(?=\w+:|\z)/).to_h
# => {"roll"=>"34 ", "name"=>"joshi ikera"}
Here,
(\w+) - 1 or more word chars are captured into Group 1
: - a colon is matched
(.*?) - any 0+ chars other than line break chars, as few as possible, are captured into Group 2, up to the first position immediately followed by
(?=\w+:|\z) - either 1+ word chars and then : (\w+:) or (|) end of string (\z).
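A short sketch of the scan approach on the same input. Because the lazy group stops just before the next key, the value for roll keeps its trailing space; stripping the values with Hash#transform_values (Ruby 2.4+) is one way to remove it. Array#to_h keeps the last pair when a key repeats, which is why this variant assumes unique keys.

```ruby
string = "roll:34 name:joshi ikera"

# Each match yields [key, value]; to_h turns the pairs into a hash.
pairs = string.scan(/(\w+):(.*?)(?=\w+:|\z)/).to_h
# => {"roll"=>"34 ", "name"=>"joshi ikera"}

# Remove the space that separated "34" from the next key.
clean = pairs.transform_values(&:strip)
# => {"roll"=>"34", "name"=>"joshi ikera"}
```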

How do I match a regex in which the next non-space character is not a "/"?

How do I express in regex the letter "s" whose next non-space character is not a "/"?
These should match: "s", "str"
These should not: "s/m", "s /n"
I tried this
"str" =~ /s[^[[:space:]]]^\// #=> nil
but it does not even match the simple use case.
It seems you need to match any s that is not followed with any 0+ whitespace chars and a / after them.
Use
/s(?![[:space:]]*\/)/
See the Rubular demo.
Details
s - the letter s
(?![[:space:]]*\/) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
[[:space:]]* - 0+ whitespaces
\/ - a /.
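A sketch checking the pattern against the examples from the question, then counting and locating the qualifying s characters in the sentence used in the next answer. The constant name S_NOT_BEFORE_SLASH is only for illustration; the index loop relies on String#index accepting a regexp and a start offset.

```ruby
S_NOT_BEFORE_SLASH = /s(?![[:space:]]*\/)/

["s", "str", "s/m", "s /n"].map { |t| t.match?(S_NOT_BEFORE_SLASH) }
# => [true, true, false, false]

sentence = "sea shells /by the sea s/hore"
sentence.scan(S_NOT_BEFORE_SLASH).size  # => 3

# Start index of each qualifying "s", found by resuming the search
# one character after the previous match.
positions = []
pos = 0
while (i = sentence.index(S_NOT_BEFORE_SLASH, pos))
  positions << i
  pos = i + 1
end
positions  # => [0, 4, 19]
```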
If you merely want to know the number of 's' characters that are not followed by zero or more spaces and then a forward slash (as opposed to their indices in the string), you don't have to use a regular expression.
"sea shells /by the sea s/hore".delete(" ").gsub("s/", "").count("s")
#=> 3
If you only want to know whether there is at least one such 's', you could replace count("s") with include?("s").
I'm not arguing that this is preferable to the use of a regular expression.

Regex for selecting substrings before and after a string

I am trying to find the right regex to select the substrings on either side of another substring, which I'd like to exclude. For example, in this string:
11 - 12£ in $ + 13
I want to select 12£ and $. Basically, it's substrings around in, until I hit an array of values I want to use as end/start, in this case, arithmetic operators %w(+ - / *)
So far, the closest I got was this regex: /(.\d\p{Sc})\sin\s(\p{Sc})/
Some more examples:
10 - 12$ in £ - 13$ should return 12$ and £
12 $ in £ should return 12$ and £
100£in$ should return 100£ and $
sentence.match(/[^-+*\/]*in[^-+*\/]*/).to_s.strip.split(/ *in */)
[^-+*\/]* matches any run of characters that are not arithmetic operators
so the match covers everything from the "opening" operator to the "closing" operator that surround an in
#strip removes the leading and trailing whitespaces
finally, split into two strings, removing in and the spaces around it
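The one-liner above, wrapped in a hypothetical around_in helper and run on the question's examples. Note that the 12 $ in £ example keeps the inner space, since strip only removes leading and trailing whitespace.

```ruby
# Hypothetical helper wrapping the answer's one-liner: take the run of
# non-operator characters around "in", trim it, then split on "in".
def around_in(sentence)
  sentence.match(/[^-+*\/]*in[^-+*\/]*/).to_s.strip.split(/ *in */)
end

around_in("11 - 12£ in $ + 13")   # => ["12£", "$"]
around_in("10 - 12$ in £ - 13$")  # => ["12$", "£"]
around_in("100£in$")              # => ["100£", "$"]
around_in("12 $ in £")            # => ["12 $", "£"]
```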
r = /
\s+[+*\/-]\s+ # match 1+ whitespaces, 1 char in char class, 1+ whitespaces
(\S+) # match 1+ non-whitespaces in capture group 1
\s+in\s+ # match 1+ whitespaces, 'in', 1+ whitespaces
(\S+) # match 1+ non-whitespaces in capture group 2
\s+[+*\/-]\s # match 1+ whitespaces, 1 char in char class, 1 whitespace
/x # free-spacing regex definition mode
str = '11 - 12£ in $ + 13 / 13F in % * 4'
str.scan(r)
#=> [["12£", "$"], ["13F", "%"]]
See the documentation for String#scan: when the pattern contains capture groups, scan returns an array of the captured groups for each match.
Note that '-' must be first or last in the character class [+*\/-] (or be escaped as \-); between two other characters it would be read as a range.
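A sketch running the free-spacing pattern (restated so the snippet runs on its own) on the question's examples. Because it requires an operator surrounded by whitespace on each side of the expression, it finds nothing in the last two inputs, which the first answer's approach does handle.

```ruby
r = /
  \s+[+*\/-]\s+   # 1+ whitespaces, an operator, 1+ whitespaces
  (\S+)           # left operand in capture group 1
  \s+in\s+        # the word in, surrounded by whitespace
  (\S+)           # right operand in capture group 2
  \s+[+*\/-]\s    # 1+ whitespaces, an operator, 1 whitespace
/x

"11 - 12£ in $ + 13".scan(r)   # => [["12£", "$"]]
"10 - 12$ in £ - 13$".scan(r)  # => [["12$", "£"]]
"12 $ in £".scan(r)            # => []  no operator on either side
"100£in$".scan(r)              # => []  no whitespace around in
```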

Regexp for specific matching of character string

I need a regex to match something like
"4f0f30500be4443126002034"
and
"4f0f30500be4443126002034>4f0f31310be4443126005578"
but not like
"4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"
Try:
^[\da-f]{24}(>[\da-f]{24})?$
[\da-f]{24} is exactly 24 characters consisting only of 0-9, a-f. The whole pattern is one such number optionally followed by a > and a second such number.
I think you want something like:
/^[0-9a-f]{24}(>[0-9a-f]{24})?$/
That matches 24 characters in the 0-9a-f range (which matches your first string) followed by zero or one strings starting with a >, followed by 24 characters in the 0-9a-f range (which matches your second string). Here's a RegexPal for this regex.
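A sketch checking the regex answers against the three sample strings. It swaps ^ and $ for \A and \z: in Ruby, ^ and $ match at line boundaries, so a string with an extra line would still pass, as the last two lines show. The constant name ID_CHAIN and the extra-line input are only for illustration.

```ruby
# One 24-char lowercase hex ID, optionally followed by ">" and a second one.
ID_CHAIN = /\A[0-9a-f]{24}(?:>[0-9a-f]{24})?\z/

one   = "4f0f30500be4443126002034"
two   = "4f0f30500be4443126002034>4f0f31310be4443126005578"
three = "4f0f30500be4443126002034>4f0f31310be4443126005578>4f0f31310be4443126005579"

[one, two, three].map { |s| s.match?(ID_CHAIN) }  # => [true, true, false]

# Line anchors let a trailing line slip through; string anchors do not.
"#{one}\nnot an id".match?(/^[0-9a-f]{24}(>[0-9a-f]{24})?$/)  # => true
"#{one}\nnot an id".match?(ID_CHAIN)                          # => false
```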
You don't need a regex.
str = "4f0f30500be4443126002034>4f0f31310be4443126005578"
match = str.count('>') < 2
match will be true when the string contains zero or one '>' and false otherwise. This only counts the separators; it does not check that each part is a 24-character hex string.
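The count check accepts any string with fewer than two > characters, including ones that are not IDs at all. If a regex is still to be avoided, a hypothetical helper like id_chain? below can split on > and check each part with String#delete; it is a sketch, not part of the original answer.

```ruby
# Hypothetical helper: one or two lowercase hex IDs of exactly
# 24 characters, joined by ">".
def id_chain?(str)
  parts = str.split(">", -1) # a negative limit keeps empty fields, e.g. "x>" gives ["x", ""]
  parts.size.between?(1, 2) &&
    parts.all? { |part| part.size == 24 && part.delete("0-9a-f").empty? }
end

id = "4f0f30500be4443126002034"

"abc".count(">") < 2            # => true, although "abc" is not an ID
id_chain?("abc")                # => false
id_chain?(id)                   # => true
id_chain?("#{id}>#{id}")        # => true
id_chain?("#{id}>#{id}>#{id}")  # => false
id_chain?("#{id}>")             # => false
```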
