I have the following strings
ALEXANDRITE OVAL 5.1x7.9 GIA# 6167482443 FINE w:1.16
ALEXANDRITE OVAL 4x6 FINE w:1.16
I want to match the 5.1 and 7.9 and the 4 and 6 and not w:1.16 or w: 1.16 or the 6167482443. So far I managed to come up with these:
Matching the w:1.16 w: 1.16
([w][:]\d\.?\d*|[w][:]\s?\d\.?\d*)
Matching the other digits:
\d+\.?\d{,3}
I expected this not to return the long number sequence because of the {,3}, but it still does.
My questions are :
1. How do I combine the two patterns excluding one and returning the other?
2. How do I exclude the long sequence of numbers? Why is it not being excluded now?
Thanks!
You could simply use the below regex.
\b(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)
DEMO
Explanation:
\b the boundary between a word char (\w) and
something that is not a word char
( group and capture to \1:
\d+ digits (0-9) (1 or more times)
(?: group, but do not capture (optional):
\. '.'
\d+ digits (0-9) (1 or more times)
)? end of grouping
) end of \1
x 'x'
( group and capture to \2:
\d+ digits (0-9) (1 or more times)
(?: group, but do not capture (optional):
\. '.'
\d+ digits (0-9) (1 or more times)
)? end of grouping
) end of \2
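As a quick check, the pattern above can be run against both sample strings. This is a minimal sketch in Ruby (the question does not say which language it uses); the constant `DIMENSIONS` and the helper `dimensions` are hypothetical names introduced for illustration.

```ruby
# Pattern from the answer above: two numbers (optionally decimal) joined by "x".
DIMENSIONS = /\b(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)/

# Hypothetical helper: returns the two captured numbers as strings,
# or nil when the text contains no "AxB" dimension.
def dimensions(text)
  m = DIMENSIONS.match(text)
  m && m.captures
end

p dimensions("ALEXANDRITE OVAL 5.1x7.9 GIA# 6167482443 FINE w:1.16") # => ["5.1", "7.9"]
p dimensions("ALEXANDRITE OVAL 4x6 FINE w:1.16")                     # => ["4", "6"]
```

Because the pattern requires the literal x between the two numbers, neither the GIA number nor the w:1.16 value can match.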
([\d\.])+x([\d\.])+
matches
5.1x7.9
4x6
(\d+(?:\.\d+)?)(?=x)|(?<=x)(\d+(?:\.\d+)?)
You can try this. See demo.
http://regex101.com/r/wQ1oW3/6
2) To ignore the long number you have to use \b\d{1,3}\b to specify boundaries.
http://regex101.com/r/wQ1oW3/7
Otherwise, a part of the long number will match.
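The reason the original \d+\.?\d{,3} still returns the long number is that \d+ is unbounded: it consumes every digit, and {,3} only limits the digits after the optional dot. The sketch below, in Ruby, illustrates this and the effect of the word boundaries; `LONG_NUMBER` is a sample string made up from the question's data.

```ruby
LONG_NUMBER = "GIA# 6167482443 FINE"

# \d+ consumes all ten digits; \.? and \d{,3} then match nothing.
p LONG_NUMBER.scan(/\d+\.?\d{,3}/)   # => ["6167482443"]

# Without boundaries, \d{1,3} simply chops the long number into pieces.
p LONG_NUMBER.scan(/\d{1,3}/)        # => ["616", "748", "244", "3"]

# With \b on both sides, a run of more than three digits cannot match at all.
p LONG_NUMBER.scan(/\b\d{1,3}\b/)    # => []
```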
I want to match characters across multiple lines so I enabled the m flag. However, I do not want to match a specific \n. Instead I want to match a space \s only. But it seems like the newline is matching spaces too:
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+\s.+,.+,.+\d+)/m
=> 0
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+[ ].+,.+,.+\d+)/m
=> 3
Even when I try to explicitly exclude the newline:
" 41\n6332 Hardin Rd, Bensalem, PA\n 19020" =~ /\s(\d+[^\n].+,.+,.+\d+)/m
=> 0
Why is the newline matching a space character? And what can I do to ensure that it does not and still matches characters across multiple lines everywhere else?
The /\s(\d+[^\n].+,.+,.+\d+)/m pattern still matches " 41\n6332 Hardin Rd, Bensalem, PA\n 19020" at index 0 because of backtracking. \d+ first matches 41, but the next char is \n, which [^\n] cannot match. The regex engine then steps back, lets \d+ match only 4, and [^\n] matches 1 (which is not a newline), so matching continues.
You may anchor the search at the start of the string and prevent backtracking with a possessive quantifier, also implementing the negative check with a lookahead:
/\A\s*(\d++(?!\n).+,.+,.+\d)/m
See the regex demo
Details
\A - start of string
\s* - 0+ whitespaces
(\d++(?!\n).+,.+,.+\d) - Capturing group 1:
\d++(?!\n) - 1+ digits (matched possessively with ++ quantifier) not followed with a newline (as (?!\n) is a negative lookahead that fails the match if there is a newline immediately to the right of the current location)
.+,.+, - 2 occurrences of any 1+ chars as many as possible, followed with ,
.+\d - any 1+ chars as many as possible followed with a digit.
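The backtracking described above, and the effect of the possessive quantifier with a lookahead, can be observed directly. This is a minimal sketch, not the answer's exact pattern: it keeps the leading \s from the question instead of anchoring with \A, which is an assumption made so that the two results are directly comparable. `ADDRESS`, `GREEDY` and `POSSESSIVE` are illustrative names.

```ruby
ADDRESS = " 41\n6332 Hardin Rd, Bensalem, PA\n 19020"

# Greedy \d+ gives back the "1", [^\n] matches it, and the match starts at 0.
GREEDY = /\s(\d+[^\n].+,.+,.+\d+)/m
p ADDRESS =~ GREEDY          # => 0

# Possessive \d++ keeps "41" and the lookahead rejects the newline after it,
# so the engine moves on and matches from the newline before "6332".
POSSESSIVE = /\s(\d++(?!\n).+,.+,.+\d+)/m
p ADDRESS =~ POSSESSIVE      # => 3
p ADDRESS[POSSESSIVE, 1]     # => "6332 Hardin Rd, Bensalem, PA\n 19020"
```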
I am trying to find a right regex expression to select substrings between another substring, which I'd like to exclude. For example in this string:
11 - 12£ in $ + 13
I want to select 12£ and $. Basically, it's substrings around in, until I hit an array of values I want to use as end/start, in this case, arithmetic operators %w(+ - / *)
So far closest I got was using this regex /(.\d\p{Sc})\sin\s(\p{Sc})/
Some more examples:
10 - 12$ in £ - 13$ should return 12$ and £
12 $ in £ should return 12$ and £
100£in$ should return 100£ and $
sentence.match(/[^-+*\/]*in[^-+*\/]*/).to_s.strip.split(/ *in */)
[^-+*\/]* matches multiple non-arithmetic operators
this will hence get everything from the "opening" to the "closing" operator that surround an in
#strip removes the leading and trailing whitespaces
finally, split into two strings, removing in and the spaces around it
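To try the one-liner against the examples from the question, it can be wrapped in a small method. The name `around_in` is hypothetical; the body is the expression above, unchanged.

```ruby
# Hypothetical helper wrapping the one-liner above.
def around_in(sentence)
  sentence.match(/[^-+*\/]*in[^-+*\/]*/).to_s.strip.split(/ *in */)
end

p around_in("11 - 12£ in $ + 13")   # => ["12£", "$"]
p around_in("10 - 12$ in £ - 13$")  # => ["12$", "£"]
p around_in("100£in$")              # => ["100£", "$"]
p around_in("12 $ in £")            # => ["12 $", "£"]
```

Note that in the last case the space inside "12 $" is kept, so an extra step such as `delete(" ")` would be needed if "12$" is required.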
r = /
\s+[+*\/-]\s+ # match 1+ whitespaces, 1 char in char class, 1+ whitespaces
(\S+) # match 1+ non-whitespaces in capture group 1
\s+in\s+ # match 1+ whitespaces, 'in', 1+ whitespaces
(\S+) # match 1+ non-whitespaces in capture group 2
\s+[+*\/-]\s # match 1+ whitespaces, 1 char in char class, 1 whitespace
/x # free-spacing regex definition mode
str = '11 - 12£ in $ + 13 / 13F in % * 4'
str.scan(r)
#=> [["12£", "$"], ["13F", "%"]]
See the doc for String#scan to see how scan handles capture groups.
Note that '-' must be first or last in the character class [+*\/-].
1 #valid
1,5 #valid
1,5, #invalid
,1,5 #invalid
1,,5 #invalid
#'nothing' is also invalid
The number of numbers separated by commas can be arbitrary.
I'm trying to use regex to do this. This is what I have tried so far, but none of it worked:
"1,2,," =~ /^[[\d]+[\,]?]+$/ #returned 0
"1,2,," =~ /^[\d\,]+$/ #returned 0
"1,2,," =~ /^[[\d]+[\,]{,1}]+$/ #returned 0
"1,2,," =~ /^[[\d]+\,]+$/ #returned 0
Obviously, I needed the expression to recognize that 1,2,, is invalid, but they all returned 0 :(
Your patterns are not really working because:
^[[\d]+[\,]?]+$ - matches a line that contains one or more digit, +, ,, ? chars (and matches all the strings above but the last empty one)
^[\d\,]+$ - matches a line that consists of 1+ digits or , symbols
^[[\d]+[\,]{,1}]+$ - matches a line that contains one or more digit, +, ,, { and } chars
^[[\d]+\,]+$ - matches a line that contains one or more digit, +, and , chars.
Basically, the issue is that you try to rely on a character class, while you need a grouping construct, (...).
Comma-separated whole numbers can be validated with
/\A\d+(?:,\d+)*\z/
See the Rubular demo.
Details:
\A - start of string
\d+ - 1+ digits
(?:,\d+)* - zero or more occurrences of:
, - a comma
\d+ - 1+ digits
\z - end of string.
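The pattern can be checked against the cases listed in the question. A minimal sketch follows; `NUMBER_LIST` and `valid_number_list?` are hypothetical names, and `match?` assumes Ruby 2.4 or later.

```ruby
NUMBER_LIST = /\A\d+(?:,\d+)*\z/

# Hypothetical helper: true only for digits separated by single commas.
def valid_number_list?(input)
  NUMBER_LIST.match?(input)
end

["1", "1,5", "1,5,", ",1,5", "1,,5", ""].each do |input|
  puts "#{input.inspect} => #{valid_number_list?(input)}"
end

# For comparison, the character-class attempt accepts any mix of digits and commas.
p "1,2,,".match?(/^[\d\,]+$/)   # => true
```

Only "1" and "1,5" are reported as valid; the grouping (?:,\d+)* forces every comma to be followed by at least one digit.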
I am trying to construct a Ruby REGEX that will only allow the following:
some string (read letter only characters)
some string followed by numbers
some string followed by a period and another string
some string followed by a period and another string followed by numbers
period is only allowed if another string follows it
no other periods are allowed afterwards
numbers may only be at the very end
I have got \A[[^0-9.]a-z]*([0-9]*|((.)([[^0-9]a-z]*)[0-9]*))\z but I can't get what I need. This allows:
test.
test..
test.123
What is the correct REGEX? If someone could explain what I am doing wrong to help me understand for future that would be great too.
Edit: update requirements to be more descriptive
So I'm guessing you want identifiers separated by a period (.).
By identifier I mean:
a string consisting of alphanumeric characters
that does not start with a number
and is at least one character long.
Written out as a grammar, it would look something like this:
EXPR := IDENT "." EXPR | IDENT
IDENT := [A-Z]\w*
And the regex for this would be the following:
/\A[A-Z]\w*(\.[A-Z]\w*)*\Z/i
Try it out here
Note: Due to the behaviour of \w, this pattern will also accept _ (underscores) after the first character (i.e. test_123 will also pass).
EDIT to reflect update of question
So the grammar you want is actually like this:
EXPR := IDENT [0-9]*
IDENT := STR | STR "." STR
STR := [A-Z]+
And the regexp then is this:
/\A[A-Z]+(\.[A-Z]+)?[0-9]*\z/i
Try this one out here
The explanation is as follows:
/ # start Regexp
\A # start of string
[A-Z]+ # "some string"
(
\. # followed by a period
[A-Z]+ # and another string
)? # period + another string is optional
[0-9]* # optional digits at the end
\z # end of string
/i # this regexp is case insensitive.
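The updated regexp can be exercised against the inputs mentioned in the question. This is a minimal sketch; `IDENTIFIER` and `valid_identifier?` are hypothetical names, and the sample strings other than test., test.. and test.123 are made up for illustration.

```ruby
IDENTIFIER = /\A[A-Z]+(\.[A-Z]+)?[0-9]*\z/i

# Hypothetical helper around the regexp above.
def valid_identifier?(input)
  IDENTIFIER.match?(input)
end

accepted = ["test", "test123", "test.abc", "test.abc123"]
rejected = ["test.", "test..", "test.123", "123", "test1.abc"]

accepted.each { |s| puts "#{s} accepted: #{valid_identifier?(s)}" }   # all true
rejected.each { |s| puts "#{s} accepted: #{valid_identifier?(s)}" }   # all false
```

The three strings the original pattern wrongly allowed (test., test.. and test.123) are rejected because a period is only matched when at least one letter follows it.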
You can try
^[a-z]+\.?[a-z]+[0-9]*$
Here is demo
Note: use \A and \z to match starting and ending of string instead of line.
You need to escape . that matches any single character.
Pattern explanation:
^ the beginning of the line
[a-z]+ any character of: 'a' to 'z' (1 or more times)
\.? '.' (optional)
[a-z]+ any character of: 'a' to 'z' (1 or more times)
[0-9]* any character of: '0' to '9' (0 or more times)
$ the end of the line
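The note about \A and \z matters in Ruby because ^ and $ always match at line boundaries, not only at the ends of the string. A short sketch with a made-up multi-line input (`MULTILINE` is an illustrative name) shows the difference:

```ruby
MULTILINE = "test\n<script>"

# ^ and $ are line anchors: the first line alone satisfies the pattern.
p MULTILINE.match?(/^[a-z]+$/)     # => true

# \A and \z anchor to the whole string, so the second line causes a failure.
p MULTILINE.match?(/\A[a-z]+\z/)   # => false
```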
I'm trying to write a program that will take in a string and use RegEx to search for certain mathematical expressions, such as 1 * 3 + 4 / 2. Only operators to look for are [- * + /].
so far:
string = "something something nothing 1/ 2 * 3 nothing hello world"
a = /\d+\s*[\+ \* \/ -]\s*\d+/
puts a.match(string)
produces:
1/ 2
I want to grab the whole equation 1/ 2 * 3. I'm essentially brand new to the world of regex, so any help will be appreciated!
New Information:
a = /\s*-?\d+(?:\s*[-\+\*\/]\s*\d+)+/
Thank you to zx81 for his answer. I had to modify it in order to work. For some reason ^ and $ do not produce any output, or perhaps a nil output, for a.match(string). Also, certain operators need a \ before them.
Version to work with parentheses (the x flag is needed because the pattern contains spaces for readability):
a = /\(* \s* \d+ \s* (( [-\+\*\/] \s* \d+ \)* \s* ) | ( [-\+\*\/] \s* \(* \s* \d+ \s* ))+/x
Regex Calculators
First off, you might want to have a look at this question about Regex Calculators (both RPN and non-RPN version).
But we're not dealing with parentheses, so we can go with something like:
^\s*-?\d+(?:\s*[-+*/]\s*\d+)+$
See demo.
Explanation
The ^ anchor asserts that we are at the beginning of the string
\s* allows optional spaces
-? allows an optional minus before the first digit
\d+ matches the first digits
The non-capturing group (?:\s*[-+*/]\s*\d+) matches optional spaces, an operator, optional spaces and digits
the + quantifier matches that one or more times
The $ anchor asserts that we are at the end of the string
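The anchors explain why the asker got nil: with ^ and $ the whole line must be an expression, so the pattern cannot find one embedded in other text. A minimal Ruby sketch, using the sample string from the question (`TEXT`, `ANCHORED` and `UNANCHORED` are illustrative names); note that in a Ruby /.../ literal the / must be escaped even inside the character class, while +, * and a leading - need no backslash there.

```ruby
TEXT = "something something nothing 1/ 2 * 3 nothing hello world"

ANCHORED   = /^\s*-?\d+(?:\s*[-+*\/]\s*\d+)+$/
UNANCHORED = /-?\d+(?:\s*[-+*\/]\s*\d+)+/

# The anchored pattern needs the entire line to be an expression.
p ANCHORED.match(TEXT)            # => nil
p "1 * 3 + 4 / 2"[ANCHORED]       # => "1 * 3 + 4 / 2"

# Dropping the anchors lets the expression be found inside a sentence.
p UNANCHORED.match(TEXT)[0]       # => "1/ 2 * 3"
```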