I have the following string that I want to extract from:
/Monovolume/Honda+HR+V+1+6-11399031.htm
What I want to extract is the 8 digit number at the end which I tried with the following regex:
Monovolume\/.+(\d{7,})
It says 7 or more because there are cases where there are only 7 digits. The match, however, is only 7 digits and not 8 as in the above string. When I run the part in parentheses only I get the right result. What is causing this behaviour and how can I fix it?
P.S. I can't put the "-" in the regex, because its appearance is coincidental.
You're very close. The problem is that .+ is "greedy" by default: it consumes as much as it can, including the first digit of the number, and leaves only the minimum of 7 digits for \d{7,}.
I'm not sure about your requirements, but you could do a lazy match:
Monovolume\/.+?(\d{7,})
The ? after .+ makes the quantifier lazy: it repeats as few times as possible, so it stops as soon as 7 or more digits follow.
See it live
More info here: Regex Lazy Quantification
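To make the difference concrete, here is a minimal sketch in Python's re module (an assumption; the question does not say which engine is used) that runs both patterns against the string from the question:

```python
import re

url = "/Monovolume/Honda+HR+V+1+6-11399031.htm"

# Greedy: .+ grabs as much as it can and gives back only the 7 digits
# that \d{7,} needs, so the first digit of the number is lost.
greedy = re.search(r"Monovolume/.+(\d{7,})", url).group(1)

# Lazy: .+? expands one character at a time and stops at the first
# position where \d{7,} can match, so the full number is captured.
lazy = re.search(r"Monovolume/.+?(\d{7,})", url).group(1)

print(greedy)  # 1399031
print(lazy)    # 11399031
```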
Why does this regex not match 3a?
(\/\d{1,4}?|\d{1,4}?|\d{1,4}[A-z]{1})
Using \d{1,4}\D{1}, the result is the same.
Street numbers:
/1
78
3a
89/
-1 (special case)
1
https://regex101.com/r/cYCafR/3
The digits+letter combination is not matched due to the order of alternatives in your pattern. The \d{1,4}? matches the digit before the letter, and \d{1,4}[A-z]{1} does not even have a chance to step in. See the Remember That The Regex Engine Is Eager article.
The \/\d{1,4}? will match a / and a single digit after the slash, and \d{1,4}? will always match a single digit, as {min,max}? is a lazy range/interval/limiting quantifier and as such only matches as few chars as possible. See Laziness Instead of Greediness.
Besides, [A-z] is a typo: it also matches the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII. It should be [A-Za-z].
It seems you want
\d{1,4}[A-Za-z]|\/?\d{1,4}
See the regex demo. If it should be at the start of a line, use
^(?:\d{1,4}[A-Za-z]|\/?\d{1,4})
See this regex demo.
Details
^ - start of a line
(?: - start of a non-capturing group
\d{1,4}[A-Za-z] - 1 to 4 digits and an ASCII letter
| - or
\/? - an optional /
\d{1,4} - 1 to 4 digits
) - end of the group.
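The effect of the alternative order can be checked with a short sketch. It uses Python's re module as an assumption (the question links to regex101), and first_match is a hypothetical helper introduced only for this illustration:

```python
import re

street_numbers = ["/1", "78", "3a", "89/", "-1", "1"]

# Original order: the lazy \d{1,4}? alternative wins first and takes one digit.
original = re.compile(r"(/\d{1,4}?|\d{1,4}?|\d{1,4}[A-z]{1})")
# Reordered: the digits+letter alternative is tried before the plain digits.
fixed = re.compile(r"^(?:\d{1,4}[A-Za-z]|/?\d{1,4})")

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

for s in street_numbers:
    print(repr(s), first_match(original, s), first_match(fixed, s))
```

With the original pattern "3a" yields only 3, while the reordered pattern yields 3a. Because of the ^ anchor, the reordered pattern finds nothing in "-1", so that special case would still need its own handling.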
Your regex uses lazy quantifiers like {1,4}?. These will match one character, and stop, because the rest of the pattern (i.e. nothing) matches the rest of the string. See here for how greedy vs lazy quantifiers work.
Another reason is that you put the \d{1,4}[A-z]{1} case last. This case will only be tried if the first two cases don't match. With 3a, the 3 already matches the second case, so the last case won't be considered.
You seem to just want:
^(\d{1,4}[A-Za-z]|\/?\d{1,4})
Note how the \/\d{1,4} case and the \d{1,4} case in your original regex are combined into one case \/?\d{1,4}.
I want to make a REGEX for a phone number validation that only allows:
7 digits (5557865) that's formatted exactly like the example.
I'm pretty unfamiliar with regex, or else I would tackle this myself. Hopefully that is enough info; let me know if you need anything else.
Working example https://regex101.com/r/oZ2yQ0/1
\(\d{7}\)
\( matches the character ( literally
\d{7} match a digit [0-9]
Quantifier: {7} Exactly 7 times
\) matches the character ) literally
If it is just to match 7 digit number, you can use \d{7} or [0-9]{7}.
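For validation, the pattern usually has to cover the whole input, not just occur somewhere inside it. A minimal sketch in Python (an assumption, since the question names no language; both function names are hypothetical) using re.fullmatch:

```python
import re

def is_seven_digits(text):
    """True only if the whole string is exactly seven ASCII digits."""
    return re.fullmatch(r"[0-9]{7}", text) is not None

def is_parenthesized_seven_digits(text):
    """True only if the whole string is seven digits wrapped in ( )."""
    return re.fullmatch(r"\([0-9]{7}\)", text) is not None

# An unanchored search also succeeds inside a longer run of digits.
unanchored_hit = re.search(r"[0-9]{7}", "555786599") is not None

print(is_seven_digits("5557865"))                  # True
print(is_seven_digits("555786599"))                # False
print(is_parenthesized_seven_digits("(5557865)"))  # True
print(unanchored_hit)                              # True
```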
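It can be done with just
/\d{7}/ #or
/[0-9]{7}/
\d matches a digit and {7} repeats it exactly seven times. Note that /\d{1,7}/ would match anywhere from 1 to 7 digits, and /\d[0-9]{7}/ would match 8 digits, so neither fits here.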
There are a few ways to tackle this:
# Ensure the number consists entirely of seven digits, nothing else.
number.match(/\A\d{7}\z/)
# Remove all non-digit characters (\D) and test that the length is 7.
number.gsub(/\D/, '').length == 7
# Test that this is either NNN-NNNN or NNNNNNN (the hyphen is optional).
number.match(/\A\d{3}-?\d{4}\z/)
Normally you want to make your validation methods as lenient as possible while still ensuring things are valid.
JMeter:
I have a JSON response from which I need to fetch the value of "ci".
I am using the following RegEx: ci:\s*(.*?)\" and getting the following result in the RegEx tester:
Match count: 1
Match1[0]=ci: 434547"
Match1=434547
The issue is that Match1[0] contains spaces, because of which the load test fails with
: Server Error - Could not convert JSON to Object
I need help correcting this RegEx.
Basically, your RegEx is fine. This is the way I would look for it too; the first group (Match[1]) would give you 434547, which is the value you are looking for. As I don't know the software you are using, I have no idea why using just that match doesn't work.
Here is an idea to work around that: if the value will always be the only numeric value in the string, you could simplify the RegEx to:
\d+
This will give you a numeric value that is at least 1 digit long. If there are other numeric values in the string, but they have different lengths, try one of these:
\d{m,n} --> between m and n digits long
\d{n,} --> at least n digits long
\d{0,n} --> not more than n digits long
This is not as secure / reliable as the original RegEx (since it assumes some certain conditions), but it might work in your case, because you don't have to look for groups but just use the whole matched text. Tell me if it helped!
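The difference between the whole match and the first group can be seen in a small sketch. It uses Python's re module rather than JMeter, and the input text is a hypothetical fragment reconstructed from the tester output in the question, not the real response. In JMeter's Regular Expression Extractor, the template (for example $1$) is what selects the group, so it may be worth checking that it points at group 1 and not at the whole match.

```python
import re

# Hypothetical fragment; the real JSON from the question is not shown.
text = 'prefix ci: 434547" suffix'

m = re.search(r'ci:\s*(.*?)"', text)
whole = m.group(0)  # the entire match, including "ci:", the space and the quote
value = m.group(1)  # only what the parentheses captured

# Assumed variant: capture digits only, so no closing quote is needed.
digits = re.search(r'ci:\s*(\d+)', text).group(1)

print(repr(whole))   # 'ci: 434547"'
print(repr(value))   # '434547'
print(repr(digits))  # '434547'
```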
I am trying to create a spam filter using Regular Expressions that matches the following situation.
There is a group of exactly 8 alphanumeric characters to be matched.
It must contain 2 or more uppercase letters;
AND it must contain 2 or more lowercase letters;
AND it must contain 1 or more numbers.
So far, all I have been able to come up with is this:
(?i)[A-Za-z0-9]{8}
My code does match a mixed case group of 8, but does not force upper or lower case or specify how many times each type must occur. So, I couple it with other must-haves that are always present in the messages in question.
Here is a sample of the pattern I am trying to detect:
WbNDSk9e
This is part of a spam URL. Other groups I have seen follow the same pattern of at least 2 each UC and LC letters and 1 or more numbers and always have exactly 8 characters. I've seen no other characters or variations yet.
To my knowledge, the only switch I am able to use is to turn on Case Sensitivity, with (?i). Some of the other switches I have seen in some replies do not work in the program I use. Am I asking too much from a single line RegExpr rule?
I currently use RegEx Match to test my rules and my anti-spam program uses the same engine.
^(?=.*?[A-Z].*?[A-Z])(?=.*?[a-z].*?[a-z])(?=.*?\d)[A-Za-z0-9]{8}$
Broken down:
(?=.*?[A-Z].*?[A-Z]) forces at least 2 upper-case letters.
(?=.*?[a-z].*?[a-z]) forces at least 2 lower-case letters.
(?=.*?\d) forces at least 1 digit.
The ^ and $ anchors (caret and dollar) force the pattern to match the whole string.
You don't want the (?i) flag, because it makes the match case-insensitive, so [A-Z] and [a-z] would no longer be told apart.
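A quick way to check the look-ahead approach is to run it against a few candidate strings. The sketch below uses Python's re module (an assumption; the asker's anti-spam engine may differ) and ends the pattern with [A-Za-z0-9]{8}, so that only alphanumeric characters are accepted. The function name is hypothetical.

```python
import re

TOKEN = re.compile(
    r"^(?=.*?[A-Z].*?[A-Z])"   # at least two uppercase letters
    r"(?=.*?[a-z].*?[a-z])"    # at least two lowercase letters
    r"(?=.*?\d)"               # at least one digit
    r"[A-Za-z0-9]{8}$"         # exactly eight alphanumeric characters
)

def looks_like_spam_token(text):
    """True if the whole string satisfies all four conditions."""
    return TOKEN.fullmatch(text) is not None

print(looks_like_spam_token("WbNDSk9e"))   # True
print(looks_like_spam_token("wbndsk9e"))   # False: no uppercase letters
print(looks_like_spam_token("WbNDSkze"))   # False: no digit
print(looks_like_spam_token("WbNDSk9e1"))  # False: nine characters
print(looks_like_spam_token("WbND-k9e"))   # False: contains a hyphen
```

Each look-ahead inspects the string from the same starting position without consuming anything, which is why the three conditions can be checked independently of the order in which the characters appear.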
I have the following string:
23434 5465434
58495 / 46949345
58495 - 46949345
58495 / 55643
d 44444 ssdfsdf
64784
45643 dfgh
58495/55643
48593/48309596
675643235
34565435 34545
I only want to extract the bold ones. Each is a five-digit number (German).
It should not match telephone numbers such as 43564 366334 or 45433 / 45663, as in my example above.
I tried something like ^\b\d{5}, but that's not a good beginning.
Any hints for me to get this working?
Thanks for all hints.
You could add a negative look-ahead assertion to avoid the matches with phone numbers.
\b[0124678][0-9]{4}\b(?!\s?[ \/-]\s?[0-9]+)
If you're using Ruby 1.9, you can add a negative look-behind assertion as well.
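As a rough check, the sketch below runs the look-ahead pattern over the sample lines from the question, using Python's re module instead of Ruby (an assumption). The second pattern is not from the answer: it is one assumed way to add fixed-width negative look-behinds, so that the leading class [0124678], which rejects every number starting with 3, 5 or 9, is no longer needed.

```python
import re

lines = [
    "23434 5465434", "58495 / 46949345", "58495 - 46949345",
    "58495 / 55643", "d 44444 ssdfsdf", "64784", "45643 dfgh",
    "58495/55643", "48593/48309596", "675643235", "34565435 34545",
]

# The pattern from the answer: a five-digit number that is not followed
# by a separator and more digits.
lookahead_only = re.compile(r"\b[0124678][0-9]{4}\b(?!\s?[ /-]\s?[0-9]+)")

# Assumed variant: look-behinds reject numbers preceded by a digit or a
# separator, so any first digit is allowed.
with_lookbehind = re.compile(
    r"(?<![0-9/-])(?<![0-9] )(?<![/-] )"
    r"\b[0-9]{5}\b"
    r"(?!\s?[ /-]\s?[0-9])"
)

def extract(pattern, rows):
    """Collect every match of pattern, scanning each row separately."""
    return [m for row in rows for m in pattern.findall(row)]

print(extract(lookahead_only, lines))   # ['44444', '64784', '45643']
print(extract(with_lookbehind, lines))  # ['44444', '64784', '45643']
```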
You haven't specified what distinguishes the number you're trying to search for.
Based on the example string you gave, it looks like you just want:
^(\d{5})\n
Which matches lines that start with 5 digits and contain nothing else.
You might want to permit some spaces after the first 5 digits (but nothing else):
^(\d{5})\s*\n
I'm not completely sure about the specified rules. But if you want lines that start with 5 digits and do not contain additional digits, this may work:
^(\d{5})[^\d]*$
If leading white space is okay, then:
^\s*(\d{5})[^\d]*$
Here is the Rubular link that shows the result.
^\D*(\d{5})(\s(\D*)$|()$)
This should (it's untested) match:
a line starting with five digits (or some non-digits and then five digits), then a space, and ending with some non-digits
a line starting with five digits (or some non-digits and then five digits) and ending right after them
\1 would be the five digits
\2 would be the whole second half, if any
\3 would be the text after the space, if any
edited to fit the asker's edited question
edit again: I came up with a much more elegant solution:
^\D*(\d{5})\D*$
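A minimal sketch of this last pattern in Python (an assumption; the thread itself uses Ruby and Rubular), applied to each sample line separately so that \D cannot run across a line break. The function name is hypothetical.

```python
import re

lines = [
    "23434 5465434", "58495 / 46949345", "58495 - 46949345",
    "58495 / 55643", "d 44444 ssdfsdf", "64784", "45643 dfgh",
    "58495/55643", "48593/48309596", "675643235", "34565435 34545",
]

# Exactly one run of five digits, with only non-digits around it.
five_digits_only = re.compile(r"^\D*(\d{5})\D*$")

def five_digit_numbers(rows):
    """Return the captured number from every row that matches."""
    found = []
    for row in rows:
        m = five_digits_only.match(row)
        if m:
            found.append(m.group(1))
    return found

print(five_digit_numbers(lines))  # ['44444', '64784', '45643']
```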