In my country the phone numbers follow a format like this (XX)XXXX-XXXX. But enter phone numbers according to the pattern in input texts it's too mainstream. Some people follow, but some people don't. I'd like to make a regex to catch all possible cases. By now it look like this:
/^[\(]?\d{2}?[\)]?\d{4}[. -]?\d{4}$/
And I prepared some test cases to prove the regex's functionality
# GOOD PHONES #
8432115262
843211 5262
843211.5262
843211-5262
32115262
3211.5262
3211 5262
3211-5262
(84)32115262
(84)3211.5262
(84)3211 5262
(84)3211-5262
# BAD PHONES #
!##$%*()
()32115262
()1231 3213
()1231.3213
()1231-3213
().3213
()-3213
()3213.
()3213-
3211-5a62
sakdiihbnmwlzi
Unfortunately, the wrong case ()32115262 is bypassing the regex. Altought it is clear why. this part [\(]?\d{2}?[\)]? is responsable for the mistake. From left to right, you can enter zero or one of (; You can enter zero or two digits; You can enter zero or one of ).
I'd like that part should be like this: If you put (, you will have to enter two digits and ), else you can enter zero or two digits. Something like this or with simmilar semantics is possible in regex world?
Thanks in advance
Something like this perhaps:
/^(?:\(\d{2}\)|\d{2}?)\d{4}[. -]?\d{4}$/
I used a non-matching group (?: ... ) and alternation to provide two possible options for the first part of the phone number.
Either it is \(\d{2}\) which means brackets with exactly two digits, or it is \d{2}? which means two digits or empty string.
Combine these two options together with | (which means OR) and you get the first part of the regex above: (?:\(\d{2}\)|\d{2}?)
It seemed to work for all your test cases!
try with this: ^(?:\(\d\d\)|\d\d)?\d{4}[. -]?\d{4}$
If pattern matches (..) then have to match 2 digits inside.
Related
I want to implement a custom token filter like this:
single words are accepted if they match a specific (regex) pattern - adjacent words are concatenated if one ends in a letter and the other one begins with a digit (or vice versa)
This seems to map to:
step 1 - shingle - adjacent words joined together with a space
step 2 - if token matches pattern /pat1/, keep ... if token matches /pata patb/, replace the whitespace
step 3 - remove everything else.
Is there a way to achieve that?. I have seen https://stackoverflow.com/questions/35742426/how-to-filter-tokens-based-on-a-regex-in-elasticsearch but dont feel like converting a complex pattern into one with lookahead.
the idea is to factor out potential order numbers from user input.
The data is assumed to be normalised, so an order number could be a regular isbn 978<10_more_digits> or something like "ME4713P". Users might input "ME 4713P" or 978-<10_digits_and_some_dashes> instead
Order numbers can be described as "contain both letters and digits, optional dashes" or "contain letters, a dash, more letters" or "contain digits, a dash, more digits"
BTW: sorry to use different email this time...
I want to make a REGEX for a phone number validation that only allows:
7 digits (5557865) that's formatted exactly like the example.
I'm pretty unfamiliar with regex or else I would tackle this myself. Hopefully that is enough info let me know if you need anything else.
Working example https://regex101.com/r/oZ2yQ0/1
\(\d{7}\)
\( matches the character ( literally
\d{7} match a digit [0-9]
Quantifier: {7} Exactly 7 times
\) matches the character ) literally
If it is just to match 7 digit number, you can use \d{7} or [0-9]{7}.
can be just with
/\d{7}/ #or
/\d{1..7}/ #or
/\d[0-9]{7}/
the \d matches digits and {7} the number of the digits.
There's a few ways to tackle this:
# Ensure the number consists entirely of seven digits, nothing else.
number.match(/\A\d{7}\z/)
# Remove all non-digit characters (\D) and test that the length is 7.
number.gsub(/\D/, '').length == 7
# Test that this is either NNN-NNNN or NNNNNNN.
number.match(/\A\d\d\d\-\d\d\d\d\z/)
Normally you want to make your validation methods as lenient as possible while still ensuring things are valid.
I want to be able to match all the following cases below using Ruby 1.8.7.
/pages/multiedit/16801,16809,16817,16825,16833
/pages/multiedit/16801,16809,16817
/pages/multiedit/16801
/pages/multiedit/1,3,5,7,8,9,10,46
I currently have:
\/pages\/multiedit\/\d*
This matches upto the first set of numbers. So for example:
"/pages/multiedit/16801,16809,16817,16825,16833"[/\/pages\/multiedit\/\d*/]
# => "/pages/multiedit/16801"
See http://rubular.com/r/ruFPx5yIAF for example.
Thanks for the help, regex gods.
\/pages\/multiedit\/\d+(?:,\d+)*
Example: http://rubular.com/r/0nhpgki6Gy
Edit: Updated to not capture anything... Although the performance hit would be negligible. (Thanks Tin Man)
The currently accepted answer of
\/pages\/multiedit\/[\d,]+
may not be a good idea because that will also match the following strings
.../pages/multiedit/,,,
.../pages/multiedit/,1,
My answer requires there be at least one digit before the first comma, and at least one digit between commas, and it must end with a digit.
I'd use:
/\/pages\/multiedit\/[\d,]+/
Here's a demonstration of the pattern at http://rubular.com/r/h7VLZS1W1q
[\d,]+ means "find one or more numbers or commas"
The reason \d* doesn't work is it means "find zero or more numbers". As soon as the pattern search runs into a comma it stops. You have to tell the engine that it's OK to find numbers and commas.
This seems like a simple one, but I am missing something.
I have a number of inputs coming in from a variety of sources and in different formats.
Number inputs
123
123.45
123,45 (note the comma used here to denote decimals)
1,234
1,234.56
12,345.67
12,345,67 (note the comma used here to denote decimals)
Additional info on the inputs
Numbers will always be less than 1 million
EDIT: These are prices, so will either be whole integers or go to the hundredths place
I am trying to write a regex and use gsub to strip out the thousands comma. How do I do this?
I wrote a regex: myregex = /\d+(,)\d{3}/
When I test it in Rubular, it shows that it captures the comma only in the test cases that I want.
But when I run gsub, I get an empty string: inputstr.gsub(myregex,"")
It looks like gsub is capturing everything, not just the comma in (). Where am I going wrong?
result = inputstr.gsub(/,(?=\d{3}\b)/, '')
removes commas only if exactly three digits follow.
(?=...) is a lookahead assertion: It needs to be possible to be matched at the current position, but it's not becoming part of the text that is actually matched (and subsequently replaced).
You are confusing "match" with "capture": to "capture" means to save something so you can refer to it later. You want to capture not the comma, but everything else, and then use the captured portions to build your substitution string.
Try
myregex = /(\d+),(\d{3})/
inputstr.gsub(myregex,'\1\2')
In your example, it is possible to tell from the number of digits after the last separator (either , or .) that it is a decimal point, since there are 2 lone digits. For most cases, if the last group of digits does not have 3 digits then you can assume that the separator in front is decimal point. Another sign is the multiple appearance of a separator in big numbers allows us to differentiate between decimal point and separators.
However, I can give a string 123,456 or 123.456 without any sort of context. It is impossible to tell whether they are "123 thousand 456" or "123 point 456".
You need to scan the document to look for clue whether , is used for thousand separator or decimal point, and vice versa for .. With the context provided, then you can safely apply the same method to remove the thousand separators.
You may also want to check out this article on Wikipedia on the less common ways to specify separators or decimal points. Knowing and deciding not to support is better than assuming things will work.
following string:
23434 5465434
58495 / 46949345
58495 - 46949345
58495 / 55643
d 44444 ssdfsdf
64784
45643 dfgh
58495/55643
48593/48309596
675643235
34565435 34545
it only want to extract the bold ones. its a five digit number(german).
it should not match telephone numbers 43564 366334 or 45433 / 45663,etc as in my example above.
i tried something like ^\b\d{5} but thats not a good beginning.
some hints for me to get this working?
thanks for all hints
You could add a negative look-ahead assertion to avoid the matches with phone numbers.
\b[0124678][0-9]{4}\b(?!\s?[ \/-]\s?[0-9]+)
If you're using Ruby 1.9, you can add a negative look-behind assertion as well.
You haven't specified what distinguishes the number you're trying to search for.
Based on the example string you gave, it looks like you just want:
^(\d{5})\n
Which matches lines that start with 5 digits and contain nothing else.
You might want to permit some spaces after the first 5 digits (but nothing else):
^(\d{5})\s*\n
I'm not completely sure about the specified rules. But if you want lines that start with 5 digits and do not contain additional digits, this may work:
^(\d{5})[^\d]*$
If leading white space is okay, then:
^\s*(\d{5})[^\d]*$
Here is the Rubular link that shows the result.
^\D*(\d{5})(\s(\D)*$|()$)
This should (it's untested) match:
line starting with five digits (or some non-digits and then five digits), then
a space, and ending with some non-numbers
line starting and ending with five
digits (or some non-digits and then five digits)
\1 would be the five digits
\2 would be the whole second half, if any
\3 would be the word after the digits, if any
edited to fit the asker's edited question
edit again: I came up with a much more elegant solution:
^\D*(\d{5})\D*$