Regular Expression(Ruby): match non-word, but exclude spaces - ruby

^(?=(.*\d){4,})(?=(.*[A-Z]){3})(?!\s)(?=.*\W{2,})(?=(.*[a-z]){2,}).{12,14}$
The RegExp above is trying to:
match at least 4 digits - (?=(.*\d){4,})
match exactly 3 upper case letters - (?=(.*[A-Z]){3})
don't match spaces - (?!\s)
match at least 2 non-word characters - (?=.*\W{2,})
match at least 2 lower - (?=(.*[a-z]){2,})
string must be between 12 and 14 in length - .{12,14}
But I am having a challenge getting this to avoid matching spaces. It seems like because \W also includes spaces, my preceding negative look-ahead on spaces is being ignored.
For example:
b4A#Ac33*8Pd -- should match
b4A#Ac3 3*8Pd -- should not match
rubular link
Edited to provide further clarification:
Basically, I am trying to avoid having to spell out all the characters in the POSIX [:punct:] class ie !"#$%&'()*+,./:;<=>?#\^_\{|}~-` .. that is why I had a need to use \W .. But I would also want to exclude spaces
I can use a second pair of eyes, and more experienced suggestions here ..
Edited again, to correct mix-ups in counts specified in sub-patterns, as pointed out in the accepted answer below.

Instead of using dot ., use non spaces \S:
^(?=(.*\d){3,})(?=(.*[A-Z]){2})(?=.*\W{1,})(?=(.*[a-z]){1,})\S{12,14}$
// here ___^^
And is this a typo match at least 4 digits - (?=(.*\d){3,}),
it should be:
match at least 3 digits - (?=(.*\d){3,})
or
match at least 4 digits - (?=(.*\d){4,})
Same for other counts.

Related

Why does this regex not match numbers and single letters?

Why does this regex not match 3a?
(\/\d{1,4}?|\d{1,4}?|\d{1,4}[A-z]{1})
Using \d{1,4}\D{1}, the result is the same.
Streets numbers:
/1
78
3a
89/
-1 (special case)
1
https://regex101.com/r/cYCafR/3
The digits+letter combination is not matched due to the order of alternatives in your pattern. The \d{1,4}? matches the digit before the letter, and \d{1,4}[A-z]{1} does not even have a chance to step in. See the Remember That The Regex Engine Is Eager article.
The \/\d{1,4}? will match a / and a single digit after the slash, and \d{1,4}? will always match a single digit, as {min,max}? is a lazy range/interval/limiting quantifier and as such only matches as few chars as possible. See Laziness Instead of Greediness.
Besides, [A-z] is a typo, it should be [A-Za-z].
It seems you want
\d{1,4}[A-Za-z]|\/?\d{1,4}
See the regex demo. If it should be at the start of a line, use
^(?:\d{1,4}[A-Za-z]|\/?\d{1,4})
See this regex demo.
Details
^ - start of a line
(?: - start of a non-capturing group
\d{1,4}[A-Za-z] - 1 to 4 digits and an ASCII letter
| - or
\/? - an optional /
\d{1,4} - 1 to 4 digits
) - end of the group.
Your regex uses lazy quantifiers like {1,4}?. These will match one character, and stop, because the rest of the pattern (i.e. nothing) matches the rest of the string. See here for how greedy vs lazy quantifiers work.
Another reason is that you put the \d{1,4}[A-z]{1} case last. This case will only be tried if the first two cases don't match. With 3a, the 3 already matches the second case, so the last case won't be considered.
You seem to just want:
^(\d{1,4}[A-Za-z]|\/?\d{1,4})
Note how the \/\d{1,4} case and the \d{1,4} case in your original regex are combined into one case \/?\d{1,4}.

Regex for two letters and up to 13 digits incorrectly accepts additional letters

I am trying to build a regexp for a model field that follows this rule:
starts with two letters
can be filled up with digits, up to 13 digits
Valid examples:
US333
FR52389000
Invalid examples:
11111
T11
I thought I found the right regex:
/[a-zA-Z][a-zA-Z]\d*/
But proof testing it with http://rubular.com/ seems to validate RR444kjj
Can someone point out the mistake?
You need to use a limiting quantifier with \d and correct anchors.
/\A[[:alpha:]]{2}\d{0,13}\z/
See the regex demo.
\A - start of the string (note that the ^ anchor matches the start of the line in Ruby regex)
[[:alpha:]]{2} - 2 letters (to make sure you only allow ASCII letters, use [a-zA-Z]{2})
\d{0,13} - 0 to 13 digits
\z - end of string (note that the $ anchor matches the end of the line in Ruby regex).

Ruby regex specify length of captured group

I need to match a string of variable length(between 5 and 12), composed of uppercase letters and one or more digits between 1 and 8.
How can I specify that I need the whole captured group's length to be between 5 and 12?
I have tried with parenthesis but with no luck.
I have tried this
\s([A-Z]+[1-8]+[A-Z]+){5,12}\s
My idea was to use the quantifier {5,12} to limit the length of the captured group between parenthesis, but clearly it doesn't work like that.
The string needs to be identified inside a normal text just like
"THE STRING I NEED TO DECODE IS SOMETHING LIKE FD1531FHHKWF BUT NOT LIKE g4G58234JJ"
You actually have two conditions to met:
The length of the match is to be specified with curly brackets {5,12}, and before and after there should be not letters/digits. So:
/(?!\b[A-Z]+\b)\b[A-Z1-8]{5,12}\b/
First, we assure that the lookahead for letters only is negative, then we look for the pattern.
Use positive look-ahead on total size of regex
\s(?=^.{5,12}$)([A-Z]+[1-8]+[A-Z]+)\s
Explanation
(?= # look-ahead match start
^.{5,12}$ # 3 to 15 characters from start to end
) # look-ahead match end

Syntax Highlighting in Notepad++: how to highlight timestamps in log files

I am using Notepad++ to check logs. I want to define custom syntax highlighting for timestamps and log levels. Highlighting logs levels works fine (defined as keywords). However, I am still struggling with highlighting timestamps of the form
06 Mar 2014 08:40:30,193
Any idea how to do that?
If you just want simple highlighting, you can use Notepad++'s regex search mode. Open the Find dialog, switch to the Mark tab, and make sure Regular Expression is set as the search mode. Assuming the timestamp is at the start of the line, this Regex should work for you:
^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+
Breaking it down bit by bit:
^ means the following Regex should be anchored to the start of the line. If your timestamp appears anywhere but the start of a line, delete this.
\d means match any digit (0-9). {n} is a qualifier that means to match the preceding bit of Regex exactly n times, so \d{2} means match exactly two digits.
\s means match any whitespace character.
[A-Za-z] means match any character in the set A-Z or the set a-z, and the + is a qualifier that means match the preceding bit of Regex 1 or more times. So we're looking for an alphabetic character sequence containing one or more alphabetic characters.
\s means match any whitespace character.
\d{4} is just like \d{2} earlier, only now we're matching exactly 4 digits.
\s means match any whitespace character.
\d{2} means match exactly two digits.
: matches a colon.
\d{2} matches exactly two digits.
: matches another colon.
\d{2} matches another two digits.
, matches a comma.
[\d]+ works similarly to the alphabetic search sequence we set up earlier, only this one's for digits. This finds one or more digits.
When you run this Regex on your document, the Mark feature will highlight anything that matches it. Unlike the temporary highlighting the "Find All in Document" search type can give you, Mark highlighting lasts even after you click somewhere else in the document.

regex for matching german postal codes but not a

following string:
23434 5465434
58495 / 46949345
58495 - 46949345
58495 / 55643
d 44444 ssdfsdf
64784
45643 dfgh
58495/55643
48593/48309596
675643235
34565435 34545
it only want to extract the bold ones. its a five digit number(german).
it should not match telephone numbers 43564 366334 or 45433 / 45663,etc as in my example above.
i tried something like ^\b\d{5} but thats not a good beginning.
some hints for me to get this working?
thanks for all hints
You could add a negative look-ahead assertion to avoid the matches with phone numbers.
\b[0124678][0-9]{4}\b(?!\s?[ \/-]\s?[0-9]+)
If you're using Ruby 1.9, you can add a negative look-behind assertion as well.
You haven't specified what distinguishes the number you're trying to search for.
Based on the example string you gave, it looks like you just want:
^(\d{5})\n
Which matches lines that start with 5 digits and contain nothing else.
You might want to permit some spaces after the first 5 digits (but nothing else):
^(\d{5})\s*\n
I'm not completely sure about the specified rules. But if you want lines that start with 5 digits and do not contain additional digits, this may work:
^(\d{5})[^\d]*$
If leading white space is okay, then:
^\s*(\d{5})[^\d]*$
Here is the Rubular link that shows the result.
^\D*(\d{5})(\s(\D)*$|()$)
This should (it's untested) match:
line starting with five digits (or some non-digits and then five digits), then
a space, and ending with some non-numbers
line starting and ending with five
digits (or some non-digits and then five digits)
\1 would be the five digits
\2 would be the whole second half, if any
\3 would be the word after the digits, if any
edited to fit the asker's edited question
edit again: I came up with a much more elegant solution:
^\D*(\d{5})\D*$

Resources