I found this from a code challenge:
def time_correct(t)
  return unless t =~ /^\d{2}(:\d{2}){2}$/
end
It is used to find out whether a string such as "0;:44:07" is a regular time string ("HH:MM:SS") or not.
I don't understand the regex though. Can someone explain the /^\d{2}(:\d{2}){2}$/ to me please? Thanks!
On /^\d{2}(:\d{2}){2}$/:
/.../ delimits the regular expression.
^ matches the start of a line. In Ruby this is always the case, regardless of flags; use \A to match only the beginning of the string. (In many other engines, ^ matches the beginning of the string unless multi-line mode is enabled.)
\d matches one digit.
{2} states that the preceding element, \d, must match exactly 2 times.
(...) delimits a capture group. It groups things together like parentheses in math, and also allows you to refer to the captured text later using \i, where i is the index of the group. For example, in (a)(b), a is group 1 and b is group 2.
:\d{2} is a literal colon followed by two digits, as just explained.
{2} is the same as before, but here the preceding element is the capture group (:\d{2}), which must also repeat 2 times.
$ matches the end of a line. In Ruby this is always the case; use \z to match only the end of the string.
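When ^ and $ match at line boundaries (always the case in Ruby, and the multi-line mode of many other engines), your expression matches only lines like:
22:33:44
02:33:44
But not lines like:
22:33:44 d
d 22:33:44
f 02:33:44 f
In engines where ^ and $ match only at the ends of the whole string (multi-line mode disabled), the expression matches only a string consisting of a single valid time, such as:
22:33:44
and finds nothing in a string with two valid lines:
22:33:44
02:33:44
In Ruby such a two-line string does match, because one matching line is enough. Replace ^ and $ with \A and \z to require that the whole string be a time.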
This is a link for live testing: https://regex101.com/r/cdSdt4/1
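As a rough illustration of how the anchors behave in Ruby specifically, the sketch below compares the pattern from the question with a variant anchored by \A and \z. The constant names and the sample strings are made up for this example.

```ruby
# Pattern from the question: ^ and $ are line anchors in Ruby.
LINE_ANCHORED = /^\d{2}(:\d{2}){2}$/
# Hypothetical stricter variant: \A and \z anchor to the whole string.
STRING_ANCHORED = /\A\d{2}(:\d{2}){2}\z/

samples = [
  "22:33:44",
  "0;:44:07",
  "22:33:44 d",
  "22:33:44\n02:33:44",
  "oops\n22:33:44"
]

samples.each do |s|
  # match? returns true or false without setting $~ (Ruby 2.4+).
  puts format("%-22s line: %-5s string: %s",
              s.inspect, LINE_ANCHORED.match?(s), STRING_ANCHORED.match?(s))
end
```

With the line anchors, any string that contains one well-formed line is accepted, which is usually not what a validator wants.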
I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
expressions = new stack
push a new array onto the stack
for each word in string:
    if word is "(": push a new array onto the stack
    else if word is ")": pop the last array from the stack and append it to the (new) last array of the stack
    else: append the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
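The pseudocode above could be sketched in Ruby roughly as follows. The method name parse_term and the tokenizer regex are assumptions made for this example; it only recognizes non-negative integers, the operators + - * / and parentheses, and silently skips anything else.

```ruby
# Hypothetical helper: turns "1-(2+3)" into ["1", "-", ["2", "+", "3"]].
def parse_term(term)
  # Assumed token set: runs of digits, single operators, parentheses.
  tokens = term.scan(/\d+|[-+*\/()]/)
  stack = [[]] # the bottom array collects the top-level expression

  tokens.each do |tok|
    case tok
    when "("
      stack.push([]) # open a new nested array
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size < 2
      inner = stack.pop
      stack.last << inner # attach the finished group to its parent
    else
      stack.last << tok
    end
  end

  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p parse_term("1-(2+3)") # prints ["1", "-", ["2", "+", "3"]]
```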
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
    \(\s*  # match a left parenthesis followed by >= 0 whitespace chars
    (\d+)  # match one or more digits in a capture group
    \s*    # match >= 0 whitespace chars
    (\+)   # match a plus sign in a capture group
    \s*    # match >= 0 whitespace chars
    (\d+)  # match one or more digits in a capture group
    \s*    # match >= 0 whitespace chars
    \)     # match a right parenthesis
    /x
str.scan(r).first
  #=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's lookbehinds cannot contain variable-length matches (such as .*). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"
I need a very simple string validator that would show the position of the first symbol that does not correspond to the desired format. I want to use a regex, but in this case I have to find the place where the string stops corresponding to the expression, and I can't find a method that would do that.
(It's got to be a fairly simple method... maybe there isn't one?)
For example if I have regex:
/^Q+E+R+$/
with string:
"QQQQEEE2ER"
The desired result should be 7
An idea: what you can do is to tokenize your pattern and write it with optional nested capturing groups:
^(Q+(E+(R+($)?)?)?)?
Then you only need to count the number of capture groups you obtain to know where the regex engine stops in the pattern and you can determine the offset of the match end in the string with the whole match length.
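A minimal Ruby sketch of this idea, assuming single-line input. The constant PARTIAL and the method first_mismatch are names invented for this example.

```ruby
# Nested optional groups: the engine matches as much of Q+E+R+$ as it can.
PARTIAL = /^(Q+(E+(R+($)?)?)?)?/

# Hypothetical helper: returns nil when the whole pattern matched,
# otherwise the offset at which matching stopped.
def first_mismatch(str)
  m = PARTIAL.match(str) # always succeeds, since every group is optional
  return nil unless m[4].nil? # group 4 is set only when $ was reached
  m[0].size
end

p first_mismatch("QQQQEEE2ER") # prints 7
p first_mismatch("QQQQEEEERR") # prints nil
```

If the string simply ends too early (for example "QQQ"), the returned offset equals the string length.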
As @zx81 notes in his comment, if one of the elements can also match the next element (for example, if Q can match E), things become different.
Let's say that Q is \w (and can match E and R). For the string QQQEEERRR, the preceding pattern will produce only one capturing group (the greedy \w+ matches everything), whereas ^(\w+)(E+)(R+)$ will give three groups: QQQEE, E, RRR.
To obtain the same result you need to add an alternation:
^((?:\w+(?=E)|\w+)(E+(R+($)?)?)?)?
In the alternation, the case where E exists must be tested first, and only if this branch fails (with the lookahead), then the other branch where E doesn't exist is used.
Thus the full pattern can be rewritten like this to deal with this specific case:
^((?:Q+(?=E)|Q+)((?:E+(?=R)|E+)((?:R+(?=$)|R+)($)?)?)?)?
You could perhaps take a look at the amatch gem too.
This is an interesting task that can be accomplished with a neat regex trick:
^(?:(?=(Q+)))?(?:(?=(Q+E+)))?(?:(?=(Q+E+R+)))?(?:(?=(Q+E+R+$)))?
We have four optional lookaheads checking various parts of the pattern and capturing the partial matches to Groups 1, 2, 3 and 4 incrementally.
Group 1 contains Q+ if it can be matched, in your example QQQQ.
Group 2 contains Q+E+ if it can be matched, in your example QQQQEEE.
Group 3 contains Q+E+R+ if it can be matched, in your example nil.
Group 4 contains Q+E+R+$ if it can be matched, in your example nil.
In your code, check which is the last Group that is set by testing !$1.nil?, !$2.nil? and so on.
The last one set gives you the length that is matchable, so in your example $2.length gives you the 7 you wanted.
Incidentally, the fact that Group 2 is the last one set also tells you that we fail on R+.
For your example, you could do the following.
Code
Change your regex from:
/^Q+E+R+$/
to
R = /^(Q+)?(E+)?(R+)?/
and then apply the following method to the string:
def nbr_matched_chars(str)
  str.scan(R).flatten.reduce(0) { |t, e| return t if e.nil?; t + e.size }
end
str matches the original regex if and only if nbr_matched_chars(str) == str.size.
Examples
nbr_matched_chars("QQQQEEE2ER") #=> 7
nbr_matched_chars("QQQQEEEERR") #=> 10 (= "QQQQEEEERR".size)
nbr_matched_chars("QQAQQEEEER") #=> 2
Explanation
To see why this [evidently :-)] works, we can look at the results of invoking String#scan, followed by Array#flatten:
"QQQQEEE2ER".scan(R).flatten #=> ["QQQQ", "EEE" , nil ]
"QQQQEEEERR".scan(R).flatten #=> ["QQQQ", "EEEE", "RR"]
"QQAQQEEEER".scan(R).flatten #=> ["QQ" , nil , nil ]
I am trying to write a Regexp to match an expression which has two or more '1's.
Here is what I have written till now -
puts "Match." if /(1){1,5}/ =~ test_string
This correctly matches strings having '1' more than or equal to two, but it still matches if the numbers of occurrences of '1' is greater than 5.
How can I correct this Regexp to only match strings having 1 to 5 occurrences of 1?
There are possibly better versions, but this seems to do the trick:
/^([^1]*1){1,5}[^1]*$/
Broken down:
^ - Start of string
[^1]* - Zero or more non-1 characters
1 - A '1'.
([^1]*1){1,5} - This pattern occurring between one and five times.
[^1]* - Zero or more non-1 characters
$ - End of string
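As a quick check of the breakdown above, this sketch runs the pattern against a few made-up, single-line sample strings and compares it with simply counting the '1' characters via String#count. The constant and method names are assumptions for this example.

```ruby
# Pattern from the answer above: between one and five '1' characters.
ONE_TO_FIVE = /^([^1]*1){1,5}[^1]*$/

# Hypothetical wrapper around the regex.
def one_to_five_ones?(str)
  ONE_TO_FIVE.match?(str)
end

%w[abc 1 a1b1c 11111 111111].each do |s|
  by_count = s.count("1").between?(1, 5) # same rule without a regex
  puts "#{s.inspect}: regex=#{one_to_five_ones?(s)} count=#{by_count}"
end
```

For single-line input the two checks should agree; the counting version is arguably easier to read.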
@Adrian Wragg has already explained the answer, as asked by the OP. But I would like to propose another possible solution to this problem, which is below:
puts "Match." if "#{test_string}".count("1") >= 2
If you have strings which contain characters other than one, here is a Regex that will do the job. See an example here at Rubular.
/\A([^1]*1[^1]*){1,5}\Z/
This will match any string consisting only of 2 or more ones. See an example here at Rubular.
/\A1{2,}\Z/
This will match any string consisting only of 1-5 ones. See an example here at Rubular.
/\A1{1,5}\Z/
I would like to insert a <wbr> tag every 5 characters.
Input: s = 'HelloWorld-Hello guys'
Expected outcome: Hello<wbr>World<wbr>-Hell<wbr>o guys
s = 'HelloWorld-Hello guys'
s.scan(/.{5}|.+/).join("<wbr>")
Explanation:
scan collects all matches of the regexp into an array. The .{5} matches any 5 characters. If there are characters left at the end of the string, they will be matched by the .+. Finally, join the array with your separator string.
There are several options to do this. If you just want to insert a delimiter string you can use scan followed by join as follows:
s = '12345678901234567'
puts s.scan(/.{1,5}/).join(":")
# 12345:67890:12345:67
.{1,5} matches between 1 and 5 of "any" character, but since it's greedy, it will take 5 if it can. The allowance for taking fewer is to accommodate the last match, where there may not be enough characters left over.
Another option is to use gsub, which allows for more flexible substitutions:
puts s.gsub(/.{1,5}/, '<\0>')
# <12345><67890><12345><67>
\0 is a backreference to what group 0 matched, i.e. the whole match. So substituting with <\0> effectively puts whatever the regex matched in literal brackets.
If whitespaces are not to be counted, then instead of ., you want to match \s*\S (i.e. a non whitespace, possibly preceded by whitespaces).
s = '123 4 567 890 1 2 3 456 7 '
puts s.gsub(/(\s*\S){1,5}/, '[\0]')
# [123 4 5][67 890][ 1 2 3 45][6 7]
Here is a solution that is adapted from the answer to a recent question:
class String
  def in_groups_of(n, sep = ' ')
    chars.each_slice(n).map(&:join).join(sep)
  end
end
p 'HelloWorld-Hello guys'.in_groups_of(5,'<wbr>')
# "Hello<wbr>World<wbr>-Hell<wbr>o guy<wbr>s"
The result differs from your example in that the space counts as a character, leaving the final s in a group of its own. Was your example flawed, or do you mean to exclude spaces (whitespace in general?) from the character count?
To only count non-whitespace (“sticking” trailing whitespace to the last non-whitespace, leaving whitespace-only strings alone):
# count "hard coded" into regexp
s.scan(/(?:\s*\S(?:\s+\z)?){1,5}|\s+\z/).join('<wbr>')
# parametric count
s.scan(/\s*\S(?:\s+\z)?|\s+\z/).each_slice(5).map(&:join).join('<wbr>')
I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To not alter the original and only return the result, remove the exclamation point, i.e. use sub. In the general case, where a pattern can match more than once and you want every match replaced, use gsub! or gsub; without the g only the first match is replaced (which is what you want here). Note that in Ruby ^ matches at the start of every line, so use \A instead if the string may contain newlines and only the very start should match.
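A small sketch of the non-destructive variant, using the sample array from above under the assumed name titles. It also shows one thing to watch for: sub! returns nil when nothing was replaced, so its return value should not be chained blindly.

```ruby
titles = ["h3. My Title Goes Here", "No h3. at the start of this line"]

# sub returns a new string and leaves the originals untouched.
cleaned = titles.map { |s| s.sub(/\Ah3\. /, "") }
p cleaned # prints ["My Title Goes Here", "No h3. at the start of this line"]

# sub! modifies the receiver in place and returns nil if nothing matched.
unchanged = "No prefix here".dup
p unchanged.sub!(/\Ah3\. /, "") # prints nil
```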
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match at the start of a line (in Ruby, use \A to match only at the start of the string).
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
(a space) A literal space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide, for example here.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub(/^h3\. /, '') #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and consists of:
^ means the beginning of a line (in Ruby, \A means the beginning of the string)
h3 is matched literally, so it means h3
\. - a dot normally means any character, so we escape it with a backslash
(a space) is matched literally