Matching S01E01 in Ruby
with
/S\d+?E\d+?/ix
which works, however, S01 E01 does not. I thought the /x should ignore white spaces?
/x ignores whitespace inside your regex, not in the text you're matching.
You're looking for
/S\d+?\s*E\d+?/i
The x option ignores whitespace in the regex itself, which allows you to better format your regexes for reading without modifying their meaning. You could write:
irb(main):008:0> r = /
irb(main):009:0/ S
irb(main):010:0/ d+?
irb(main):011:0/ E
irb(main):012:0/ d+?
irb(main):013:0/ /ix
To get an regex with the same meaning as your example.
Related
A string must begin with 3 or 4 letters (not numbers), and a ":" symbol should follow these letters, and after the colon there should be three more characters, like AAA. For example, AAAA:AAA or AAA:AAA.
I`m starting to build this, but regex is so much pain for me, can anyone help me with this?
Here is what I have now:
^[a-zA-Z]{3,4}(:)$
Your regex is almost there: you need to add [a-zA-Z]{3}.
I prefer the [[:alpha:]] POSIX class in Ruby to match letters though.
/[[:alpha:]]/ - Alphabetic character
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters.
So, here is a possible regex:
\A[[:alpha:]]{3,4}:[[:alpha:]]{3}\z
See demo
The regex matches:
\A - start of string (in RoR, you have to use \A instead of ^, or you will get errors)
[[:alpha:]]{3,4} - 3 or 4 letters
: - literal :
[[:alpha:]]{3} - 3 letters
\z - end of string (in RoR, you have to use \z instead of $, or you will get errors)
To allow just AAA or AAAA, you need to introduce an optional (? quantifier) non-capturing group ((?:...) construction):
\A[[:alpha:]]{3,4}(?::[[:alpha:]]{3})?\z
^^^ ^^
See another demo
Try using this (quotes if regex in your dialect must be passed as a string)
"^[a-zA-Z]{3,4}:[a-zA-Z]{3}$"
I often see the gsub function being called with the pattern parameter enclosed in forward slashes. For example:
>> phrase = "*** and *** ran to the ###."
>> phrase.gsub(/\*\*\*/, "WOOF")
=> "WOOF and WOOF ran to the ###."
I thought maybe it had something to do with escaping asterisks, but using single quotes and double quotes works just as well:
>> phrase = "*** and *** ran to the ###."
>> phrase.gsub('***', "WOOF")
=> "WOOF and WOOF ran to the ###."
>> phrase.gsub("***", "WOOF")
=> "WOOF and WOOF ran to the ###."
Is it just convention to use forward slash? What am I missing?
Use forward slashes if you need to use regular expressions.
If you use a string argument with gsub, it will just do a plain character match.
In your example, you need backslashes to escape the asterisks when using a regular expression, because asterisks have a special meaning in regex (optionally match something any number of times). They are not necessary when using a string, because they are just matched exactly.
In your example, you probably don't need to use a regular expression, since it is a simple pattern. However, if you wanted to match *** only when it was at the beginning of a string (e.g. the first bunch in your example), then you would want to use a regex, for example:
phrase.gsub(/^\*{3}/, "WOOF")
For more information on regular expressions, see: http://www.regular-expressions.info/.
For more information on using regular expressions in Ruby, see: http://ruby-doc.org/core-2.2.0/Regexp.html.
To play with regular expressions as they work in Ruby, try: http://rubular.com/.
You are missing reading the documentation:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ādā, instead of a digit.
http://ruby-doc.org/core-2.1.4/String.html#method-i-gsub
In other words, you can give a string or a regular expression. Regular expressions can be delimited several ways:
Regexps are created using the /.../ and %r{...} literals, and by the Regexp::new constructor.
http://ruby-doc.org/core-2.2.2/Regexp.html
The benefit of %r and of the alternate %r delimiters is you can usually find a delimiter that doesn't collide with characters in the pattern, which would force escaping them, as in your example.
* has to be escaped because it has special meaning in a regex, but in a string it does not.
I want to create a regular expression such as:
/\\\s*\\\s*$/
I am trying it in this way:
Regexp.new('\\\s*\\\s*$') # => /\\s*\\s*$/
What am I doing wrong?
Well (\\) matches a single backslash. Backslash serves as an escape character for Regexp.
rgx = Regexp.new('\\\\\\s*\\\\\\s*$')
A more verbose way of doing this would be the following as #Cary Swoveland stated.
rgx = Regexp.new('\\{3}s*\\{3}s*$')
Using literal notation avoids some confusion. This compiles to what you said you want:
/\\\s*\\\s*$/
Though, to be clear, this still matches a single backslash, optional whitespace a single backslash and more optional whitespace. Backslashes are escaped when you inspect a regexp.
I'm working on some text processing in Ruby 1.8.7 to support some custom shortcodes that I've created. Here are some examples of my shortcode:
[CODE first-part]
[CODE first-part second-part]
I'm using the following RegEx to grab the
text.gsub!( /\[CODE (\S+)\s?(\S?)\]/i, replacementText )
The problem is this: the regex doesn't work on the following text:
[CODE first-part][CODE first-part-again]
The results are as follows:
1. first-part][CODE
2. first-part-again
It seems that the \s? is the problematic part of the regex that is searching on until it hits the last space, not the first one. When I change the regex to the following:
\[CODE ([\w-]+)\s?(\S*)\]/i
It works fine. The only concern I have is what all \w vs \s as I want to make sure the \w will match URL-safe characters.
I'm sure there's a perfectly valid explanation, but it's eluding me. Any ideas? Thanks!
Actually, thinking about it, just using [^\]] might not be enough, as it will swallow up all spaces as well. You also need to exclude those:
/\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i
Note the [ ] - I just think it makes literal spaces more readable.
Working demo.
Explained in free-spacing mode:
\[CODE[ ] # match your identifier
( # capturing group 1
[^\]\s]+ # match one or more non-], non-whitespace characters
) # end of group 1
\s? # match an optional whitespace character
( # capturing group 2
[^\]\s]+ # match zero or more non-], non-whitespace characters
) # end of group 2
\] # match the closing ]
As none of the character classes in the pattern includes ], you can never possibly go beyond the end of the square bracketed expression.
By the way, if you find unnecessary escapes in regex as obscuring as I do, here is the minimal version:
/\[CODE[ ]([^]\s]+)\s?([^]\s]*)]/i
But that is definitely a matter of taste.
The problem was with the greedy \S+ in this
/\[CODE (\S+)\s?(\S?)\]/i
You could try:
/\[CODE (\S+?)\s?(\S?)\]/i
but actually your new character class is IMO superiror.
Even better might be:
/\[CODE ([^\]]+?)\s?([^\]]*)\]/i
I'm trying to learn RegEx in Ruby, based on what I'm reading in "The Rails Way". But, even this simple example has me stumped. I can't tell if it is a typo or not:
text.gsub(/\s/, "-").gsub([^\W-], '').downcase
It seems to me that this would replace all spaces with -, then anywhere a string starts with a non letter or number followed by a dash, replace that with ''. But, using irb, it fails first on ^:
syntax error, unexpected '^', expecting ']'
If I take out the ^, it fails again on the W.
>> text = "I love spaces"
=> "I love spaces"
>> text.gsub(/\s/, "-").gsub(/[^\W-]/, '').downcase
=> "--"
Missing //
Although this makes a little more sense :-)
>> text.gsub(/\s/, "-").gsub(/([^\W-])/, '\1').downcase
=> "i-love-spaces"
And this is probably what is meant
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
\W means "not a word"
\w means "a word"
The // generate a regexp object
/[^\W-]/.class
=> Regexp
Step 1: Add this to your bookmarks. Whenever I need to look up regexes, it's my first stop
Step 2: Let's walk through your code
text.gsub(/\s/, "-")
You're calling the gsub function, and giving it 2 parameters.
The first parameter is /\s/, which is ruby for "create a new regexp containing \s (the // are like special "" for regexes).
The second parameter is the string "-".
This will therefore replace all whitespace characters with hyphens. So far, so good.
.gsub([^\W-], '').downcase
Next you call gsub again, passing it 2 parameters.
The first parameter is [^\W-]. Because we didn't quote it in forward-slashes, ruby will literally try run that code. [] creates an array, then it tries to put ^\W- into the array, which is not valid code, so it breaks.
Changing it to /[^\W-]/ gives us a valid regex.
Looking at the regex, the [] says 'match any character in this group. The group contains \W (which means non-word character) and -, so the regex should match any non-word character, or any hyphen.
As the second thing you pass to gsub is an empty string, it should end up replacing all the non-word characters and hyphens with empty string (thereby stripping them out )
.downcase
Which just converts the string to lower case.
Hope this helps :-)
You forgot the slashes. It should be /[^\W-]/
Well, .gsub(/[^\W-]/,'') says replace anything that's a not word nor a - for nothing.
You probably want
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
Lower case \w (\W is just the opposite)
The slashes are to say that the thing between them is a regular expression, much like quotes say the thing between them is a string.