I am learning Ruby and I have something to match with (/^1\/1. Guess a word from an anagram [RUBY]{4}$/)
What does "1\/1." mean in this expression? Can anyone explain what's going on?
Thanks
Generally speaking, a backslash in a regular expression escapes the next character, so that it's treated as an ordinary character rather than whatever its special meaning would be. For instance a* matches zero or more of the letter a, but a\* matches, literally, an a followed by a star. Since most regular expressions in Ruby are wrapped in the delimiter /, we can't directly put forward slashes in our regex. If we had written
/^1/1. Guess a word from an anagram [RUBY]{4}$/
then the regex would be /^1/ and the rest of the line would be a very confusing syntax error. This is for the same reason that we can't put " characters directly inside a "-delimited string.
So the backslash makes Ruby treat the slash as an actual slash in the expression rather than as a delimiter.
/^1\/1. Guess a word from an anagram [RUBY]{4}$/
This literally matches a 1 followed by a slash followed by a 1 at the start of the line. (The unescaped . that comes next matches any single character, not only a literal dot.)
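As a quick check of the explanation above, here is a small sketch. The constant names and sample lines are made up for illustration; the %r{...} literal is a standard Ruby alternative in which / does not need escaping.

```ruby
# Escaped slash inside the usual /.../ delimiters.
SLASH_ESCAPED = /^1\/1. Guess a word from an anagram [RUBY]{4}$/

# The same pattern written with %r{...}, where / needs no backslash.
PERCENT_R = %r{^1/1. Guess a word from an anagram [RUBY]{4}$}

line = "1/1. Guess a word from an anagram YBUR"
puts SLASH_ESCAPED.match?(line)  # true
puts PERCENT_R.match?(line)      # true

# The unescaped . matches any single character, not only a literal dot.
puts SLASH_ESCAPED.match?("1/1x Guess a word from an anagram RUBY")  # true

# [RUBY]{4} needs exactly four characters drawn from R, U, B, Y.
puts SLASH_ESCAPED.match?("1/1. Guess a word from an anagram RUB")   # false
```

If the dot is meant to be a literal period, it would need its own backslash (\.).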
I've found an interesting thing in Ruby. Does anybody know why it behaves this way?
I tried '+'.gsub!('+', '\+') and expected "\\+", but got "" (an empty string).
gsub is implemented, after some indirection, as rb_sub_str_bang in C, which calls rb_reg_regsub.
Now, gsub is supposed to allow the replacement string to contain backreferences. That is, if you pass a regular expression as the first argument and that regex defines a capture group, then your replacement string can include \1 to indicate that that capture group should be placed at that position.
That behavior evidently still happens if you pass an ordinary, non-regex string as the pattern. Your verbatim string obviously won't have any capture groups, so it's a bit silly in this case. But trying to replace, for instance, + with \1 in the string + will give the empty string, since \1 says to go get the first capture group, which doesn't exist and hence is vacuously "".
Now, you might be thinking: + isn't a number. And you'd be right. You're replacing + with \+. There are several other backreferences allowed in your replacement string. I couldn't find any official documentation where these are written down, but the source code serves quite well. To summarize the code:
Digits \1 through \9 refer to numbered capture groups.
\k<...> refers to a named capture group, with the name in angled brackets.
\0 or \& refer to the whole substring that was matched, so (\0) as a replacement string would enclose the match in parentheses.
\` (a backslash followed by a backtick) refers to the entire string up to the match.
\' refers to the entire string following the match.
\+ refers to the final capture group, i.e. the one with the highest number.
\\ is a literal backslash.
(Most of these are based on Perl variables of a similar name)
So, in your examples,
\+ as the replacement string says "take the last capture group". There is no capture group, so you get the empty string.
\- is not a valid backreference, so it's replaced verbatim.
\ok is, likewise, not a backreference, so it's replaced verbatim.
In \\+, Ruby eats the first backslash sequence, so the actual string at runtime is \+, equivalent to the first example.
For \\\+, Ruby processes the first backslash sequence, so we get \\+ by the time the replacement function sees it. \\ is a literal backslash, and + is no longer part of an escape sequence, so we get \+.
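The list above can be tried out with a few one-liners. This is only a sketch: the method names and sample strings are invented for illustration, and the replacement strings are single-quoted so that the backslashes reach gsub unchanged.

```ruby
# Numbered groups: \1 and \2 are filled in from the match.
def swap_names(name)
  name.gsub(/(\w+) (\w+)/, '\2 \1')
end

puts swap_names("John Smith")                              # Smith John
puts "hello".gsub(/l+/, '(\0)')                            # he(ll)o
puts "2024-01".sub(/(?<y>\d+)-(?<m>\d+)/, '\k<m>/\k<y>')   # 01/2024
puts "ab".sub(/(a)(b)/, '[\+]')                            # [b]

# No capture groups at all, so \+ expands to nothing.
puts '+'.gsub('+', '\+').inspect                           # ""

# Three backslashes in the literal: gsub sees \\+ and emits \+.
puts '+'.gsub('+', '\\\+')                                 # \+

# Block form: the block's return value is inserted as-is,
# without any backreference processing.
def escape_plus(text)
  text.gsub('+') { '\+' }
end

puts escape_plus('1+1')                                    # 1\+1
```

The block form is often the least surprising choice when the replacement text itself contains backslashes.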
Can someone help me with a Ruby regex that finds any word starting with t and ending with r, and replaces it with the word Twitter? Thank you.
I find that Rubular is very useful for working out how regexes work in Ruby.
You have two questions here. First, what regex will recognise what you want. Second, how to replace that found string with something else.
Your regex will be something like /\bt\w*r\b/. The elements here are \b, which is a word boundary; then the letter t; then any number of word characters, \w*; then the letter r; and finally another word boundary, \b. (Without the word boundaries, your regex will find t...r inside other words too, so it will match parts of words like 'stress', 'stirs', etc.)
To do the replacement you want the gsub method.
new_string = your_string.gsub(/\bt\w*r\b/i, 'Twitter')
This will substitute the string Twitter for the found regex. The i on the end of the regex makes it case-insensitive - omit this if you want it to only find the lower-case text as in the regex.
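Here is a small sketch of that substitution in action. The method name twitterize and the sample sentence are made up for illustration.

```ruby
# Replace every whole word that starts with t and ends with r (any case).
def twitterize(text)
  text.gsub(/\bt\w*r\b/i, 'Twitter')
end

puts twitterize("my tumblr and Tiger stress test")
# my Twitter and Twitter stress test

# Without the word boundaries, the "tr" inside "stress" is replaced too.
puts "stress".gsub(/t\w*r/, 'Twitter')
# sTwitteress
```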
I'm looking for words starting with a hashtag: "#yolo"
My regex for this was very simple: /#\w+/
This worked fine until I hit words that ended with a question mark: "#yolo?".
I updated my regex to allow for words and any non whitespace character as well: /#[\w\S]*/.
The problem is I sometimes need to pull a match from a word starting with two '#' characters, up until whitespace, that may contain a special character in it or at the end of the word (which I need to capture).
Example:
"##yolo?"
And I would like to end up with:
"#yolo?"
Note: the regular expressions are for Ruby.
P.S. I'm testing these out here: http://rubular.com/
Maybe this would work
#(#?[\S]+)
What about
#[^#\s]+
Everything \w matches is also matched by \S (the negation of \s), so you don't need both. Also, I assume you don't want any more #s in the match, so we use [^#\s], which excludes both whitespace and # characters.
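To compare the two suggestions, here is a short sketch. The method name extract_hashtags and the sample text are invented for illustration.

```ruby
# Each match is one # followed by a run of characters that are
# neither # nor whitespace.
def extract_hashtags(text)
  text.scan(/#[^#\s]+/)
end

p extract_hashtags("I say #yolo? and ##swag!")  # ["#yolo?", "#swag!"]

# The other suggestion matches the whole "##yolo?" and keeps the
# wanted part in capture group 1.
p "##yolo?"[/#(#?[\S]+)/, 1]                    # "#yolo?"
```

With the first pattern the match itself is the result; with the second you have to read the capture group.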
For example, consider the following expressions:
no_space = "This is a test".match(/(\w+)(\w+)/)
with_space = "This is a test".match(/(\w+) (\w+)/)
The expression no_space is now the matchdata object #<MatchData "This" 1:"Thi" 2:"s">, while with_space is #<MatchData "This is" 1:"This" 2:"is">. What is going on here? It seems to me like the literal space between tokens indicates to ruby that it should match multiple words if possible, while not having a space causes the match to be limited to one word. Any explanation or clarification on the subject would be appreciated.
Thanks.
\w doesn't match a space, and + is greedy unless you follow it with ?, so Ruby tries to match as many \w as possible, as long as the rest of the expression also matches. The first group therefore consumes Thi and leaves s for the second.
When you add a space, Ruby matches as many \w as it can up to a space character, then the space, and then as many \w as it can again, therefore matching This and is.
Please let me know if this isn't clear.
With the regular expression /(\w+)(\w+)/, the only characters that can be matched are word characters (letters, digits, and underscores). A regular expression will only ever match consecutive characters in a string, so unless you include something in the regular expression to match the spaces between words the regex can't match more than a single word.
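The greedy behaviour described above can be seen by comparing captures. This is a small sketch; the constant names are made up, and the lazy variant with +? is added only for contrast.

```ruby
GREEDY     = /(\w+)(\w+)/    # first group takes as much as it can
WITH_SPACE = /(\w+) (\w+)/   # the literal space lets the match span two words
LAZY       = /(\w+?)(\w+)/   # +? makes the first group take as little as it can

text = "This is a test"

p text.match(GREEDY).captures      # ["Thi", "s"]
p text.match(WITH_SPACE).captures  # ["This", "is"]
p text.match(LAZY).captures        # ["T", "his"]
```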
why this snippet:
'He said "Hello"' =~ /(\w)\1/
matches "ll"? I thought that the \w part matches "H", and hence \1 refers to "H", thus nothing should be matched? but why this result?
I thought that the \w part matches "H"
\w matches any alphanumeric character (and underscore). It also happens to match H, but that's not terribly interesting, since the regular expression then goes on to say that the same character has to appear again immediately afterwards. H can't satisfy that in your text (it doesn't appear twice consecutively), and neither can any of the other characters, only l. So the regular expression matches ll.
You're thinking of /^(\w)\1/. The caret symbol specifies that the match must start at the beginning of the line. Without that, the match can start anywhere in the string (it will find the first match).
And you're right, nothing was matched at that position. The regex engine then moved on through the string and found a match further along, which it returned to you.
\w of course matches any word character, not just 'H'.
The point is that "\1" means "the same text that the (\w) group just captured". Only the letter "l" is doubled in your string, so that is what matches your regex.
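A short sketch of the behaviour these answers describe. The constant names are invented for illustration.

```ruby
DOUBLED  = /(\w)\1/    # any word character followed by itself
ANCHORED = /^(\w)\1/   # the same, but only at the start of a line

text = 'He said "Hello"'

puts text =~ DOUBLED      # 11 (index where "ll" starts)
puts text[DOUBLED]        # ll
p text =~ ANCHORED        # nil, since "He" is not a doubled character
puts "llama" =~ ANCHORED  # 0
```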
A nice page for toying around with ruby and regular expressions is Rubular