How does 'String#gsub { }' (with a block) work? - ruby

When I do,
> "fooo".gsub("o") {puts "Found an 'o'"}
Found an 'o'
Found an 'o'
Found an 'o'
=> "f"
gsub removes all 'o's. How does this work?
I think gsub passes each character to the block, but since the block does nothing with the character itself (like catching it), the character is dropped.
I think this is the case because, when I do
> "fooo".gsub("o"){|ch| ch.upcase}
=> "fOOO"
the block is catching the character and turning it into uppercase.
But when I do,
> "fooo".gsub("o", "u"){|ch| ch.upcase}
=> "fuuu"
How does Ruby handle the block in this case?
I found that Ruby plugs blocks into methods using yield. But I am still not sure about my explanation for the first and third code examples. Can anyone shed some more light on this?

The documentation of String#gsub explains how it works, depending on which arguments it receives:
gsub(pattern, replacement) → new_str
gsub(pattern, hash) → new_str
gsub(pattern) {|match| block } → new_str
gsub(pattern) → enumerator
Returns a copy of str with all occurrences of pattern substituted for the second argument. The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. \\d will match a backslash followed by d, instead of a digit.
If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \\d, where d is a group number, or \\k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
The result inherits any tainting in the original string or any supplied replacement string.
When neither a block nor a second argument is supplied, an Enumerator is returned.
The answer to your question looks straightforward now.
When only one argument is passed (the pattern), "the value returned by the block will be substituted for the match on each call".
Two arguments and a block is a case not covered by the documentation because it is not a valid combination. It seems that when two arguments are passed, String#gsub doesn't expect a block and ignores it.
Update
The purpose of String#gsub is to do a "global substitution", i.e. find all occurrences of some string or pattern and replace them.
The first argument, pattern, is the string or pattern to search for. There is nothing special about it. It can be a string or a regular expression. String#gsub searches it and finds zero or more matches (occurrences).
With only one argument and no block, String#gsub returns an Enumerator because it can find the pattern but it doesn't have a replacement to use.
There are three ways to provide it the replacements for the matches (the first three cases described in the documentation quoted above):
a String is used to replace all the matches; it is usually used to remove parts from a string (by providing the empty string as replacement) or mask fragments of it (credit card numbers, passwords, email addresses etc);
a Hash is used to provide different replacements for each match; it is useful when the matches are known in advance;
a block is provided when the replacements depend on the matched substrings but the matches are not known in advance; for example, a block can convert each matching substring to uppercase and return it to let String#gsub use it as replacement.
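As a rough illustration of the call forms described above (a sketch only; the sample string and variable names are made up for this example):

```ruby
word = "fooo"

# String replacement: every match gets the same text.
with_string = word.gsub("o", "0")

# Hash replacement: each match is looked up as a key.
with_hash = word.gsub(/[fo]/, { "f" => "F", "o" => "0" })

# Block: the block's return value replaces each match.
with_block = word.gsub("o") { |match| match.upcase }

# Pattern only: an Enumerator that yields each match.
matches = word.gsub("o")

# Replacement string plus block: the replacement wins and the block is ignored.
both = word.gsub("o", "u") { |match| match.upcase }

p with_string, with_hash, with_block, matches.to_a, both
```

The last call reproduces the third example from the question: with two arguments the block never runs.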

The return value of puts is nil, which to_s converts to an empty string. Hence, each matched "o" is replaced with an empty string.
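A small sketch of that point, reusing the string from the question (the variable names are illustrative): the block can still print, as long as its last expression is the text you want substituted.

```ruby
# puts returns nil, and nil.to_s is "", so every match is replaced by "".
dropped = "fooo".gsub("o") { |ch| puts "Found an 'o'" }

# Returning the match as the block's last expression keeps it in the result.
kept = "fooo".gsub("o") do |ch|
  puts "Found an 'o'"
  ch
end

p nil.to_s, dropped, kept
```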

Related

Ruby method gsub with string '+'

I've found an interesting thing in Ruby. Does anybody know why it behaves this way?
I tried '+'.gsub!('+', '\+') and expected "\\+" but got "" (empty string).
gsub is implemented, after some indirection, as rb_sub_str_bang in C, which calls rb_reg_regsub.
Now, gsub is supposed to allow the replacement string to contain backreferences. That is, if you pass a regular expression as the first argument and that regex defines a capture group, then your replacement string can include \1 to indicate that that capture group should be placed at that position.
That behavior evidently still happens if you pass an ordinary, non-regex string as the pattern. Your verbatim string obviously won't have any capture groups, so it's a bit silly in this case. But trying to replace, for instance, + with \1 in the string + will give the empty string, since \1 says to go get the first capture group, which doesn't exist and hence is vacuously "".
Now, you might be thinking: + isn't a number. And you'd be right. You're replacing + with \+. There are several other backreferences allowed in your replacement string. I couldn't find any official documentation where these are written down, but the source code is clear enough. To summarize the code:
Digits \1 through \9 refer to numbered capture groups.
\k<...> refers to a named capture group, with the name in angled brackets.
\0 or \& refer to the whole substring that was matched, so (\0) as a replacement string would enclose the match in parentheses.
\` (a backslash followed by a backtick) refers to the entire string up to the match.
\' refers to the entire string following the match.
\+ refers to the final capture group, i.e. the one with the highest number.
\\ is a literal backslash.
(Most of these are based on Perl variables of a similar name)
So, in your examples,
\+ as the replacement string says "take the last capture group". There is no capture group, so you get the empty string.
\- is not a valid backreference, so it's replaced verbatim.
\ok is, likewise, not a backreference, so it's replaced verbatim.
In \\+, Ruby eats the first backslash sequence, so the actual string at runtime is \+, equivalent to the first example.
For \\\+, Ruby processes the first backslash sequence, so we get \\+ by the time the replacement function sees it. \\ is a literal backslash, and + is no longer part of an escape sequence, so we get \+.
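The escapes listed above can be tried out with a short sketch like the following. The sample strings and variable names are made up for illustration, and the block form at the end of the first group is an alternative not mentioned in the answer: a block's return value is inserted as-is, without backreference processing.

```ruby
# In single-quoted literals only \\ and \' are escapes, so '\+' is two characters.
no_group   = '+'.gsub('+', '\+')     # \+ = last capture group; there is none
same_thing = '+'.gsub('+', '\\+')    # the literal collapses to \+ again
literal    = '+'.gsub('+', '\\\+')   # gsub sees \\+ : a literal backslash, then +
via_block  = '+'.gsub('+') { '\+' }  # block results are not scanned for backreferences

whole_match = "abc".sub("b", '<\0>')  # \0 = the matched text
pre_match   = "abc".sub("b", '\`')    # \` = text before the match
post_match  = "abc".sub("b", "\\'")   # \' = text after the match
last_group  = "2024-05".sub(/(\d+)-(\d+)/, '\+')
named       = "John Smith".sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')

puts no_group.inspect, literal, via_block
puts whole_match, pre_match, post_match, last_group, named
```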

Find exact word in string and not partial

I have the following string
str = "feminino blue"
I need to know if there is a string called "mini" inside this string.
When I use the include? method, the return value is true because "feMINIno" contains "mini".
Is there a way to search for the exact word that is passed as param?
Thanks
Sounds like a use case for regular expressions, which can match all kinds of more complex string patterns. You can read through that page for all the specifics (and it's very valuable to learn, not just as a Ruby concept; Regexes are used in almost every modern language), but this should cover your use case.
/\bmini\b/ =~ str
\b means "match a word boundary", so exactly one of the things to the left or right should be a word character and the other side should not (i.e. it should be a non-word character such as whitespace or punctuation, or the beginning/end of the string).
This will return nil if there's no match or the index of the match if there is one. Since nil is falsy and all numbers are truthy, this return value is safe to use in an if statement if all you need is a yes/no answer.
If the string you're working with is not constant and is instead in a variable called, say, my_word, you can interpolate it.
/\b#{Regexp.quote(my_word)}\b/ =~ str
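Putting the pieces together, here is a minimal sketch using the string from the question. The helper name contains_word? is hypothetical, not a built-in method.

```ruby
str = "feminino blue"

substring_hit = str.include?("mini")  # true: "mini" occurs inside "feminino"
word_hit      = /\bmini\b/ =~ str     # nil: "mini" is not a whole word here
blue_index    = /\bblue\b/ =~ str     # index where the whole word "blue" starts

# Hypothetical helper for a word held in a variable.
def contains_word?(text, word)
  /\b#{Regexp.quote(word)}\b/.match?(text)
end

p substring_hit, word_hit, blue_index
p contains_word?(str, "blue"), contains_word?(str, "mini")
```

match? returns a plain true/false, which reads a little more directly than =~ when only a yes/no answer is needed.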

ruby sub when replacement string starts with \0

In Ruby, sub does not let me replace a string with another one starting with '\0'.
'a'.sub('a','\\0b')
Returns:
'ab'
The doc says that \0 is interpreted as a backreference, but as the first parameter is not a Regexp, I don't understand why it works like that.
If you want your second argument to be interpreted as a plain String you can escape it like:
'a'.sub('a', Regexp.escape('\0b'))
or
'a'.sub('a', '\\\0b')
both return:
"\\0b"
An explanation of this behaviour can be found in the documentation:
sub(pattern, replacement) → new_str
The pattern is typically a Regexp; if given as a String, any regular
expression metacharacters it contains will be interpreted literally,
e.g. '\d' will match a backslash followed by 'd', instead of a digit.
If replacement is a String it will be substituted for the matched
text. It may contain back-references to the pattern's capture groups
of the form "\d", where d is a group number, or "\k<n>", where n is a
group name. If it is a double-quoted string, both back-references must
be preceded by an additional backslash. However, within replacement
the special match variables, such as $&, will not refer to the current
match. If replacement is a String that looks like a pattern's capture
group but is actually not a pattern capture group e.g. "\'", then it
will have to be preceded by two backslashes like so "\\'".
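A short sketch comparing the variants above. The block form on the last line is an extra option not mentioned in the answer: a block's return value is inserted without any backreference processing, so no extra escaping is needed.

```ruby
backref  = 'a'.sub('a', '\\0b')                # replacement is \0b: the match, then "b"
escaped  = 'a'.sub('a', '\\\0b')               # replacement is \\0b: literal backslash, then "0b"
quoted   = 'a'.sub('a', Regexp.escape('\0b'))  # Regexp.escape doubles the backslash for us
by_block = 'a'.sub('a') { '\0b' }              # block result is used as-is

p backref, escaped, quoted, by_block
```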

What is the difference between these three alternative ways to write Ruby regular expressions?

I want to match the path "/". I've tried the following alternatives, and the first two do match, but I don't know why the third doesn't:
/\A\/\z/.match("/") # <MatchData "/">
"/\A\/\z/".match("/") # <MatchData "/">
Regexp.new("/\A\/\z/").match("/") # nil
What's going on here? Why are they different?
The first snippet is the only correct one.
The second example is... misleading. That string literal "/\A\/\z/" is, obviously, not a regex. It's a string. Strings have a #match method, which converts its argument to a regexp (if it is not one already) and matches the receiver against it. So, in this example, it's '/' that is the regular expression, and it matches a forward slash found in the other string.
The third line is completely broken: you don't need the surrounding slashes there; they are part of the regex literal syntax, which you didn't use. Also, use single-quoted strings, not double-quoted ones (which try to interpret escape sequences like \A).
Regexp.new('\A/\z').match("/") # => #<MatchData "/">
And, of course, none of the above is needed if you just want to check if a string consists of only one forward slash. Just use the equality check in this case.
s == '/'
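To see the difference side by side, here is a small sketch (the variable names are illustrative). The %r{} form is an alternative literal syntax not mentioned above; it avoids having to escape the slash.

```ruby
literal     = /\A\/\z/
from_string = Regexp.new('\A/\z')
percent_r   = %r{\A/\z}

# Double quotes turn \A, \/ and \z into plain "A", "/" and "z",
# so this regexp looks for the literal text "/A/z/".
broken = Regexp.new("/\A\/\z/")

p literal.match?("/"), from_string.match?("/"), percent_r.match?("/")
p broken.match("/")          # nil
p broken.match?("/A/z/")     # true

# String#match: here "/" is the pattern and the long string is the subject.
p "/\A\/\z/".match("/")
```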

What does <\1> mean in String#sub?

I was reading this and I did not understand it. I have two questions.
What is the difference between ([aeiou]) and [aeiou]?
What does <\1> mean?
"hello".sub(/([aeiou])/, '<\1>') #=> "h<e>llo"
Here is how it is documented:
If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form "\d", where d is a group number, or "\k<n>", where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.
Character Classes
A character class is delimited with square brackets ([, ]) and lists characters that may appear at that point in the match. /[ab]/ means a or b, as opposed to /ab/ which means a followed by b.
I hope the definition above makes clear what [aeiou] is.
Capturing
Parentheses can be used for capturing. The text enclosed by the nth group of parentheses can be subsequently referred to with n. Within a pattern use the backreference \n; outside of the pattern use MatchData[n].
I hope the definition above makes clear what ([aeiou]) is.
([aeiou]) captures the first character in "hello" that belongs to the character class [aeiou]; that character becomes the value of \1 (i.e. the first capture group). In this example the value of \1 is e, and the match is replaced by <e> (as you defined with <\1>). That's how "h<e>llo" is produced from the string "hello" by String#sub.
The doc you posted says
It may contain back-references to the pattern’s capture groups of the
form "\d", where d is a group number, or "\k<n>", where n is a group
name.
So \1 refers to whatever was captured in the first () group, i.e. one of [aeiou], and the replacement <\1> wraps that captured text in angle brackets.
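A quick sketch of the difference between the two patterns from the question (the variable names are illustrative):

```ruby
with_group    = "hello".sub(/([aeiou])/, '<\1>')   # \1 = text captured by ([aeiou])
without_group = "hello".sub(/[aeiou]/, '<\1>')     # no group 1, so \1 expands to nothing
whole_match   = "hello".sub(/[aeiou]/, '<\0>')     # \0 = the whole match, no group needed
every_vowel   = "hello".gsub(/([aeiou])/, '<\1>')  # gsub applies it to every match
double_quoted = "hello".sub(/([aeiou])/, "<\\1>")  # double quotes need the extra backslash
first_vowel   = "hello".match(/([aeiou])/)[1]      # same capture, read via MatchData

puts with_group, without_group, whole_match, every_vowel, double_quoted, first_vowel
```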

Resources