What is between { }? - ruby

There is a piece of code:
def test_sub_is_like_find_and_replace
assert_equal "one t-three", "one two-three".sub(/(t\w*)/) { $1[0, 1] }
end
I found it really hard to understand what is between { } braces. Could anyone explain it please?

The {...} is a block. Ruby will pass the matched value to the block, and substitute the return value of the block back into the string. The String#sub documentation explains this more fully:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
Edit: Per Michael's comment, if you're confused about $1[0, 1], this is just taking the first capture ($1) and taking a substring of it (the first character, specifically). $1 is a global variable set to the contents of the first capture after a regex (in true Perl fashion), and since it's a string, the #[] operator is used to take a substring of it starting at index 0, with a length of 1.

The sub method either takes two arguments, first being the text to replace replace and the second being the replacement, or one argument being the text to replace and a block defining how to handle the replacement.
The block method is useful if you can't define your replacement as a simple string.
For example:
"foo".sub(/(\w)/) { $1.upcase }
# => "Foo"
"foo".sub(/(\w+)/) { $1.upcase }
# => "FOO"
The gsub method works the same way, but applies more than once:
"foo".gsub(/(\w)/) { $1.upcase }
# => "FOO"
In all cases, $1 refers to the contents captured by the brackets (\w).
Your code, illustrated
r = "one two-three".sub(/(t\w*)/) do
$1 # => "two"
$1[0, 1] # => "t"
end
r # => "one t-three"

sub is taking in a regular expression in it. The $1 is a reserved global variable that contains the match for the regular expression.
The brackets represent a block of code used that will substitute the match with the string returned by the block. In this case
puts $1
#=> "two"
puts $1[0, 1]
#=> "t"

Related

Strange Ruby behavior when replacing with \'X string

I am generating some RTF strings and need the \'bd code. I am having problem with the sub and gsub commands.
puts 'abc'.sub('a',"\\'B") => "bcBbc"
The statement replaces the target with 'B' without the \', and copies the remainder of the string to the front. I've tried a lot of variations and it seems that the problem is the \' itself.
I have ways of getting around this, but I'd like to know whether I'm doing something fundamentally wrong or whether this is a quirk with Ruby.
Thanks
From the Ruby documentation:
Similarly, \&, \', \`, and \+ correspond to special variables, $&, $', $`, and $+, respectively.
And here, the documentation goes on to say:
$~ is equivalent to ::last_match;
$& contains the complete matched text;
$` contains string before match;
$' contains string after match;
$1, $2 and so on contain text matching first, second, etc capture group;
$+ contains last capture group.
Example:
m = /s(\w{2}).*(c)/.match('haystack') #=> #<MatchData "stac" 1:"ta" 2:"c">
$~ #=> #<MatchData "stac" 1:"ta" 2:"c">
Regexp.last_match #=> #<MatchData "stac" 1:"ta" 2:"c">
$& #=> "stac"
# same as m[0]
$` #=> "hay"
# same as m.pre_match
$' #=> "k"
# same as m.post_match
$1 #=> "ta"
# same as m[1]
$2 #=> "c"
# same as m[2]
$3 #=> nil
# no third group in pattern
$+ #=> "c"
# same as m[-1]
So, \' in a substitution replacement string has a special meaning. It means "The portion of the original string after the match" - which, in this case, is "bc".
So rather than getting \'Bbc, you get bcBbc
Therefore unfortunately, in this odd scenario, you'd need to double-escape the backslashes:
puts 'abc'.sub('a',"\\\\'B") => "\'Bbc"

Multiple Ruby chomp! statements

I'm writing a simple method to detect and strip tags from text strings. Given this input string:
{{foobar}}
The function has to return
foobar
I thought I could just chain multiple chomp! methods, like so:
"{{foobar}}".chomp!("{{").chomp!("}}")
but this won't work, because the first chomp! returns a NilClass. I can do it with regular chomp statements, but I'm really looking for a one-line solution.
The String class documentation says that chomp! returns a Str if modifications have been made - therefore, the second chomp! should work. It doesn't, however. I'm at a loss at what's happening here.
For the purposes of this question, you can assume that the input string is always a tag which begins and ends with double curly braces.
You can definitely chain multiple chomp statements (the non-bang version), still having a one-line solution as you wanted:
"{{foobar}}".chomp("{{").chomp("}}")
However, it will not work as expected because both chomp! and chomp removes the separator only from the end of the string, not from the beginning.
You can use sub
"{{foobar}}".sub(/{{(.+)}}/, '\1')
# => "foobar"
"alfa {{foobar}} beta".sub(/{{(.+)}}/, '\1')
# => "alfa foobar beta"
# more restrictive
"{{foobar}}".sub(/^{{(.+)}}$/, '\1')
# => "foobar"
Testing this out, it's clear that chomp! will return nil if the separator it's provided as an argument is not present at the end of the string.
So "{{text}}".chomp!("}}") returns a string, but "{{text}}".chomp!("{{") reurns nil.
See here for an answer of how to chomp at the beginning of a string. But recognize that chomp only looks at the end of the string. So you can call str.reverse.chomp!("{{").reverse to remove the opening brackets.
You could also use a regex:
string = "{{text}}"
puts [/^\{\{(.+)\}\}$/, 1]
# => "text"
Try tr:
'{{foobar}}'.tr('{{', '').tr('}}', '')
You can also use gsub or sub but if the replacement is not needed as pattern, then tr should be faster.
If there are always curly braces, then you can just slice the string:
'{{foobar}}'[2...-2]
If you plan to make a method which returns the string without curly braces then DO NOT use bang versions. Modifying the input parameter of a method will be suprising!
def strip(string)
string.tr!('{{', '').tr!('}}', '')
end
a = '{{foobar}}'
b = strip(a)
puts b #=> foobar
puts a #=> foobar

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now where my output is coming out to me No duplicates 26 times. I think my if statement is wrongly written.
string "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first duplicate character:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
Below method might be useful to find the first word in a string
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by #DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero of more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Ruby Koans - Regex and .sub: Don't understand reason behind answer

For clarification, here's the exact question in the about_regular_expressions.rb file that I'm having trouble with:
def test_sub_is_like_find_and_replace
assert_equal __, "one two-three".sub(/(t\w*)/) { $1[0, 1] }
end
I know what the answer is to this, but I don't understand what's happening to get that answer. I'm pretty new to Ruby and to regex, and in particular I'm confused about the code between the braces and how that's coming into play.
The code inside the braces is a block that sub uses to replace the match:
In the block form [...] The value returned by the block will be substituted for the match on each call.
The block receives the match as an argument but the usual regex variables ($1, $2, ...) are also available.
In this specific case, the $1 inside the block is "two" and the array notation extracts the first character of $1 (which is "t" in this case). So, the block returns "t" and sub replaces the "two" in the original string with just "t".

Ruby regex question wrt the sub method on String

I'm running through the Koans tutorial (which is a great way to learn) and I've encountered this statement:
assert_equal __, "one two-three".sub(/(t\w*)/) { $1[0, 1] }
In this statement the __ is where I'm supposed to put my expected result to make the test execute correctly. I have stared at this for a while and have pulled most of it apart but I cannot figure out what the last bit means:
{ $1[0, 1] }
The expected answer is:
"one t-three"
and I was expecting:
"t-t"
{ $1[0, 1] } is a block containing the expression $1[0,1]. $1[0,1] evaluates to the first character of the string $1, which contains the contents of the first capturing group of the last matched regex.
When sub is invoked with a regex and a block, it will find the first match of the regex, invoke the block, and then replace the matched substring with the result of the block.
So "one two-three".sub(/(t\w*)/) { $1[0, 1] } searches for the pattern t\w*. This finds the substring "two". Since the whole thing is in a capturing group, this substring is stored in $1. Now the block is called and returns "two"[0,1], which is "t". So "two" is replaced by "t" and you get "one t-three".
An important thing to note is that sub, unlike gsub, only replaces the first occurrence, not ever occurrence of the pattern.
#sepp2k already gave a really good answer, I just wanted to add how you could have used IRB to maybe get there yourself:
>> "one two-three".sub(/(t\w*)/) { $1 } #=> "one two-three"
>> "one two-three".sub(/(t\w*)/) { $1[0] } #=> "one t-three"
>> "one two-three".sub(/(t\w*)/) { $1[1] } #=> "one w-three"
>> "one two-three".sub(/(t\w*)/) { $1[2] } #=> "one o-three"
>> "one two-three".sub(/(t\w*)/) { $1[3] } #=> "one -three"
>> "one two-three".sub(/(t\w*)/) { $1[0,3] } #=> "one two-three"
>> "one two-three".sub(/(t\w*)/) { $1[0,2] } #=> "one tw-three"
>> "one two-three".sub(/(t\w*)/) { $1[0,1] } #=> "one t-three"
Cribbing from the documentation (http://ruby-doc.org/core/classes/String.html#M001185), here are answers to your two questions "why is the return value 'one t-three'" and "what does { $1[0, 1] } mean?"
What does { $1[0, 1] } mean?
The method String#sub can take either two arguments, or one argument and a block. The latter is the form being used here and it's just like the method Integer.times, which takes a block:
5.times { puts "hello!" }
So that explains the enclosing curly braces.
$1 is the substring matching the first capture group of the regex, as described here. [0, 1] is the string method "[]" which returns a substring based on the array values - here, the first character.
Put together, { $1[0, 1] } is a block which returns the first character in $1, where $1 is the substring to have been matched by a capture group when a regex was last used to match a string.
Why is the return value 'one t-three'?
The method String#sub ('substitute'), unlike its brother String#gsub ('globally substitute'), replaces the first portion of the string matching the regex with its replacement. Hence the method is going to replace the first substring matching "(t\w*)" with the value of the block described above - i.e. with its first character. Since 'two' is the first substring matching (t\w*) (a 't' followed by any number of letters), it is replaced by its first character, 't'.

Resources