String destructive methods don't seem to work on string slice - ruby

I have a string and I want to capitalize the first letter. I tried the following:
x='abc'
x[0].upcase! # => "A"
x # => "abc"
It doesn't work as intended, even though the method upcase! is destructive. The following works:
x='abc'
x[0] = x[0].upcase # => "A"
x # => "Abc"
Can someone explain why upcase! doesn't work above?

x[0] is returning a new string. You are modifying this new string in place and then you do nothing with it.
Note:
You can use the capitalize method to do the same:
x='abc'
x.capitalize!
x #=> 'Abc'

upcase! is destructive, and it modifies the string x[0], but not x. You haven't done anything to x with upcase!.

It is a different string object. You can find out if you use object_id on your objects and compare them.
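A minimal sketch of that check, assuming a mutable string. The helper name capitalize_first! is hypothetical and not part of Ruby's String API:

```ruby
# Hypothetical helper: writes the upcased first character back into str.
def capitalize_first!(str)
  return str if str.empty?
  str[0] = str[0].upcase   # element assignment modifies str itself
  str
end

x = +'abc'                 # unary plus gives a mutable string
first = x[0]
second = x[0]
puts first.equal?(x)       # false: the slice is not x
puts first.equal?(second)  # false: each x[0] call builds a new String
first.upcase!              # changes only the temporary slice
puts x                     # abc
puts capitalize_first!(x)  # Abc
```

The slice and the original never share an object, so a destructive method on the slice has nothing to write back to.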

Related

How to sort a string x by the order in which they appear in string y?

I'm trying to create a method sort_by_letter that takes two string arguments and sorts the first by each letter in the order they appear in the second string.
x = "cat"
y = "kndttayc"
sort_by_letter(x, y)
#=> "tac"
Try this:
x.each_char.sort_by { |str| y.index str }.join
Subush's answer is straightforward, and works fine, but if the strings become long, and you want to care about efficiency, you might also want to do this:
h = y.each_char.with_index.to_h
#=> {"k"=>0, "n"=>1, "d"=>2, "t"=>4, "a"=>5, "y"=>6, "c"=>7}
x.each_char.sort_by{|c| h[c]}.join
#=> "tac"
Note: See Simple Lime's comment on the question. Subush's answer and my answer are each correct under a different interpretation of that point.

Why does `gsub!` return `nil`?

I am using a hash map to advance the character by position: "a" into "b", etc., and to capitalize vowels.
def LetterChanges(str)
str.to_s
puts str
h = ("a".."z").to_a
i = ("b".."z").to_a.push("a")
hash = Hash[h.zip i]
new_str = str.downcase.gsub(/[a-z]/,hash)
new_str.gsub!(/[aeiou]/) {|n| n.upcase }
end
LetterChanges("hello world")
LetterChanges("sentence")
LetterChanges("replace!*")
LetterChanges("coderbyte")
LetterChanges("beautiful^")
LetterChanges("oxford")
LetterChanges("123456789ae")
LetterChanges("this long cake#&")
LetterChanges("a b c dee")
LetterChanges("a confusing /:sentence:/[ this is not!!!!!!!~")
The above code works as expected except for the examples "replace!*" and "123456789ae", for which it returns nil. Why is this?
String#gsub! modifies the original string, and returns that string or nil if no replacements were performed.
String#gsub does not modify the original string, and always returns the result even if nothing was changed.
Either return new_str at the end of your method, or use gsub instead of gsub!.
This is somewhat of a pattern in Ruby: when multiple versions of a method exist, the one with ! modifies the receiver and the one without simply returns the result.
As an aside, it looks like you're not using the result of str.to_s for anything. If you know it's a string, to_s is pointless. If it might not be, you should make use of the result, for example like so:
str = str.to_s
String#gsub! returns nil when no substitution is performed.
new_str.gsub!(/[aeiou]/) {|n| n.upcase }
returns nil when new_str doesn't contain any vowels. This is the case, for example, if str is "replace!*": new_str is then "sfqmbdf!*", which has no vowels.
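A short sketch of the behaviour described above and one possible fix. The method name letter_changes is a hypothetical rewrite of the question's LetterChanges, using the non-destructive gsub so the method always returns a string:

```ruby
# Hypothetical rewrite: shift each letter by one, then upcase vowels.
def letter_changes(str)
  letters = ('a'..'z').to_a
  mapping = letters.zip(letters.rotate(1)).to_h   # "a"=>"b", ..., "z"=>"a"
  shifted = str.to_s.downcase.gsub(/[a-z]/, mapping)
  shifted.gsub(/[aeiou]/) { |vowel| vowel.upcase } # gsub never returns nil
end

no_vowels = +'sfqmbdf!*'
puts no_vowels.gsub!(/[aeiou]/) { |v| v.upcase }.inspect  # nil
puts letter_changes('replace!*')                          # sfqmbdf!*
puts letter_changes('hello world')                        # Ifmmp xpsmE
```

Keeping gsub! and returning new_str on the last line would work just as well.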

How does Ruby's replace work?

I'm looking at ruby's replace: http://www.ruby-doc.org/core/classes/String.html#M001144
It doesn't seem to make sense to me, you call replace and it replaces the entire string.
I was expecting:
replace(old_value, new_value)
Is what I am looking for gsub then?
replace seems to be different than in most other languages.
I agree that replace is generally used as some sort of pattern replace in other languages, but Ruby is different :)
Yes, you are thinking of gsub:
ruby-1.9.2-p136 :001 > "Hello World!".gsub("World", "Earth")
=> "Hello Earth!"
One thing to note is that String#replace may seem pointless; however, it does remove "taintedness". You can read more about tainted objects here.
I suppose the reason you feel that replace does not make sense is that there is an assignment operator = (which has little to do with gsub).
The important point is that String instances are mutable objects. By using replace, you can change the content of the string while retaining its identity as an object. Compare:
a = 'Hello' # => 'Hello'
a.object_id # => 84793190
a.replace('World') # => 'World'
a.object_id # => 84793190
a = 'World' # => 'World'
a.object_id # => 84768100
See that replace has not changed the string object's id, whereas simple assignment did change it. This difference has some consequences. For example, suppose you assigned some instance variables to the string instance. By replace, that information will be retained, but if you assign the same variable simply to a different string, all that information is gone.
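The difference can be sketched with a second reference and an instance variable. The helper replace_vs_assign and the @note variable are made up for illustration:

```ruby
# Hypothetical helper: compares in-place replace with plain reassignment.
def replace_vs_assign
  a = +'Hello'
  other = a                                    # second reference to the same object
  a.instance_variable_set(:@note, 'kept')      # extra state attached to the object

  a.replace('World')                           # same object, new content
  same_object   = a.equal?(other)
  seen_by_other = other.dup
  note          = a.instance_variable_get(:@note)

  a = 'World'                                  # rebinds a to a different object
  rebound   = !a.equal?(other)
  lost_note = a.instance_variable_get(:@note)  # the new object has no @note
  [same_object, seen_by_other, note, rebound, lost_note]
end

p replace_vs_assign  # [true, "World", "kept", true, nil]
```

Every reference to the original object sees the new content after replace, while assignment only changes what the one variable points to.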
Yes, it is gsub and it is taken from awk syntax. I guess replace stands for the internal representation of the string, since, according to documentation, tainted-ness is removed too.

Regex with named capture groups getting all matches in Ruby

I have a string:
s="123--abc,123--abc,123--abc"
I tried using Ruby 1.9's new feature "named groups" to fetch all named group info:
/(?<number>\d*)--(?<chars>\s*)/
Is there an API like Python's findall which returns a collection of MatchData objects? In this case I need to return two matches, because 123 and abc repeat twice. Each MatchData should contain the details of each named capture, so I can use m['number'] to get the matched value.
Named captures are suitable only for one matching result.
Ruby's analogue of findall is String#scan. You can either use scan result as an array, or pass a block to it:
irb> s = "123--abc,123--abc,123--abc"
=> "123--abc,123--abc,123--abc"
irb> s.scan(/(\d*)--([a-z]*)/)
=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
irb* p [number,chars]
irb> end
["123", "abc"]
["123", "abc"]
["123", "abc"]
=> "123--abc,123--abc,123--abc"
Chiming in super-late, but here's a simple way of replicating String#scan but getting the matchdata instead:
matches = []
foo.scan(regex){ matches << $~ }
matches now contains the MatchData objects that correspond to scanning the string.
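Applied to the question's string, this could look like the sketch below. The method name all_matches is hypothetical, and the pattern uses [a-z]+ for the letters as an assumption about what the asker intended:

```ruby
# Hypothetical helper: like String#scan, but collects MatchData objects.
def all_matches(str, regex)
  matches = []
  str.scan(regex) { matches << $~ }  # $~ holds the MatchData of the current match
  matches
end

s = '123--abc,123--abc,123--abc'
all_matches(s, /(?<number>\d+)--(?<chars>[a-z]+)/).each do |m|
  puts "#{m[:number]} #{m[:chars]}"   # prints "123 abc" three times
end
```

Because each element is a MatchData, named groups stay accessible through m[:number] or m['number'].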
You can extract the used variables from the regexp using names method. So what I did is, I used regular scan method to get the matches, then zipped names and every match to create a Hash.
class String
def scan2(regexp)
names = regexp.names
scan(regexp).collect do |match|
Hash[names.zip(match)]
end
end
end
Usage:
>> "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ddd".scan2 /(?<url>(?<protocol>https?):\/\/[\S]+)/
=> [{"url"=>"http://www.google.com.tr", "protocol"=>"http"}, {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
@Nakilon is correct in showing scan with a regex; however, you don't even need to venture into regex land if you don't want to:
s = "123--abc,123--abc,123--abc"
s.split(',')
#=> ["123--abc", "123--abc", "123--abc"]
s.split(',').inject([]) { |a,s| a << s.split('--'); a }
#=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
This returns an array of arrays, which is convenient if you have multiple occurrences and need to see/process them all.
s.split(',').inject({}) { |h,s| n,v = s.split('--'); h[n] = v; h }
#=> {"123"=>"abc"}
This returns a hash, which, because the elements have the same key, has only the unique key value. This is good when you have a bunch of duplicate keys but want the unique ones. Its downside occurs if you need the unique values associated with the keys, but that appears to be a different question.
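If every value per key is needed instead, one possible variation is to collect the values into arrays. This is only a sketch, and the method name pairs_to_multimap is invented for the example:

```ruby
# Hypothetical helper: maps each key to an array of all its values.
def pairs_to_multimap(str)
  str.split(',').each_with_object({}) do |pair, acc|
    key, value = pair.split('--', 2)  # split on the first "--" only
    (acc[key] ||= []) << value        # create the array on first use, then append
  end
end

pairs_to_multimap('123--abc,123--def,456--ghi').each do |key, values|
  puts "#{key}: #{values.join(' ')}"
end
# 123: abc def
# 456: ghi
```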
If using ruby >=1.9 and the named captures, you could:
class String
def scan2(regexp2_str, placeholders = {})
return regexp2_str.to_re(placeholders).match(self)
end
def to_re(placeholders = {})
re2 = self.dup
separator = placeholders.delete(:SEPARATOR) || '' #Returns and removes separator if :SEPARATOR is set.
#Search for the pattern placeholders and replace them with the regex
placeholders.each do |placeholder, regex|
re2.sub!(separator + placeholder.to_s + separator, "(?<#{placeholder}>#{regex})")
end
return Regexp.new(re2, Regexp::MULTILINE) #Returns regex using named captures.
end
end
Usage (ruby >=1.9):
> "1234:Kalle".scan2("num4:name", num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
or
> re="num4:name".to_re(num4:'\d{4}', name:'\w+')
=> /(?<num4>\d{4}):(?<name>\w+)/m
> m=re.match("1234:Kalle")
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
> m[:num4]
=> "1234"
> m[:name]
=> "Kalle"
Using the separator option:
> "1234:Kalle".scan2("#num4#:#name#", SEPARATOR:'#', num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
I needed something similar recently. This should work like String#scan, but return an array of MatchData objects instead.
class String
# This method will return an array of MatchData's rather than the
# array of strings returned by the vanilla `scan`.
def match_all(regex)
match_str = self
match_datas = []
while match_str.length > 0 do
md = match_str.match(regex)
break unless md
match_datas << md
match_str = md.post_match
end
return match_datas
end
end
Running your sample data in the REPL results in the following:
> "123--abc,123--abc,123--abc".match_all(/(?<number>\d*)--(?<chars>[a-z]*)/)
=> [#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">]
You may also find my test code useful:
describe String do
describe :match_all do
it "works like scan, but uses MatchData objects instead of arrays and strings" do
mds = "ABC-123, DEF-456, GHI-098".match_all(/(?<word>[A-Z]+)-(?<number>[0-9]+)/)
mds[0][:word].should == "ABC"
mds[0][:number].should == "123"
mds[1][:word].should == "DEF"
mds[1][:number].should == "456"
mds[2][:word].should == "GHI"
mds[2][:number].should == "098"
end
end
end
I really liked @Umut-Utkan's solution, but it didn't quite do what I wanted, so I rewrote it a bit (note: the code below might not be beautiful, but it seems to work):
class String
def scan2(regexp)
names = regexp.names
captures = Hash.new
scan(regexp).each do |match|
names.zip(match).each do |name, value|
# Hash has no #add method, so append to an array stored under the capture name
(captures[name.to_sym] ||= []) << value
end
end
return captures
end
end
Now, if you do
p '12f3g4g5h5h6j7j7j'.scan2(/(?<alpha>[a-zA-Z])(?<digit>[0-9])/)
You get
{:alpha=>["f", "g", "g", "h", "h", "j", "j"], :digit=>["3", "4", "5", "5", "6", "7", "7"]}
(ie. all the alpha characters found in one array, and all the digits found in another array). Depending on your purpose for scanning, this might be useful. Anyway, I love seeing examples of how easy it is to rewrite or extend core Ruby functionality with just a few lines!
A year ago I wanted regular expressions that were easier to read and had named captures, so I made the following addition to String (it should maybe not live there, but it was convenient at the time):
scan2.rb:
class String
#Works as scan but stores the result in a hash indexed by variable/constant names (regexp PLACEHOLDERS) within parentheses.
#Example: Given the (constant) strings BTF, RCVR and SNDR and the regexp /#BTF# (#RCVR#) (#SNDR#)/
#the matches will be returned in a hash like: match[:RCVR] = <the match> and match[:SNDR] = <the match>
#Note: The #STRING_VARIABLE_OR_CONST# syntax has to be used. All occurrences of #STRING# will work as #{STRING}
#but the syntax is needed for the method to see the names to be used as indices.
def scan2(regexp2_str, mark='#')
regexp = regexp2_str.to_re(mark) #Evaluates the strings. Note: Must be reachable from here!
hash_indices_array = regexp2_str.scan(/\(#{mark}(.*?)#{mark}\)/).flatten #Look for string variable names within (#VAR#) or # replaced by <mark>
match_array = self.scan(regexp)
#Save matches in hash indexed by string variable names:
match_hash = Hash.new
match_array.flatten.each_with_index do |m, i|
match_hash[hash_indices_array[i].to_sym] = m
end
return match_hash
end
def to_re(mark='#')
re = /#{mark}(.*?)#{mark}/
return Regexp.new(self.gsub(re){eval $1}, Regexp::MULTILINE) #Evaluates the strings, creates RE. Note: Variables must be reachable from here!
end
end
Example usage (irb1.9):
> load 'scan2.rb'
> AREA = '\d+'
> PHONE = '\d+'
> NAME = '\w+'
> "1234-567890 Glenn".scan2('(#AREA#)-(#PHONE#) (#NAME#)')
=> {:AREA=>"1234", :PHONE=>"567890", :NAME=>"Glenn"}
Notes:
Of course it would have been more elegant to put the patterns (e.g. AREA, PHONE...) in a hash and add this hash with patterns to the arguments of scan2.
Piggybacking off of Mark Hubbart's answer, I added the following monkey-patch:
class ::Regexp
def match_all(str)
matches = []
str.scan(self) { matches << $~ }
matches
end
end
which can be used as /(?<letter>\w)/.match_all('word'), and returns:
[#<MatchData "w" letter:"w">, #<MatchData "o" letter:"o">, #<MatchData "r" letter:"r">, #<MatchData "d" letter:"d">]
This relies on, as others have said, the use of $~ in the scan block for the match data.
I like the match_all given by John, but I think it has an error.
The line:
match_datas << md
works if there are no captures () in the regex.
This code gives the whole line up to and including the pattern matched/captured by the regex. (The [0] part of MatchData) If the regex has capture (), then this result is probably not what the user (me) wants in the eventual output.
I think in the case where there are captures () in regex, the correct code should be:
match_datas << md[1]
The eventual output of match_datas will then be an array of captured substrings, starting from match_datas[0]. This is not what is expected if a normal MatchData is wanted, where [0] is the whole matched substring and [1], [2], ... are the captures (if any) in the regex pattern.
Things are complex - which may be why match_all was not included in native MatchData.

How to replace the last occurrence of a substring in ruby?

I want to replace the last occurrence of a substring in Ruby. What's the easiest way?
For example, in abc123abc123, I want to replace the last abc with ABC. How do I do that?
How about
new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
For instance:
irb(main):001:0> old_str = "abc123abc123"
=> "abc123abc123"
irb(main):002:0> pattern="abc"
=> "abc"
irb(main):003:0> replacement="ABC"
=> "ABC"
irb(main):004:0> new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
=> "abc123ABC123"
"abc123abc123".gsub(/(.*(abc.*)*)(abc)(.*)/, '\1ABC\4')
#=> "abc123ABC123"
But probably there is a better way...
Edit:
...which Chris kindly provided in the comment below.
So, as * is a greedy operator, the following is enough:
"abc123abc123".gsub(/(.*)(abc)(.*)/, '\1ABC\3')
#=> "abc123ABC123"
Edit2:
There is also a solution which neatly illustrates parallel array assignment in Ruby:
*a, b = "abc123abc123".split('abc', -1)
a.join('abc')+'ABC'+b
#=> "abc123ABC123"
Since Ruby 2.0 we can use \K which removes any text matched before it from the returned match. Combine with a greedy operator and you get this:
'abc123abc123'.sub(/.*\Kabc/, 'ABC')
#=> "abc123ABC123"
This is about 1.4 times faster than using capturing groups as Hirurg103 suggested, but that speed comes at the cost of lowering readability by using a lesser-known pattern.
more info on \K: https://www.regular-expressions.info/keep.html
Here's another possible solution:
>> s = "abc123abc123"
=> "abc123abc123"
>> s[s.rindex('abc')...(s.rindex('abc') + 'abc'.length)] = "ABC"
=> "ABC"
>> s
=> "abc123ABC123"
When searching in huge streams of data, using reverse will definitely* lead to performance issues. I use string.rpartition*:
sub_or_pattern = "!"
replacement = "?"
string = "hello!hello!hello"
array_of_pieces = string.rpartition sub_or_pattern
( array_of_pieces[(array_of_pieces.find_index sub_or_pattern)] = replacement ) rescue nil
p array_of_pieces.join
# "hello!hello?hello"
The same code must work with a string with no occurrences of sub_or_pattern:
string = "hello_hello_hello"
# ...
# "hello_hello_hello"
*rpartition uses rb_str_subseq() internally. I didn't check if that function returns a copy of the string, but I think it preserves the chunk of memory used by that part of the string. reverse uses rb_enc_cr_str_copy_for_substr(), which suggests that copies are done all the time -- although maybe in the future a smarter String class may be implemented (having a flag reversed set to true, and having all of its functions operating backwards when that is set), as of now, it is inefficient.
Moreover, regex patterns can't simply be reversed. The question only asks about replacing the last occurrence of a substring, so that's OK, but readers in need of something more robust won't benefit from the most-voted answer (as of this writing).
You can achieve this with String#sub and greedy regexp .* like this:
'abc123abc123'.sub(/(.*)abc/, '\1ABC')
Simple and efficient, if the goal is to remove the last occurrence:
s = "abc123abc123abc"
p = "123"
s.slice!(s.rindex(p), p.size)
s == "abc123abcabc"
string = "abc123abc123"
pattern = /abc/
replacement = "ABC"
matches = string.scan(pattern).length
index = 0
string.gsub(pattern) do |match|
index += 1
index == matches ? replacement : match
end
#=> abc123ABC123
I've used this handy helper method quite a bit:
def gsub_last(str, source, target)
return str unless str.include?(source)
top, middle, bottom = str.rpartition(source)
"#{top}#{target}#{bottom}"
end
If you want to make it more Rails-y, extend it on the String class itself:
class String
def gsub_last(source, target)
return self unless self.include?(source)
top, middle, bottom = self.rpartition(source)
"#{top}#{target}#{bottom}"
end
end
Then you can just call it directly on any String instance, e.g. "fooBAR123BAR".gsub_last("BAR", "FOO") == "fooBAR123FOO"
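.gsub /abc(?=[^abc]*$)/, 'ABC'
Matches "abc" and then asserts ((?=...) is a positive lookahead) that only characters other than a, b, and c follow up to the end of the string. Note that [^abc] is a character class, not a negated substring, so this fails whenever a stray a, b, or c appears after the last "abc"; a negative lookahead such as /abc(?!.*abc)/m avoids that restriction.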
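A rough sketch of the lookahead approach generalised to any literal substring. The method name replace_last is hypothetical, and the sketch assumes occurrences do not overlap:

```ruby
# Hypothetical helper: replaces the last non-overlapping occurrence of target.
def replace_last(str, target, replacement)
  escaped = Regexp.escape(target)
  # Match target only when no further target follows; /m lets . cross newlines.
  str.sub(/#{escaped}(?!.*#{escaped})/m) { replacement }
end

puts replace_last('abc123abc123', 'abc', 'ABC')         # abc123ABC123
puts replace_last('abc123abc1a', 'abc', 'ABC')          # abc123ABC1a
puts 'abc123abc1a'.sub(/abc(?=[^abc]*$)/, 'ABC')        # abc123abc1a (no match)
```

The block form of sub is used so that backslashes in the replacement are taken literally rather than as backreferences.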
