Ruby multiple string replacement - ruby

str = "Hello☺ World☹"
Expected output is:
"Hello:) World:("
I can do this: str.gsub("☺", ":)").gsub("☹", ":(")
Is there any other way so that I can do this in a single function call?. Something like:
str.gsub(['s1', 's2'], ['r1', 'r2'])

Since Ruby 1.9.2, String#gsub accepts hash as a second parameter for replacement with matched keys. You can use a regular expression to match the substring that needs to be replaced and pass hash for values to be replaced.
Like this:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
'(0) 123-123.123'.gsub(/[()-,. ]/, '') #=> "0123123123"
In Ruby 1.8.7, you would achieve the same with a block:
dict = { 'e' => 3, 'o' => '*' }
'hello'.gsub /[eo]/ do |match|
dict[match.to_s]
end #=> "h3ll*"

Set up a mapping table:
map = {'☺' => ':)', '☹' => ':(' }
Then build a regex:
re = Regexp.new(map.keys.map { |x| Regexp.escape(x) }.join('|'))
And finally, gsub:
s = str.gsub(re, map)
If you're stuck in 1.8 land, then:
s = str.gsub(re) { |m| map[m] }
You need the Regexp.escape in there in case anything you want to replace has a special meaning within a regex. Or, thanks to steenslag, you could use:
re = Regexp.union(map.keys)
and the quoting will be take care of for you.

You could do something like this:
replacements = [ ["☺", ":)"], ["☹", ":("] ]
replacements.each {|replacement| str.gsub!(replacement[0], replacement[1])}
There may be a more efficient solution, but this at least makes the code a bit cleaner

Late to the party but if you wanted to replace certain chars with one, you could use a regex
string_to_replace.gsub(/_|,| /, '-')
In this example, gsub is replacing underscores(_), commas (,) or ( ) with a dash (-)

Another simple way, and yet easy to read is the following:
str = '12 ene 2013'
map = {'ene' => 'jan', 'abr'=>'apr', 'dic'=>'dec'}
map.each {|k,v| str.sub!(k,v)}
puts str # '12 jan 2013'

You can also use tr to replace multiple characters in a string at once,
Eg., replace "h" to "m" and "l" to "t"
"hello".tr("hl", "mt")
=> "metto"
looks simple, neat and faster (not much difference though) than gsub
puts Benchmark.measure {"hello".tr("hl", "mt") }
0.000000 0.000000 0.000000 ( 0.000007)
puts Benchmark.measure{"hello".gsub(/[hl]/, 'h' => 'm', 'l' => 't') }
0.000000 0.000000 0.000000 ( 0.000021)

Riffing on naren's answer above, I'd go with
tr = {'a' => '1', 'b' => '2', 'z' => '26'}
mystring.gsub(/[#{tr.keys}]/, tr)
So
'zebraazzeebra'.gsub(/[#{tr.keys}]/, tr) returns
"26e2r112626ee2r1"

Related

Convert array string objects into hashes?

I have an array:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
How can I modify it into a hash object? e.g.:
h = {"us.production" => 1, "us.stats" => 1, "us.stats.total_active" => 1, "us.stats.inactive" => 0}
Thank you,
If pattern you are having is proper and constant, you can try following,
h = a.map { |x| x.split(' => ') }.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
Instead it is better to use split(/\s*=>\s*/) instead of split(' => ')
You can split every string with String#split and then convert an array of pairs into a hash with Array#to_h:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
pairs = a.map{|s| s.split(/\s*=>\s*/)}
# => [["us.production", "1"], ["us.stats", "1"], ["us.stats.total_active", "1"], ["us.stats.inactive", "0"]]
pairs.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
/\s*=>\s*/ is a regular expression that matches first any number of whitespaces with \s*, then => then again any nuber of whitespaces. As it's a String#split delimiter, this part of string won't be present in a string pair.
The other answers to date are incorrect because they leave the values as strings, whereas the spec is that they be integers. That can be easily corrected. One way is to change s.split(/\s*=>\s*/) in #mrzasa's answer to k,v = s.split(/\s*=>\s*/); [k,v.to_i]. Another way is to tack .transform_values(&:to_i) to the ends of the expressions given in those answers. I expect the authors of those answers either didn't notice that integers were required or intended to leave it as an exercise for the OP to do the (rather uninteresting) conversion.
To make a single pass through the array and avoid the creation of a temporary array and local variables (other than block variables), I suggest using Enumerable#each_with_object (rather than map and to_h), and use regular expressions to extract both keys and values (rather than using String#split):
a = ["us.production => 1", "us.stats=>1", "us.stats.total_active => 1"]
a.each_with_object({}) { |s,h| h[s[/.*[^ ](?= *=>)/]] = s[/\d+\z/].to_i }
#=> {"us.production"=>1, "us.stats"=>1, "us.stats.total_active"=>1}
The first regular expression reads, "match zero or more characters (.*) followed by a character that is not a space ([^ ]), provided that is followed by zero or more spaces (*) followed by the string "=>". (?= *=>) is a positive lookahead.
The second regular expression reads, "match one or more digits (\d+) at the end of the string (the anchor \z). If that string could represent a negative integer, change that regex to /-?\d+\z/ (? makes the minus sign optional).

gsub method and regex (case sensitive and case insensitive)

In ruby, I want to substitute some letters in a string, is there a better way of doing this?
string = "my random string"
string.gsub(/a/, "#").gsub(/i/, "1").gsub(/o/, "0")`
And if I want to substitute both "a" and "A" with a "#", I know I can do .gsub(/a/i, "#"), but what if I want to substitute every "a" with an "e" and every "A" with an "E"? Is there a way of abstracting it instead of specifying both like .gsub(/a/, "e").gsub(/A/, "E")?
You can use a Hash. eg:
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
"aAbBcC".gsub(/[abA]/, h)
# => "#EBBcC"
Not really an answer to your question, but more an other way to proceed: use the translation:
'aaAA'.tr('aA', 'eE')
# => 'eeEE'
For the same transformation, you can also use the ascii table:
'aaAA'.gsub(/a/i) {|c| (c.ord + 4).chr}
# => 'eeEE'
other example (the last character is used by default):
'aAaabbXXX'.tr('baA', 'B#')
# => '####BBXXX'
Here are two variants of #Santosh's answer:
str ="aAbBcC"
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
#1
str.gsub(/[#{h.keys.join}]/, h) #=> "#EBBcC"
#2
h.default_proc = ->(_,k) { k }
str.gsub(/./, h) #=> "#EBBcC"
These offer better maintainability should h could change in future
You can also pass gsub a block
"my random string".gsub(/[aoi]/) do |match|
case match; when "a"; "#"; when "o"; "0"; when "i"; "I" end
end
# => "my r#nd0m strIng"
The use of a hash is of course much more elegant in this case, but if you have complex rules of substitution it can come in handy to dedicate a class to it.
"my random string".gsub(/[aoi]/) {|match| Substitute.new(match).result}
# => "my raws0m strAINng"

Scanning through a hash and return a value if true

Based on my hash, I want to match it if it's in the string:
def conv
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60 }
str.match( Regexp.union( hash.keys.to_s ) )
end
puts conv # => <blank>
The above does not work but this only matches "one":
str.match( Regexp.union( hash[0].to_s ) )
Edited:
Any idea how to match "one", "two" and sixty in the string exactly?
If my string has "sixt" it return "6" and that should not happen based on #Cary's answer.
You need to convert each element of hash.keys to a string, rather than converting the array hash.keys to a string, and you should use String#scan rather than String#match. You may also need to play around with the regex until it returns everyhing you want and nothing you don't want.
Let's first look at your example:
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60}
We might consider constructing the regex with word breaks (\b) before and after each word we wish to match:
r0 = Regexp.union(hash.keys.map { |k| /\b#{k.to_s}\b/ })
#=> /(?-mix:\bone\b)|(?-mix:\btwo\b)|(?-mix:\bsix\b)|(?-mix:\bsixty\b)/
str.scan(r0)
#=> ["one", "two", "sixty"]
Without the word breaks, scan would return ["one", "two", "six"], as "sixty" in str would match "six". (Word breaks are zero-width. One before a string requires that the string be preceded by a non-word character or be at the beginning of the string. One after a string requires that the string be followed by a non-word character or be at the end of the string.)
Depending on your requirements, word breaks may not be sufficient or suitable. Suppose, for example (with hash above):
str = "I only have one, two, twenty-one or maybe sixty"
and we do not wish to match "twenty-one". However,
str.scan(r0)
#=> ["one", "two", "one", "sixty"]
One option would be to use a regex that demands that matches be preceded by whitespace or be at the beginning of the string, and be followed by whitespace or be at the end of the string:
r1 = Regexp.union(hash.keys.map { |k| /(?<=^|\s)#{k.to_s}(?=\s|$)/ })
str.scan(r1)
#=> ["sixty"]
(?<=^|\s) is a positive lookbehind; (?=\s|$) is a positive lookahead.
Well, that avoided the match of "twenty-one" (good), but we no longer matched "one" or "two" (bad) because of the comma following each of those words in the string.
Perhaps the solution here is to first remove punctuation, which allows us to then apply either of the above regexes:
str.tr('.,?!:;-','')
#=> "I only have one two twentyone or maybe sixty"
str.tr('.,?!:;-','').scan(r0)
#=> ["one", "two", "sixty"]
str.tr('.,?!:;-','').scan(r1)
#=> ["one", "two", "sixty"]
You may also want to change / at the end of the regex to /i to make the match insensitive to case.1
1 Historical note for readers who want to know why 'a' is called lower case and 'A' is called upper case.

How can I upcase first occurrence of an alphabet in alphanumeric string?

Is there any easy way to convert strings like 3500goat to 3500Goat and goat350rat to Goat350rat?
I am trying to convert the first occurrence of alphabet in an alphanumeric string to uppercase. I was trying the code below using the method sub, but no luck.
stringtomigrate = 3500goat
stringtomigrate.sub!(/\D{0,1}/) do |w|
w.capitalize
This should work:
string.sub(/[a-zA-Z]/) { |s| s.upcase }
or a shorthand:
string.sub(/[a-zA-Z]/, &:upcase)
examples:
'3500goat'.sub(/[a-zA-Z]/, &:upcase)
# => "3500Goat"
'goat350rat'.sub(/[a-zA-Z]/, &:upcase)
# => "Goat350rat"
Try this
1.9.3-p545 :060 > require 'active_support/core_ext'
=> true
1.9.3-p545 :099 > "goat350rat to Goat350rat".sub(/[a-zA-Z]/){ |x| x.titleize}
=> "Goat350rat to Goat350rat"

Faster to consolidate regexes in a hash of regexes in Ruby?

I have a string that I want to map to an integer. Many strings can map to the same integer, so I'm using regexes to match strings that should map to the same integer.
Example:
str = "hello"
REGEXES.each do |key, val|
if str =~ key
print val
end
end
where REGEXES is a hash that maps regex to integer.
Which is better:
REGEXES = [/hello/ => 2, /foo/ => 2, /bar/ => 3]
or
REGEXES = [/(hello|foo)/ => 2, /bar/ => 3]
Benchmark is your friend:
require 'benchmark'
str = 'hello'
num = 1000000
Benchmark.bmbm do |x|
x.report('individual keys:') do
regexes = [/hello/ => 2, /foo/ => 2, /bar/ => 3]
num.times do
regexes.each {|key, val| str =~ key}
end
end
x.report('combined keys: ') do
regexes = [/(hello|foo)/ => 2, /bar/ => 3]
num.times do
regexes.each {|key, val| str =~ key}
end
end
end
Result:
Rehearsal ----------------------------------------------------
individual keys:   1.600000   0.010000   1.610000 (  1.780246)
combined keys:     1.610000   0.010000   1.620000 (  1.761067)
------------------------------------------- total: 3.230000sec
                       user     system      total        real
individual keys:   1.570000   0.000000   1.570000 (  1.589879)
combined keys:     1.590000   0.010000   1.600000 (  1.678724)
As you can see, in this case there's not much difference.
I'd suggest that you try it with your full hash of regexes/integers, and see if the difference is more significant. If there is, well, there's your answer. If there's not, you can probably feel free to use whichever makes more sense.
I'd go with second version since it'll reduce loop cycles count. Still if there're too much values mapping to the same int, you can split them up in separate hash elements.

Resources